oais1.jpgNoted from OAIS.

Section 5 of the OAIS model explicitly addresses practical approaches to preserve digital information. The model immediately ties its colours to the migration mast. “No matter how well an OAIS manages its current holdings, it will eventually need to migrate much of its holdings to different media and/or to a different hardware or software environment to keep them accessible” (5.1). Emulation is mentioned later on, but always with some sort of proviso or concern attached.

Migration 

OAIS identifies three main motivators behind migration (5.1.1). These are:

  • keeping the repository cost-effective by taking advantage of new technologies and storage capabilities
  • staying relevant to changing consumer expectations
  • simple media decay.

OAIS then models four primary digital migration types (5.1.3). In order of increasing risk of information loss, they are:

  • refreshment of the bitstream from old media to newer media of the exact same type, in such a way that no metadata needs to be updated. Example: copying from one CD to a replacement CD.
  • replication of the bitstream from old media to newer media, and for which some metadata does need updating. The only metadata change would be the link between the AIP’s own unique ID and the location on the storage of the AIP itself (the “Archival Storage mapping infrastructure”). Example: moving a file from one directory on the storage to another directory.       
  • repackaging the bitstream in some new form, requring a change to the Packaging Information. Example: moving files off a CD to new media of different type.
  • transformation of the Content Information or PDI, while attempting to preserve the full information content. This last one is the one we traditionally term “migration,” and is the one which poses the most risk of information loss.

In practice there might be mixtures of all these. Transformation is the biggie, and section 5.1.3.4 goes into it in some detail. Within transformation you can get reversible transformation, such as replacing ASCII codes with UNICODE codes, or using a lossless compression algorithm; and non-reversible transformation, where the two representations are not semantically equivalent. Whether NRT has preserved enough information content may be difficult to establish.

Because the Content Information has changed in a transformation, the new AIP qualifies as a new version of the previous AIP. The PDI should be updated to identify the source AIP and its version, and to describe what was done and why (5.1.4). The first version of the AIP is referred to as the original AIP and can be retained for verification of information preservation.

The OAIS Model also looks at the possibility of improving or upgrading the AIP over time. Strictly speaking, this isn’t a transformation, but is instead creating a new Edition of an AIP, with all its own associated metadata. This can be viewed as a replacement for a previous edition, but it may be useful to retain the previous edition anyway.

There’s also a Derived AIP, which could be a handy extraction of information aggregated from multiple AIPs. But this does not replace the earlier AIPs.

Emulation 

All that is fine for pure data. But what if the look and feel needs preserving too?

The easy thing to do in the short to medium term is simply to pay techies to port the original software to the new environment. But OAIS points out that there are hidden problems. It may not be obvious when the app runs that it is functioning incorrectly. Testing all possible output values is unlikely to be cost effective for any particular OAIS. Commercial bridges, which are commercially provided conversion SW packages transforming data to other forms with similar look and feel, suffer from the same problems, and in addition give rise to potential copyright issues.

“If source code or commercial bridges are not available and there is an absolute requirement for the OAIS to preserve the Access look and feel, the OAIS would have to experiment with “emulation” [sic] technology” (5.2.2).

Emulation of apps has even more problems than porting. If the output isn’t visible data but is something like (eg) sound, then it becomes nearly impossible to know whether the current output is exactly the same as the sound made 20 years ago on a different combination of app and environment. We would need to also record the sound in some other (non-digital!) form, to use as validation information.

A different approach would be to emulate the hardware instead. But the OAIS model has an excellent paragraph summarising the problems here, too, which I’ll quote in full (in 5.2.2.2):

 “One advantage of hardware emulation is the claim that once a hardware platform is emulated successfully all operating systems and applications that ran on the original platform can be run without modification on the new platform. However, this does not take into account dependencies on input/output devices. Emulation has been used successfully when a very popular operating system is to be run on a hardware system for which it was not designed, such as running a version of Windows™ on an Apple™ machine. However even in this case, when strong market forces encourage this approach, not all applications will necessarily run correctly or perform adequately under the emulated environment. For example, it may not be possible to fully simulate all of the old hardware dependencies and timings, because of the constraints of the new hardware environment. Further, when the application presents information to a human interface, determining that some new device is still presenting the information correctly is problematical and suggests the need to have made a separate recording of the information presentation to use for validation. Once emulation has been adopted, the resulting system is particularly vulnerable to previously unknown software errors that may seriously jeopardize continued information access. Given these constraints, the technical and economic hurdles to hardware emulation appear substantial.” 

Top stuff.

Advertisements