You are currently browsing the monthly archive for January 2008.

From Listserve Jan 07

“DigitalPreservationEurope is pleased to announce the release of the second in a series of thought-provoking and controversial position papers on a range of issues surrounding digital preservation, ‘So Where is the Black Hole in our Collective Memory?’. It is our intention that these papers will promote vigorous debate within the digital preservation community and encourage people to think about digital preservation in new and innovative ways by exploring and challenging the received wisdom.

Harvey’s position paper asks important questions: Has the digital preservation community cried wolf too often? Are our strident, alarmist proclamations about the loss of digital materials too extreme? He argues that our inability to bring evidence to bear in support of such claims leaves us exposed and easily overlooked.

You can comment on this paper and the issues it raises by joining the debate in the DPE forum by visiting here.

You can also access the position paper by visiting here.”

Alan’s thoughts

The paper comes out with the standard revisionist line, i.e. that examples of data loss are in fact examples of near data loss, or indeed data recovery. Useful to have a summary of the Usual Suspects: Viking lander (data recovered), BBC Domesday (data recovered), first email [AA who cares?], first website [AA ditto], 1960 US census data (data recovered). I have my own experience of this with FIF images.

The paper, however, does not mention that these data archaeology projects were expensive: good digital preservation policies would have prevented the data from becoming endangered in the first place. Moreover, these were all successful recovery projects. I wonder if there are examples out there where the data archaeology was left too long?

Noted from the OAIS model.

In response to a request from a Consumer, the OAIS provides all or part of an AIP, or many AIPs, in the form of a DIP. The DIP doesn’t have to have complete PDI. DIPs are supplied by the Access entity within an OAIS, and can be supplied either on- or off-line.

“The Consumer uses an OAIS supplied Ordering Aid to develop an order request to acquire the data. The Consumer produces a logical view of the desired AIPs and associated Package Descriptions to be included in the Dissemination Information package and specifies the physical details of the Data Dissemination session such as media type and object format. This process may involve no visible interaction if adequate defaults exist. This order can also specify any transformations the Consumer wishes applied to the AIPs in creating the DIP” (4.3.4).
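To make the Access flow concrete, here is a minimal Python sketch, assuming an archive modelled as a list of AIPs with dictionary-valued PDI. `AIP`, `DIP`, `build_dip` and its `pdi_fields`/`transform` parameters are hypothetical names for illustration, not anything the OAIS model defines: `pdi_fields` shows that a DIP need not carry complete PDI, and `transform` stands in for the transformations a Consumer may request.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AIP:
    aip_id: str
    content: bytes
    pdi: dict  # complete Preservation Description Information


@dataclass
class DIP:
    aip_ids: list
    content: list
    pdi: list  # may be only a subset of each AIP's PDI


def build_dip(aips: list, order_ids: list,
              pdi_fields: Optional[set] = None,
              transform: Optional[Callable[[bytes], bytes]] = None) -> DIP:
    """Assemble a DIP from the AIPs named in a Consumer's order (hypothetical API)."""
    by_id = {a.aip_id: a for a in aips}
    selected = [by_id[i] for i in order_ids]  # KeyError if an ordered AIP does not exist
    content = [transform(a.content) if transform else a.content for a in selected]
    if pdi_fields is None:
        pdi = [dict(a.pdi) for a in selected]  # full PDI by default
    else:
        pdi = [{k: v for k, v in a.pdi.items() if k in pdi_fields} for a in selected]
    return DIP([a.aip_id for a in selected], content, pdi)


# Hypothetical sample holdings.
archive = [
    AIP("aip-1", b"lander telemetry", {"provenance": "Producer A", "fixity": "digest-1"}),
    AIP("aip-2", b"survey text", {"provenance": "Producer B", "fixity": "digest-2"}),
]
# Order one AIP, ask for fixity only, and request an upper-casing "transformation".
dip = build_dip(archive, ["aip-1"], pdi_fields={"fixity"}, transform=bytes.upper)
print(dip.content, dip.pdi)
```

Passing no `pdi_fields` returns the full PDI, so the partial-PDI case is an explicit choice in the order rather than the default.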

Noted from the OAIS model.

SIPs get transformed into AIPs (Archival Information Packages) for preservation. The AIP “is defined to provide a concise way of referring to a set of information that has, in principle, all the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object …. though the implementation of the AIP may vary from archive to archive, the specification of the AIP as a container that contains all the needed information to allow Long Term Preservation and access to archive holdings remains valid” (4.2.2.2 and 4.2.2.3).

The AIP has a complete set of PDI for the associated Content Information. The Packaging Information of the AIP will conform to OAIS internal data formatting and documentation standards, and may vary over time as the OAIS changes its practices. Transforming a SIP into an AIP “may involve file format conversions, data representation conversions or reorganisation of the content information” (4.1.1.2).

AIPs are managed within the OAIS by the Archival Storage entity (4.1). Functions include managing the storage, refreshing the media, performing routine and special error checking, and providing disaster recovery capabilities: see 4.1.1.3 for details of all these.
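One of the routine error checks can be sketched as a fixity audit: recompute each stored object's checksum and compare it with the value recorded when it was stored. This is a minimal Python sketch assuming SHA-256 digests and in-memory dictionaries standing in for storage media; `audit` and `sha256_hex` are hypothetical helpers.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit(store: dict, fixity: dict) -> list:
    """Return the ids of stored objects whose current digest differs from the recorded one."""
    return sorted(oid for oid, data in store.items() if sha256_hex(data) != fixity.get(oid))


store = {"aip-1": b"first object", "aip-2": b"second object"}
fixity = {oid: sha256_hex(data) for oid, data in store.items()}  # recorded at ingest

store["aip-2"] = b"second objecT"  # simulate silent corruption on the medium
print(audit(store, fixity))  # ['aip-2']
```

A failed audit is where the disaster recovery side comes in (for example restoring from another copy); that part is left out of the sketch.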

Some AIPs may only exist as the output of algorithms operating on other AIPs (3.2.6).
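An AIP that exists only as algorithm output can be pictured as a package that stores a recipe instead of bits: the ids of its source AIPs plus the function to run over them. A minimal Python sketch, in which `DerivedAIP` and the word-count algorithm are purely hypothetical:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DerivedAIP:
    source_ids: tuple                   # the AIPs the algorithm operates on
    algorithm: Callable[[list], bytes]  # how the content is regenerated

    def materialise(self, store: dict) -> bytes:
        """Recompute the content on demand from the current source AIPs."""
        return self.algorithm([store[i] for i in self.source_ids])


def word_count(contents: list) -> bytes:
    # Hypothetical algorithm: total number of whitespace-separated words.
    total = sum(len(c.split()) for c in contents)
    return str(total).encode("ascii")


store = {"aip-1": b"one two three", "aip-2": b"four five"}
summary = DerivedAIP(("aip-1", "aip-2"), word_count)
print(summary.materialise(store))  # b'5'
```

The sketch makes one consequence visible: the algorithm has to be preserved alongside the source AIPs, or the derived AIP cannot be regenerated.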

Subtypes

Section 4.2.2.4 of the model refers to two AIP subtypes. The Archival Information Unit is the “atom” which the archive is asked to store. A single AIU contains exactly one Content Information object (which may itself consist of multiple files) and exactly one set of PDI (4.2.2.5). The example they give is a digital movie. This AIU would contain three objects:

  • the digital encoding of the movie in a proprietary format
  • the Representation Information needed to understand this format (these two form the Content Info)
  • PDI: date of creation, featured actors, movie studio, etc, and a checksum for integrity.
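The movie example can be written down as a data structure. This Python sketch assumes frozen dataclasses and a SHA-256 checksum computed over the movie's files in order; the class names mirror the OAIS terms, but the field names, sample values and the `make_movie_aiu` helper are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentInformation:
    files: tuple              # the encoded movie; may be more than one file
    representation_info: str  # what is needed to understand the proprietary format


@dataclass(frozen=True)
class PDI:
    created: str
    actors: tuple
    studio: str
    checksum: str             # fixity, for integrity checking


@dataclass(frozen=True)
class AIU:
    content: ContentInformation  # exactly one Content Information object
    pdi: PDI                     # exactly one set of PDI


def make_movie_aiu(files, rep_info, created, actors, studio) -> AIU:
    digest = hashlib.sha256()
    for f in files:  # hash the files' bytes in order
        digest.update(f)
    return AIU(ContentInformation(tuple(files), rep_info),
               PDI(created, tuple(actors), studio, digest.hexdigest()))


movie = make_movie_aiu([b"reel-1 bytes", b"reel-2 bytes"],
                       "Specification of the studio's proprietary video format",
                       "1999-05-01", ["Actor A", "Actor B"], "Studio X")
print(movie.pdi.checksum)
```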

The second subtype is the Archival Information Collection. There might be millions of AIUs, you see, so the answer is to aggregate them into AICs using criteria determined by the archivist (4.2.2.7). A single AIP can belong to multiple AICs. The AIC itself is a complete AIP which contains PDI. The PDI provides further info such as when and why it was created, context to related AICs, desired levels of security etc.
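Aggregation into AICs can be sketched as grouping AIU ids under archivist-chosen criteria, with each AIC carrying a small PDI of its own. In this Python sketch the criteria are assumed to be predicates over per-AIU metadata dictionaries; `build_aics` and the sample data are hypothetical. Because every criterion is tested independently, one AIU can land in several AICs.

```python
def build_aics(aius: dict, criteria: dict) -> dict:
    """aius: id -> metadata; criteria: AIC name -> (reason, predicate over metadata)."""
    aics = {}
    for name, (reason, predicate) in criteria.items():
        members = [aiu_id for aiu_id, meta in aius.items() if predicate(meta)]
        # The AIC is itself an AIP, so it gets PDI of its own (here just why it was made).
        aics[name] = {"members": members, "pdi": {"reason": reason}}
    return aics


aius = {
    "m1": {"studio": "A", "year": 1999},
    "m2": {"studio": "B", "year": 1998},
    "m3": {"studio": "A", "year": 2005},
}
criteria = {
    "studio-A": ("all films from studio A", lambda m: m["studio"] == "A"),
    "1990s": ("films released 1990-1999", lambda m: 1990 <= m["year"] < 2000),
}
aics = build_aics(aius, criteria)
print(aics["studio-A"]["members"], aics["1990s"]["members"])  # ['m1', 'm3'] ['m1', 'm2']
```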

Borghoff et al. point out that OAIS does not allow for changes in stored AIPs. Instead, the AIP must be extracted from the archive as a DIP, modified, and then resubmitted as a SIP. “We hope that for trivial changes the archiving systems will provide more pragmatic and simpler solutions” (p. 52).
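The round trip can be sketched with a toy archive that never edits a stored AIP: a change means disseminating a copy, editing it, and ingesting the result as a new package. `ToyArchive`, `revise` and the `derived_from` link are hypothetical; UUIDs stand in for whatever identifiers a real OAIS assigns.

```python
import copy
import uuid


class ToyArchive:
    """Stored AIPs are never modified in place."""

    def __init__(self):
        self._aips = {}

    def ingest(self, sip: dict) -> str:
        aip_id = str(uuid.uuid4())
        self._aips[aip_id] = copy.deepcopy(sip)
        return aip_id

    def disseminate(self, aip_id: str) -> dict:
        # Hand out a deep copy, so edits to the DIP cannot reach the stored AIP.
        return copy.deepcopy(self._aips[aip_id])


def revise(archive: ToyArchive, aip_id: str, changes: dict) -> str:
    """DIP out, modify, resubmit as a SIP; returns the new AIP's id."""
    package = archive.disseminate(aip_id)
    package.update(changes)
    package["derived_from"] = aip_id  # hypothetical provenance link
    return archive.ingest(package)


archive = ToyArchive()
old_id = archive.ingest({"title": "Annual report", "body": "draft"})
new_id = revise(archive, old_id, {"body": "final"})
print(archive.disseminate(old_id)["body"], archive.disseminate(new_id)["body"])  # draft final
```

Even a one-word correction goes through the full cycle, which is the overhead Borghoff et al. hope archiving systems will streamline for trivial changes.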

An excellent new blog.

“This blog is a place for ULCC’s Digital Archives staff to record information about the activities and projects they are involved with. ULCC’s Digital Archives department has been working for over a decade on digital archives, library and preservation projects and initiatives, including systems for the University of London, the National Archives, the British Library and the JISC. We hope that the blog will build an authentic journal of our work and a reliable reference and online memory for our own records – a less formal record than reports and newsletters. If any of the information in it is helpful to others working in the field, so much the better.”

Follow the link in the blogroll.

From Listserve, 17 Jan 08. Personal names and emails removed.

Digital Preservation Europe is delighted to announce its second international Digital Preservation Challenge.

About the Digital Preservation Challenge

DigitalPreservationEurope (DPE) raises awareness and improves practice in the management, longevity, and reuse of digital assets. To this end, DPE is delighted to announce the second international Digital Preservation Challenge. The Digital Preservation Challenge aims to promote innovation at all levels and will provide an insight into the range of digital preservation risks currently being faced by international research and practitioner communities.

The challenge invites participants to overcome the barriers hindering access to five digital objects. Each set of objects is accompanied by a highly abstracted scenario based on real-life situations. These scenarios are intended to make the challenge more accessible to participants from all backgrounds while not trivialising the serious nature of the digital preservation challenges facing society.

The first DPE Digital Preservation Challenge ran from 25 May to 15 July 2007. Miguel Ferreira of the University of Minho, Portugal, who was awarded the first prize, commented:

“The problems proposed in the challenge made me realize how diverse preservation scenarios can be and how difficult it is to find good sources of information, tools and services for carrying out preservation interventions. The challenge also made me realize how specific and time-consuming preservation interventions can be and how difficult it is to find good and general solutions applicable to all sorts of preservation contexts.”

Evaluating submissions

Submissions to the second Digital Preservation Challenge will be assessed by a panel of international digital preservation experts and practitioners. The incremental scoring method the panel will use emphasises the thoroughness and quality of the documentation of the processes used to solve a challenge task rather than the overall outcome itself. In this respect, solutions to single tasks or sections of the challenge will also be considered and it may be possible for an individual to win the challenge even if he/she cannot ultimately render all the objects. Winning submissions to the Digital Preservation Challenge will be published on the DPE website following the announcement of the winning entries.

Important dates

Opening of the Challenge: 15 January 2008
Deadline for submissions: 30 May 2008 at 4pm GMT
Announcement of winners: ECDL 17 September 2008, Aarhus, Denmark

Awards

First Prize 3000 Euros
Second Prize 1500 Euros
Third Prize 500 Euros

To learn more about or to take part in the Digital Preservation Challenge, please visit here.

Questions or comments should be sent to challenge@digitalpreservationeurope.eu.

From Listserve, 17 Jan 08

In December 2006 the Koninklijke Bibliotheek (KB), National Library of the Netherlands, commissioned the RAND Corporation/RAND Europe to analyse KB’s e-Depot strategy, following up on the positive assessment and advice of an international Evaluation Committee in 2005.

The Technical Report “Addressing the uncertain future of preserving the past. Towards a robust strategy for digital archiving and preservation” was presented on 2 November 2007, during our International Conference on Digital Preservation Tools and Trends. The Koninklijke Bibliotheek has now released a Response to the recommendations of the RAND Report. The KB aspires to fulfil its responsibility towards researchers, research institutions, research libraries and publishers by consolidating its position in the international vanguard of digital preservation.

The whole text of the Report can be accessed on our website here.

The Response of the Koninklijke Bibliotheek can be accessed here.

Noted from the OAIS model.

SIPs are sent to the OAIS archive by Producers: authors, organisations or even programs which deliver documents to the OAIS. Some submissions will have insufficient Representation Information or Preservation Description Information to meet the stringent AIP requirements, which is why a SIP cannot always be stored directly as an AIP.
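Whether a SIP has enough to become an AIP can be sketched as a gap check. This Python version assumes a SIP is a dictionary and that "enough" means non-empty Representation Information plus the four PDI categories of the OAIS model (reference, context, provenance, fixity); `aip_gaps` is a hypothetical helper, and in practice each archive sets its own AIP requirements.

```python
PDI_CATEGORIES = ("reference", "context", "provenance", "fixity")


def aip_gaps(sip: dict) -> list:
    """List what a SIP still lacks before it could be stored as an AIP."""
    gaps = []
    if not sip.get("representation_info"):
        gaps.append("representation_info")
    pdi = sip.get("pdi", {})
    gaps.extend(f"pdi.{category}" for category in PDI_CATEGORIES if not pdi.get(category))
    return gaps


# Hypothetical SIP with partial PDI.
sip = {
    "content": b"%PDF-1.4 sample",
    "representation_info": "PDF 1.4 specification",
    "pdi": {"provenance": "Submitted by Producer X", "fixity": "digest-1"},
}
print(aip_gaps(sip))  # ['pdi.reference', 'pdi.context']
```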

The form of the SIP will typically be negotiated between the Producer and the OAIS (2.2.3). Most SIPs will have some Content Information and some PDI, but it may require several submissions to form an AIP. If there are multiple SIPs which use the same Representation Information it is likely that this RI will only be provided once to the OAIS (4.2.2.2).
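Supplying shared Representation Information only once implies that later SIPs refer to it rather than repeat it. A minimal Python sketch, assuming each SIP names its RI with a Producer-chosen key (`ri_ref`), includes the RI text itself only the first time, and that SIPs arrive in order; `resolve_representation_info` and the key names are hypothetical.

```python
def resolve_representation_info(sips: list) -> tuple:
    """Return (registry of RI keyed by ri_ref, list of (sip id, RI text))."""
    registry = {}
    resolved = []
    for sip in sips:
        ref = sip["ri_ref"]
        if "ri" in sip:  # RI supplied with this SIP
            registry.setdefault(ref, sip["ri"])
        if ref not in registry:
            raise ValueError(f"{sip['id']}: Representation Information {ref!r} was never supplied")
        resolved.append((sip["id"], registry[ref]))
    return registry, resolved


sips = [
    {"id": "sip-1", "ri_ref": "fmt-A", "ri": "Specification of format A"},
    {"id": "sip-2", "ri_ref": "fmt-A"},  # reuses the RI sent with sip-1
    {"id": "sip-3", "ri_ref": "fmt-B", "ri": "Specification of format B"},
]
registry, resolved = resolve_representation_info(sips)
print(len(registry), resolved[1])  # 2 ('sip-2', 'Specification of format A')
```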

Ideally there should be a submission agreement between the Producer and the OAIS, specifying criteria like file formats, subject matter, ingest schedule, access restrictions, verification protocols, etc (2.3.2). “Considerable iteration may be required to agree on the right information to be submitted, and to get it into forms acceptable to the OAIS” (3.2.1). You also need to negotiate legal aspects, such as authority to migrate the Content Information to new representation forms (3.2.2). Data submission formats, procedures and deliverables must be documented in the OAIS’s data submission policies (4.1.1.5).

The Ingest entity (4.1.1.2) in an OAIS accepts SIPs, performs QA on them, and generates an AIP. QA might involve checksums or cyclic redundancy checks. If there are errors in the SIP submission then Ingest will request a resubmission. Ingest then transforms the SIPs into AIPs, which might include file format conversion, reorganisation, transfer to different media etc. “An OAIS is not always required to retain the information submitted to it in precisely the same format as in the SIP” (4.3.2). At the very least it will add a unique identifier.
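The QA step can be sketched as: recompute the checksum the Producer sent with the SIP, ask for resubmission on a mismatch, and otherwise build an AIP that at least carries a new unique identifier. This Python sketch assumes the Producer declares either a CRC-32 (via `zlib.crc32`) or a SHA-256 digest; `ingest`, `checksum` and `ResubmissionRequired` are hypothetical names, and the format conversions Ingest may also perform are left out.

```python
import hashlib
import uuid
import zlib


class ResubmissionRequired(Exception):
    """Raised when a SIP fails Ingest quality assurance."""


def checksum(data: bytes, algorithm: str) -> str:
    if algorithm == "crc32":
        return format(zlib.crc32(data), "08x")  # cyclic redundancy check
    if algorithm == "sha256":
        return hashlib.sha256(data).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")


def ingest(sip: dict) -> dict:
    """QA the SIP, then wrap it as an AIP with a new unique identifier."""
    actual = checksum(sip["content"], sip["algorithm"])
    if actual != sip["checksum"]:
        raise ResubmissionRequired(f"checksum mismatch ({sip['algorithm']}); please resubmit")
    return {
        "aip_id": str(uuid.uuid4()),
        "content": sip["content"],
        "pdi": {"fixity": {sip["algorithm"]: actual}},
    }


data = b"quarterly accounts"
aip = ingest({"content": data, "algorithm": "crc32", "checksum": checksum(data, "crc32")})
print(aip["pdi"])
```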

Section 4.3.2 has some examples of SIP to AIP data transformations, such as one-to-one, one-to-many or many-to-one.
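The three cardinalities can be sketched if a SIP is reduced to a mapping of file names to bytes. This is a hypothetical Python illustration of the shapes only, not the examples given in 4.3.2: one-to-one repackages a SIP, one-to-many splits a multi-file SIP into one AIP per file, and many-to-one merges several SIPs.

```python
def one_to_one(sip: dict) -> list:
    return [{"files": dict(sip["files"])}]


def one_to_many(sip: dict) -> list:
    # One AIP per file in the SIP.
    return [{"files": {name: data}} for name, data in sip["files"].items()]


def many_to_one(sips: list) -> list:
    merged = {}
    for sip in sips:
        for name, data in sip["files"].items():
            if name in merged:
                raise ValueError(f"file name clash while merging: {name}")
            merged[name] = data
    return [{"files": merged}]


a = {"files": {"text.txt": b"body", "figure.png": b"\x89PNG"}}
b = {"files": {"errata.txt": b"fix"}}
print(len(one_to_one(a)), len(one_to_many(a)), len(many_to_one([a, b])))  # 1 2 1
```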

Noted from the OAIS model.

The Information Package is the central entity within an OAIS archive. It comprises:

  • the Content Information, i.e. the actual Data Object which the archive is trying to preserve, plus its accompanying Representation Information
  • the Preservation Description Information, i.e. all the info needed to preserve the CI, together with any Representation Information which the PDI itself needs to be understood.

The PDI is likely to describe:

  • provenance: custody, history, processing history
  • context: why the CI was produced, how it relates to other CI objects
  • reference: a reference code or ISBN
  • fixity: a checksum or similar.

At its own discretion an OAIS can include Packaging Information about the Information Package. (Really!) This Packaging Information would be info like any file structure or directory structure which the data have.

In addition, there is the separately-stored metadata, the Descriptive Information. This is what allows the Information Package to be found in the OAIS. It might just be the title of the IP, or a full set of searchable attributes.
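The pieces of an Information Package, and the separately stored Descriptive Information used to find it, can be sketched as follows. This Python version assumes in-memory dictionaries and a simple case-insensitive title or keyword match; `InformationPackage`, `DescriptiveInfo`, `find` and the sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    data_object: bytes
    representation_info: str  # with data_object, forms the Content Information
    pdi: dict                 # reference, context, provenance, fixity
    packaging_info: dict = field(default_factory=dict)  # optional, e.g. directory layout


@dataclass
class DescriptiveInfo:
    package_id: str
    title: str
    keywords: tuple = ()


def find(catalogue: list, term: str) -> list:
    """Search the Descriptive Information only; packages are fetched afterwards by id."""
    term = term.lower()
    return [d.package_id for d in catalogue
            if term in d.title.lower() or any(term == k.lower() for k in d.keywords)]


packages = {
    "ip-1": InformationPackage(b"t,value\n0,1\n", "CSV, UTF-8", {"fixity": "digest-1"},
                               {"layout": "data/readings.csv"}),
    "ip-2": InformationPackage(b"farm,acres\n", "CSV, UTF-8", {"fixity": "digest-2"}),
}
catalogue = [
    DescriptiveInfo("ip-1", "Lander telemetry", ("mars", "telemetry")),
    DescriptiveInfo("ip-2", "Village survey", ("census",)),
]
hits = find(catalogue, "Mars")
print([packages[i].representation_info for i in hits])  # ['CSV, UTF-8']
```

Keeping `catalogue` apart from `packages` mirrors the point that Descriptive Information is stored separately from the packages it describes.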

It is important to distinguish between the Information Package that is preserved by an OAIS (the AIP) and the Information Packages which are submitted to it (SIPs) and distributed by it (DIPs).

From Listserve 16 Jan 08:

The Digital Curation Centre was delighted to present the Draft DCC Curation Lifecycle Model at the 3rd International Digital Curation Conference 2007. 
 
The model provides a generic graphical high-level overview of the stages required for successful curation and preservation of digital material from initial conceptualisation. The Digital Curation Centre will shortly start to use this draft model to ensure that information, services and advisory material cover all areas of the lifecycle.
 
The development of the model is now open to public consultation. We would be grateful for any comments or thoughts before 29 February 2008. We are also hoping to develop domain specific variations of the model, so ideas and comments relating to particular domains would also be welcome. The model can be found here.

Fuller information can also be found in the current edition of the International Journal of Digital Curation – Issue 2, Volume 2, 2007, here.
 
Comments can be posted on the DCC Forum under “Draft DCC Curation Lifecycle Model”.

American legal firms are now using online storage for their digital preservation:

http://www.internetnews.com/storage/article.php/3721191

But is this such a good idea? The impression I get is that the files are kept in their original file formats. If a file format turns out to be unreadable 10 years later, isn’t that the legal firms’ problem rather than the online storage company’s?