RODA

RODA (Repository of Authentic Digital Objects) is a Portuguese initiative to preserve authentic government digital objects. It is based on Fedora Commons, and supports the preservation of text documents, raster images, relational databases, video and audio. It runs in Java on a suitable browser. RODA’s core preservation strategy is migration, but it keeps the original representation too, so it should still be possible to open old files on emulated systems.

It’s in its final stages of preparation now; a demo is available at http://roda.di.uminho.pt/?locale=en#home. I’ve created the screengrabs below myself while exploring the demo.

My notes are very brief. If you go along to the demo you will discover that two of the PDF documents preserved there are papers explaining more about the principles, systems and strategy behind the RODA project.

RODA is OAIS-compliant, so let’s run through this in OAIS order.

The SIP: this comprises the digital original and its metadata, all inside a METS envelope which is then zipped. Preservation metadata is a PREMIS record and descriptive metadata is in a segment of EAD. Technical metadata would also be nice but RODA’s creators say it “is not mandatory as is seldom created by producers.”

Files included in the SIP are accompanied by checksums and are checked for viruses.  Neatly, there are a number of ways that producers can create SIPs, one of which is a dedicated app called RODA-in.
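To make the checksum-and-zip idea concrete, here is a minimal Python sketch. It is not RODA's or RODA-in's code, and it does not produce a METS envelope; it only zips a folder and records a checksum per file. The function names are my own, and SHA-256 is an assumption, since the RODA papers I have seen quoted here do not say which algorithm is used.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_package(source_dir: Path, target_zip: Path) -> dict:
    """Zip every file under source_dir and return {relative path: checksum}.

    Hypothetical helper: target_zip should live outside source_dir,
    otherwise the archive would try to include itself.
    """
    checksums = {}
    with zipfile.ZipFile(target_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                relative = path.relative_to(source_dir).as_posix()
                checksums[relative] = sha256_of(path)
                archive.write(path, relative)
    return checksums
```

On ingest, the receiving side would recompute each digest and compare it with the recorded one; any mismatch means the file changed in transit.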

Ingest. The system logs all SIPs which are in progress

Files in non-approved preservation formats (eg JPGs) are then normalised into approved formats (eg TIFFs). At that point they become AIPs. The approved formats are PDF/A for text (and for PowerPoint presentations too, to judge from the examples on the demo), TIFF for images, MPEG-2 for video, WAV for audio, and DBML for databases, this last one being an XML schema devised by the RODA team themselves. Files in other formats are normalised by going through a normalisation plugin; “plugins can easily be created to allow ingestion of other formats not in the list.”
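The normalisation rule can be pictured as a simple lookup. The sketch below is my own illustration in Python, not RODA's plugin API: the target formats come from the list above, but the extension table, the set of "already approved" extensions and the function name are all assumptions made for the example.

```python
from pathlib import PurePath

# Approved preservation formats as listed in the post, keyed by content type.
PRESERVATION_FORMAT = {
    "text": "PDF/A",
    "presentation": "PDF/A",
    "image": "TIFF",
    "video": "MPEG-2",
    "audio": "WAV",
    "database": "DBML",
}

# Hypothetical extension table; a real system would identify formats properly
# rather than trusting file extensions.
CONTENT_TYPE_BY_EXTENSION = {
    ".doc": "text",
    ".ppt": "presentation",
    ".jpg": "image",
    ".jpeg": "image",
    ".tif": "image",
    ".tiff": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".avi": "video",
}

# Extensions assumed to be in an approved format already (illustrative only).
ALREADY_APPROVED = {".tif", ".tiff", ".wav"}


def plan_normalisation(filename: str):
    """Return (target_format, needs_conversion) for a deposited file."""
    extension = PurePath(filename).suffix.lower()
    if extension not in CONTENT_TYPE_BY_EXTENSION:
        # In RODA's terms this is where a new normalisation plugin is needed.
        raise ValueError(f"no normalisation rule for {extension!r}")
    target = PRESERVATION_FORMAT[CONTENT_TYPE_BY_EXTENSION[extension]]
    return target, extension not in ALREADY_APPROVED
```

So a JPG would be planned for conversion to TIFF, while a file that is already a TIFF would pass through unchanged.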

The AIP: if the archivist approves the SIP, and if it contains a normalised representation, then it becomes an AIP, and the customer can either search for it (simple search or a smart-looking advanced search) or browse the classification tree. The customer can view descriptive metadata, preservation metadata, previews of the data (depending on what sort of data it is) and the data itself.

Preservation metadata can be viewed as a timeline

An AIP. This is for a series of images; text documents, sound files etc all look different

Previews of specific images in the AIP

The photo and book-style previews are beautiful. I never knew Portugal looked like this 🙂

Security: currently the demo is open, but when it’s finally in action all users will be authenticated prior to accessing the repository, and all user actions will be logged. No anonymous users will be allowed. All preservation actions, such as format conversions, are likewise recorded. Permissions can be fine-tuned so that they apply from repository level all the way down to individual data objects. If a user does not have permission to view a specific item then it will not show in their search results.
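One way to picture permissions that apply from repository level down to individual objects is inheritance up a tree, with search results filtered at the end. The Python sketch below is purely my own illustration: the class, the field names and the deny-by-default rule are assumptions, and I have no knowledge of how RODA actually stores or evaluates permissions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    """A level in the repository tree (repository, series, single object...)."""
    name: str
    parent: Optional["Node"] = None
    readers: Optional[Set[str]] = None  # None means "inherit from the parent"

    def can_read(self, user: str) -> bool:
        """Walk up the tree until a level with an explicit reader list is found."""
        node = self
        while node is not None:
            if node.readers is not None:
                return user in node.readers
            node = node.parent
        return False  # no rule anywhere: deny by default (an assumption)


def visible_results(hits: List[Node], user: str) -> List[Node]:
    """Drop every search hit the user is not allowed to see."""
    return [hit for hit in hits if hit.can_read(user)]
```

With this shape, an object without its own reader list simply follows whatever is set on the level above it, and a restricted item never appears in the results of a user who cannot read it.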

And at the end of it all, the system can create stats!

The Administrator account can see stats

One thing which immediately strikes you is the clean finish to its user interface, the RODA WUI layer (RODA Web User Interface). Very, very cool.

The Portuguese team has clearly put in a great deal of time and skill here. The project team comprises the Portuguese National Archives, who carried out archiving consulting and development, the University of Minho, which did the software engineering consulting, Assymetric Studios with design, the IDW with hardware, and Keep Solutions with maintenance and support.

My thanks to Miguel Ferreira of the University of Minho for answering my questions about RODA.

Yesterday I visited Gloucestershire Archives to have a look at their GAIP (Gloucestershire Archives Ingest Package) software.

GAIP is a little Perl app which is open source and nicely platform independent (yesterday we saw it in action on both XP and Fedora). Using GAIP, you can take a digital file, or a collection of files, and create a non-proprietary preservation version of it, which is then kept in a .tgz file containing the preservation version, the original, the metadata, and a log of alterations. Currently it works with image files, so that GAIP can create a .tgz containing the original bmp (for instance) as well as the png which it has created. GAIP can then also create a publication version of the image, usually a JPEG. Gloucestershire Archives are intending to expand GAIP to cover other sorts of files too: it depends on what sorts of converters they can track down.
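For anyone who wants to picture the package layout, here is a rough Python sketch of a .tgz holding the four parts described above. GAIP itself is written in Perl and this is not its code; the folder names, the JSON metadata file and the function names are all my own assumptions, and the sketch does not do the format conversion itself.

```python
import io
import json
import tarfile
from pathlib import Path


def _add_bytes(archive: tarfile.TarFile, name: str, data: bytes) -> None:
    """Add an in-memory file to the archive under the given name."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


def make_package(original: Path, preservation: Path,
                 metadata: dict, log_lines: list, target: Path) -> list:
    """Write a gzip-compressed tar with original, preservation copy,
    metadata and alteration log, then return the member names."""
    with tarfile.open(target, "w:gz") as archive:
        archive.add(original, arcname=f"original/{original.name}")
        archive.add(preservation, arcname=f"preservation/{preservation.name}")
        _add_bytes(archive, "metadata.json",
                   json.dumps(metadata, indent=2).encode("utf-8"))
        _add_bytes(archive, "alterations.log",
                   ("\n".join(log_lines) + "\n").encode("utf-8"))
    with tarfile.open(target, "r:gz") as archive:
        return archive.getnames()
```

Keeping the original next to the preservation copy means a better converter can be run against the original later, with the log recording what has been done so far.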

At present GAIP uses a command line interface which isn’t terribly friendly, but this can easily be improved.

From my point of view, I was glad to have a play with GAIP as it has rekindled my optimism about low-level digital preservation. I have been in a sulk for a couple of months because the only likely solutions seemed to be big-budget applications set up by (and therefore controlled by) national-level organisations. GAIP however is a ray of local light, a sign that UK local authorities might be able to develop in-house and low budget solutions which are realistic to our own specific contexts.


MLA East of England has published the report on Phase 2 of its Digital Preservation Regional Pilot Project (DARP 2). The report is available as a PDF here. Phase 2 was carried out by Bedfordshire County Council over the period September 2007-June 2008.

The project is of great use to UK local authority record offices, such as the one I work for, because it assesses the real world situation where outside organisations create digital records and then deposit them with local archive services. This is a different situation from that experienced by national archive organisations, which by and large deal with fewer record-creating organisations, and which therefore have more say over the sorts of records created. A UK local authority archives service typically deals with thousands of separate organisations and individuals, and has little or no say over file formats.

The aim of the DARP 2 project was therefore to survey a sample of these “typical depositors” to establish the reality behind this concern. Are organisations creating large numbers of electronic records for long term preservation, or are they still reliant on paper? How are they using digital records?

Bedfordshire and Luton Archives Service surveyed a range of organisations, including Parochial Church Councils, magistrates courts, town councils, parish councils, state and independent schools, and some businesses and charities. The survey was carried out with a questionnaire and with a follow-up interview.

Summary of DARP 2’s interesting results

“The overall picture was one of all or nothing in terms of understanding.” This is probably just as true of colleagues within the archives sector… Digital preservation sadly is not a subject which people can pick up a working knowledge of in their day to day activities, nor does it crop up very often in the media. You are either interested in it (in which case you will read up lots) or you are not (in which case you will know nothing). It is not like (say) gardening, where there is a whole spectrum of levels of involvement, from just weeding right through to plant breeding.

Most organisations still use paper. Some bodies stated that this was due to issues concerning the admissibility of digital records in court. Other organisations depend entirely on volunteers working on home computers, with unsophisticated filing systems on old equipment. At least one organisation stated that electronic records were kept purely as backup for paper. Certainly it seems that many organisations regard paper as the best long term solution: more than half of all respondents archived their emails by printing them out and filing the hard copies.

The report itself states that “paper is still the medium of choice for record keeping – 85% of the bodies surveyed are printing out digital files… although computers offer very creative means of generating ways of populating and decorating the blank page, they are tending to be seen as tools for manipulating and storing documents not as the final means of storing and managing records.”

Only ten respondents (out of 26) answered the question concerning migration, and three of them even stated that they did not understand the concept.

There is a problem with digital record keeping in state schools. It is remarkable that DARP found it difficult even to involve state schools in the project, or even to work out who at a school was responsible for record keeping.

No organisation thought that the record office would fail to deal with digital records.

The most popular backup medium was CD-R, closely followed by memory stick.