RODA (Repository of Authentic Digital Objects) is a Portuguese initiative to preserve authentic government digital objects. It is based on Fedora Commons, and supports the preservation of text documents, raster images, relational databases, video and audio. It runs in Java on a suitable browser. RODA’s core preservation strategy is migration, but it keeps the original representation too, so it should be OK to open old files on emulated systems.
It’s in its final stages of preparation now; a demo is available at http://roda.di.uminho.pt/?locale=en#home. I’ve created the screengrabs below myself while exploring the demo.
My notes are very brief. If you go along to the demo you will discover that two of the PDF documents preserved there are papers explaining more about the principles, systems and strategy behind the RODA project.
RODA is OAIS-compliant, so let’s run through this in OAIS order.
The SIP: this comprises the digital original and its metadata, all inside a METS envelope which is then zipped. Preservation metadata is a PREMIS record and descriptive metadata is in a segment of EAD. Technical metadata would also be nice but RODA’s creators say it “is not mandatory as is seldom created by producers.”
Files included in the SIP are accompanied by checksums and are checked for viruses. Neatly, there are a number of ways that producers can create SIPs, one of which is a dedicated app called RODA-in.
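The checksum idea is easy to sketch in a few lines of Python. This is only an illustration under stated assumptions, not RODA's or RODA-in's actual code: `build_sip`, `verify_sip` and the `manifest.json` file are hypothetical names, and a plain JSON manifest stands in for the real METS envelope with its PREMIS and EAD metadata.

```python
import hashlib
import io
import json
import zipfile


def build_sip(files):
    """Zip a {filename: bytes} mapping together with a checksum manifest."""
    manifest = {
        name: hashlib.sha256(data).hexdigest() for name, data in files.items()
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
        # Assumption: manifest.json is a simplified stand-in for the METS envelope.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify_sip(sip_bytes):
    """Return the names of files whose checksum no longer matches the manifest."""
    with zipfile.ZipFile(io.BytesIO(sip_bytes)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return [
            name
            for name, expected in manifest.items()
            if hashlib.sha256(archive.read(name)).hexdigest() != expected
        ]
```

An empty list from `verify_sip` means every file arrived as the producer packaged it; any name in the list is a file that changed between packaging and ingest.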
Files in non-approved preservation formats (eg JPGs) are then normalised into formats which are approved (eg TIFFs). At that point they become AIPs. Approved formats are PDF/A for text (and for PowerPoint presentations too, to judge from the examples on the demo), TIFF for images, MPEG-2 for video, WAV for audio, and DBML, this last one being an XML schema devised by the RODA team themselves for databases. Files in other formats are normalised by going through a normalisation plugin; “plugins can easily be created to allow ingestion of other formats not in the list.”
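One way to picture the normalisation step is as a lookup from incoming file type to approved format, with plugins extending the table. The sketch below is an assumption-laden illustration rather than RODA's plugin API: the extension lists, `plan_normalisation` and `register_plugin` are all hypothetical, and only the target formats (TIFF, PDF/A, WAV, MPEG-2) come from the list above.

```python
from pathlib import PurePath

# Extensions already in an approved preservation format (illustrative guesses).
ALREADY_APPROVED = {".tif": "TIFF", ".tiff": "TIFF", ".wav": "WAV"}

# Extension -> approved target format; each entry plays the role of a plugin.
NORMALISATION_PLUGINS = {
    ".jpg": "TIFF",
    ".doc": "PDF/A",
    ".ppt": "PDF/A",
    ".mp3": "WAV",
    ".avi": "MPEG-2",
}


def register_plugin(extension, target_format):
    """Allow ingestion of a format that is not yet in the list."""
    NORMALISATION_PLUGINS[extension.lower()] = target_format


def plan_normalisation(filename):
    """Decide what ingest should do with a file, based on its extension."""
    extension = PurePath(filename).suffix.lower()
    if extension in ALREADY_APPROVED:
        return ("keep", ALREADY_APPROVED[extension])
    if extension in NORMALISATION_PLUGINS:
        return ("convert", NORMALISATION_PLUGINS[extension])
    return ("reject", None)
```

A file with no matching entry is rejected until someone calls `register_plugin` for its extension, which mirrors the idea that new formats are supported by adding plugins rather than by changing the core.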
The AIP: if the archivist approves the SIP, and if it contains a normalised representation, then it becomes an AIP, and the customer can either search for it (simple search or a smart-looking advanced search) or browse the classification tree. The customer can view descriptive metadata, preservation metadata, previews of the data (depending on what sort of data it is) and the data itself.
The photo and book-style previews are beautiful. I never knew Portugal looked like this.
Security: currently the demo is open, but when it’s finally in action all users will be authenticated prior to accessing the repository, and all user actions will be logged. No anonymous users will be allowed. All preservation actions, such as format conversions, are likewise recorded. Permissions can be fine-tuned so that they apply from repository level all the way down to individual data objects. If a user does not have permission to view a specific item then it will not show in their search results.
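To make the permissions idea concrete, here is a minimal Python sketch of rules that cascade from repository level down to a single object, and of search results filtered by them. It is not how RODA implements security: the path-style identifiers, the `acl` dictionary, and the `can_view` and `search` functions are assumptions made for illustration.

```python
def can_view(user, object_path, acl):
    """Apply the most specific rule found between the object and the repository root."""
    parts = object_path.split("/")
    # Walk from the object itself up towards the repository level.
    for depth in range(len(parts), 0, -1):
        level = "/".join(parts[:depth])
        if level in acl:
            return user in acl[level]
    return False  # no rule at any level: deny by default


def search(user, query, catalogue, acl):
    """Return matching object paths, silently omitting those the user may not view."""
    return [
        path
        for path, title in catalogue.items()
        if query.lower() in title.lower() and can_view(user, path, acl)
    ]


# Hypothetical data: one repository-wide rule and one object-level override.
acl = {"repo": {"ana", "rui"}, "repo/fonds/minutes": {"ana"}}
catalogue = {
    "repo/fonds/map": "Lisbon street map",
    "repo/fonds/minutes": "Lisbon council minutes",
}
print(search("rui", "lisbon", catalogue, acl))  # only the map is listed for rui
```

Filtering inside `search` itself, rather than hiding items afterwards in the interface, is what keeps a restricted item from ever appearing in a user's results.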
And at the end of it all, the system can create stats!
One thing which immediately strikes you is the clean finish to its user interface, the RODA WUI layer (RODA Web User Interface). Very, very cool.
The Portuguese team has clearly put in a great deal of time and skill here. The project team comprises the Portuguese National Archives, who carried out archiving consulting and development; the University of Minho, which did the software engineering consulting; Assymetric Studios, with design; the IDW, with hardware; and Keep Solutions, with maintenance and support.
My thanks to Miguel Ferreira of the University of Minho for answering my questions about RODA.
Planning and Implementing Electronic Records Management: a practical guide, by Kelvin Smith. Facet Publishing (Oct 2007), hardcover, ISBN-10: 185604615X. Available from Amazon. Chapter 8 concerns Preservation, especially ‘long-term’, which is defined (p.130) as being ‘greater than one generation of technology.’ Unlike the other books I have read so far, Smith’s takes a largely standards-based approach.
Smith begins by making the interesting point that there is still “a certain amount of distrust” of electronic records (p.129), and that people still seem to be happier with paper for preservation. This is no longer acceptable.
Smith then looks at four core challenges (authenticity, reliability, integrity and usability) in the light of ISO 15489. Authenticity is not an either/or thing: there is a sliding scale of authenticity, and the higher the number of requirements which have been met, the stronger the presumption of authenticity. Likewise, integrity does not mean that a record is unchanged: it means that only authorized and appropriate changes have been made.
Other standards relevant to digital preservation are:
ISO 17799 Information security management (a revision of BS 7799)
BIP 0008 Code of practice for legal admissibility etc of electronic information
e-GIF the UK e-Government Interoperability Framework
OAIS Open Archival Information System
BS 4783 Recommended environmental storage for digital media
BS 25999 Business continuity best practice
Smith says there is a case for creating the records properly in a sustainable format to begin with. [See I have a cup of coffee. AA] It’s more cost-effective for an organisation to take preservation factors into account at the beginning of the life cycle than halfway along. TNA have guidance on selecting good file formats, and e-GIF is useful here too.
But if you decide to create records in a short-term or proprietary format then you need to mull over migration vs. emulation. Smith summarises the usual pros and cons. The only interesting additional points he makes are that (a) migration should always support business needs as well as preserve record content, ie you don’t want to migrate to a format you cannot directly search or copy from, and (b) any migration strategy should integrate with existing corporate policies and procedures (especially BIP 0008). His RM policies mindset is coming through clearly here.
Smith’s book is the only one I have read so far to include a section on database preservation, and it’s short (less than a page). Preservation depends really on what sort of database it is: in some DBs old data is overwritten by new data, while in others data is never removed or overwritten. Similarly, some DBs are time- or project-limited (such as surveys) while others carry on indefinitely. The usual approach is a simple all-or-nothing snapshot of the data, which is then converted to some standard form rather than kept in its native one. In addition some systems preserve an audit trail alongside, capturing every alteration made to records.
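As a rough illustration of the snapshot approach, the Python sketch below dumps every table of a SQLite database into a single XML document. It is a sketch under stated assumptions, not something taken from Smith's book: SQLite stands in for whatever database is being preserved, `snapshot_to_xml` is a hypothetical name, and the `database`/`table`/`row`/`cell` element names are invented here and do not follow any published schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def snapshot_to_xml(connection):
    """Serialise all tables and rows of a SQLite database as one XML string."""
    root = ET.Element("database")
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        table_el = ET.SubElement(root, "table", name=table_name)
        cursor = connection.execute(f'SELECT * FROM "{table_name}"')
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for column, value in zip(columns, row):
                cell = ET.SubElement(row_el, "cell", column=column)
                # NULLs become empty cells; everything else is stored as text.
                cell.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Usage: snapshot a small in-memory survey database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE responses (id INTEGER, answer TEXT)")
connection.execute("INSERT INTO responses VALUES (1, 'yes'), (2, NULL)")
snapshot = snapshot_to_xml(connection)
```

The snapshot is all-or-nothing in the sense described above: it captures the data as it stands at one moment, and anything overwritten earlier is only recoverable if an audit trail was kept separately. It also flattens column types to text, which a real preservation format would need to record.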
Implementing the preservation strategy
Smith then finishes the chapter with an excellent three-page summary of the key steps you need to undertake, practically, to implement a strategy. A 6-point summary of his 11-point summary:
work with records creators and archivists to appraise and select records for permanent preservation
identify the right people within your own organisation to carry out preservation
decide on a technical preservation approach, and work with ICT people to see that it is carried out and properly tested
verify that the approach has worked OK, and keep a temporary backup of everything until you know it has worked
keep metadata and documentation on everything
keep all the stakeholders in the loop.
He also recommends getting authority to destroy the original e-records when the preservation has been carried out successfully, ie that the records are usable, authentic and reliable.