Heidi is an analyst at the Enterprise Strategy Group, and her thoughts on digital archiving in 2008 are available here.

The main points which interest me in Heidi’s article are:

  • Too many companies confuse archiving with backups, but the two are wholly distinct concepts.
  • Archiving to tape is too expensive in terms of the staff time needed to retrieve an item, while archiving to primary storage also has cost implications, because you are probably making many unnecessary backups of the archived material.

Her suggested solution is to set up some sort of automated migration. Manual migration (and presumably even manual checking of automated migration) will simply be unable to cope with the enormous increase in data expected over the next few years.
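To make "automated migration" a bit more concrete, here is a minimal sketch in Python of what such a job might look like. It is not taken from Heidi's article: the age-based rule, the function names and the two directories (`primary` and `archive`) are all my own assumptions, and a real system would also need logging, fixity checks and error handling.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def find_stale_files(primary: Path, max_age_days: float,
                     now: Optional[float] = None) -> List[Path]:
    """Return files under `primary` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in primary.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def migrate(primary: Path, archive: Path, max_age_days: float,
            now: Optional[float] = None) -> List[Path]:
    """Move stale files from `primary` to `archive`, keeping relative paths.

    Assumes `archive` is not located inside `primary` (hypothetical layout).
    """
    moved = []
    for src in find_stale_files(primary, max_age_days, now):
        dest = archive / src.relative_to(primary)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Run on a schedule, something like `migrate(Path("/data/primary"), Path("/data/archive"), max_age_days=365)` would take year-old files off primary storage without anyone having to pick them by hand. The paths and the one-year threshold are placeholders.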

There is a cool graph in the article showing how much data is expected to exist by 2010: 27,000 petabytes, probably.

Digital Preservation (Digital Futures Series) (Hardcover), by Marilyn Deegan (Editor), Simon Tanner (Editor). Hardcover: 260 pages; Publisher: Facet Publishing (18 Sep 2006); ISBN-10: 1856044858. Available at Amazon.

This is the most recent book published in the UK on digital preservation, and if I can speak from a parochial viewpoint for a bit, it’s nice to have a UK slant on things, with details given about UK projects. This means that Digital Preservation contains some practical information which is not present in Borghoff et al.

I’m not aware of any costings which have actually been done, but my gut feeling is that the balance of digital vs. paper comes out in favour of digital. (By “paper” I’m including parchment, photographs etc. too.)

1. The biggest single ongoing cost in any repository is staffing. A paper-based archives service has to run searchrooms for users to consult the materials, where users are supervised and security is ensured. So, paper-based repositories have to employ receptionists, searchroom assistants, relief staff to cover when other staff are away etc. A digital repository which makes its assets available over the web does not incur any of these costs.

There is the issue of authenticity. The individual printing out the record often has a certain level of control over how that document is printed: fields or text can be removed from the printed version even if they remain in the digital original. Printing from spreadsheets usually results in the paper copy having only values and calculated data, not the formulas, or comments. This means that a paper document cannot necessarily be trusted as a full and complete equivalent of a digital record. Yet many people will allow the digital original to be deleted, or get lost, after the paper copy has been created. This may not be an issue for your home computer, but it may well be an issue in an organisation where different members of staff are printing different things.

How do you access paper? You really need a supervised searchroom, with all the costs that entails, plus storage that meets BS 5454. Digital preservation is actually cheaper than paper, if properly handled.

Digital records have a feature not present in paper ones, namely behaviour. A paper document is a fixed item, but digital documents are sometimes interactive, and for some of these the behaviour is an essential part of the meaning. Spreadsheets are a good example.
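A toy illustration of the spreadsheet point, as a rough Python sketch. The "sheet" here is a model I have made up (a dictionary of cells where a formula is a small function over the input cells), not how any real spreadsheet stores its data. Evaluating it once stands in for printing: the result holds values only.

```python
from typing import Callable, Dict, Union

# A cell is either a literal number or a formula over the literal cells.
Cell = Union[float, Callable[[Dict[str, float]], float]]


def evaluate(sheet: Dict[str, Cell]) -> Dict[str, float]:
    """Compute every cell's value; formulas see only the literal (input) cells."""
    inputs = {name: cell for name, cell in sheet.items() if not callable(cell)}
    return {
        name: (cell(inputs) if callable(cell) else cell)
        for name, cell in sheet.items()
    }


# The "digital original": A3 is defined as the sum of A1 and A2.
sheet: Dict[str, Cell] = {
    "A1": 100.0,
    "A2": 250.0,
    "A3": lambda cells: cells["A1"] + cells["A2"],
}

printout = evaluate(sheet)  # the "paper copy": values only, no formulas

sheet["A1"] = 200.0         # someone changes an input later
live = evaluate(sheet)      # the digital record recalculates

print(printout["A3"])  # 350.0, frozen at the time of printing
print(live["A3"])      # 450.0, because the formula is still there
```

The printout cannot tell you that A3 was ever a sum, which is exactly the behaviour that is lost on paper.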

Also, for some organisations there is a legal aspect. If the original document is digital, then it has to be preserved digitally.

Chris Rusbridge in Ariadne, February 2006, accessed 19 Dec 07.

Rusbridge thinks that the digital preservation case has been over-argued, which has led to a backlash. It has also been counterproductive, in that it makes digital preservation look far more expensive than it actually is, so no one then pays for it.

File format change: Rusbridge challenges people to actually think of an old commercial file format which is genuinely unreadable today, rather than simply “obsolete”, which tends to be a euphemism for “difficult to retrieve.” Rusbridge defines unreadable as ‘total loss of information content,’ rather than just a partial loss. As far as I know, no one has met his challenge. (File formats created for specific problems, or for devices like cameras, do indeed become unreadable quickly.) There is a perception that old files are unreadable, but that is just a perception. File formats have actually stabilised over the years, as the information revolution sorts itself out.

Migration: rather than every 3-5 years, this might only need to be done every 10-15 years. So it’s cheaper than we initially thought.
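A quick back-of-the-envelope check of what that difference means, sketched in Python. The 30-year horizon is my own hypothetical figure, and I am assuming one migration at the end of each full interval.

```python
def migrations_needed(horizon_years: int, interval_years: int) -> int:
    """Number of migrations within the horizon, one per completed interval."""
    return horizon_years // interval_years


HORIZON_YEARS = 30  # hypothetical planning horizon, not from the article

for interval in (3, 5, 10, 15):
    count = migrations_needed(HORIZON_YEARS, interval)
    print(f"every {interval:>2} years: {count} migrations")
```

Over 30 years that works out at 6 to 10 migrations on the old assumption, against 2 or 3 on the new one, so even if the cost of each migration stayed the same the total would drop to roughly a third.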

Fidelity: because there is no way of knowing what the future designated communities will actually be interested in, there is pressure to keep all aspects of a record, just in case. This is very expensive, so it leads to less funding, and fewer things preserved. So Rusbridge is in favour of desiccated formats, which limit the documents to reduced sets of significant properties but are much easier to preserve. But keep the original bitstream as well. So, anyone who is just after data can see the desiccated format, and be happy, while the scholars after more exact properties can put the effort in to recapture the full functionality (and they are the ones who pay for it). It means you still have to keep good documentation and metadata. AA: seems a bit like our CALS policy, though I need to add a bit about keeping the original bitstream.
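Here is how I picture "desiccated copy plus original bitstream" in practice, as a minimal Python sketch. The record structure, the function names and the particular reduction (plain text with blank lines and trailing whitespace dropped) are my own assumptions for illustration, not anything Rusbridge specifies. The SHA-256 checksum is there so that the original bitstream can be shown to be unchanged later.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservedRecord:
    original: bytes        # the untouched bitstream, kept as received
    original_sha256: str   # fixity value for the original
    desiccated: str        # reduced, easy-to-preserve rendering
    note: str              # documentation of what the reduction dropped


def desiccate_text(original: bytes) -> str:
    """Hypothetical reduction: decode as UTF-8 and keep only non-blank lines."""
    text = original.decode("utf-8", errors="replace")
    return "\n".join(line.rstrip() for line in text.splitlines() if line.strip())


def preserve(original: bytes) -> PreservedRecord:
    """Package the original bitstream together with its desiccated form."""
    return PreservedRecord(
        original=original,
        original_sha256=hashlib.sha256(original).hexdigest(),
        desiccated=desiccate_text(original),
        note="Desiccated copy drops blank lines and trailing whitespace.",
    )


def verify(record: PreservedRecord) -> bool:
    """Check that the stored original still matches its recorded checksum."""
    return hashlib.sha256(record.original).hexdigest() == record.original_sha256
```

Most users would only ever read `desiccated`; the scholar who needs more goes back to `original`, and the `note` field is the bare minimum of the documentation that still has to be kept.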

Costs: digital preservation is cheaper than paper [this is like my Domesday book example]. All preservation is expensive, but digital only seems expensive because it is new, and is not yet costed into anything. Paper archives and libraries are costed, and we have grown used to the costs. The biggest single problem in digital preservation is money, and that’s partly because it is funded through short-term projects. Also, we need to spend the money wisely. If we think in terms of 1000 years, then we end up spending heavily on a handful of documents, so we lose more. Perhaps we should just think about the next generation, instead.