
Podcast available as a video podcast from the NEWS! section at http://www.liv.ac.uk/lucas/

Duranti points out that digital preservation places some new obligations upon archivists in addition to the ones recognised under paper preservation theory, mainly to do with authenticity. The archivist has to become a “designated trusted custodian,” with input into record decisions at the very beginning of the record lifecycle. Relevant traditional archivist responsibilities include:



News article about this available here at Newsfactor.com.

The Blue Ribbon Task Force on Sustainable Digital Preservation and Access is yet another project looking at how we can store things for “aeons.” Although they have only just begun, it seems likely from the article that they are going to (a) recommend the migration route rather than the emulation one, and (b) suggest that the data be stored on a network of scattered digital repositories.

Heidi is an analyst at the Enterprise Strategy Group and her thoughts on digital archiving in 2008 are available here.

The main points which interest me in Heidi’s article are:

  • too many companies get archiving mixed up with backups. But these are two wholly distinct concepts
  • archiving to tape is too expensive in terms of staff time taken to retrieve an item, while archiving to primary storage also has cost implications in that you are probably making too many unnecessary backups

Her suggested solution is setting up some sort of automated migration. Manual migration (even manual checking of automated migration, presumably?) will be simply unable to cope with the enormous increase of data expected over the next few years.
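Heidi's automated-migration idea can be sketched in a few lines: scan the store, match at-risk formats against target formats, and emit a migration plan for a converter to execute later. This is a hypothetical illustration, not any vendor's product; the format mapping and function names are invented for the example.

```python
from pathlib import Path

# Illustrative only: formats we treat as "at risk" and the
# target format each would be migrated to.
MIGRATION_MAP = {".doc": ".odt", ".wk1": ".csv", ".xlc": ".xlsx"}

def plan_migration(root: Path):
    """Scan a repository and return (source, target) pairs that
    an automated converter would then process, unattended."""
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        target_ext = MIGRATION_MAP.get(path.suffix.lower())
        if target_ext:
            plan.append((path, path.with_suffix(target_ext)))
    return plan
```

The point of separating planning from conversion is Heidi's: at the data volumes she expects, a human can at most spot-check the plan, not drive each conversion by hand.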

There is a cool graph in the article showing how much data is expected to exist by 2010 – 27,000 petabytes, probably.

Article by Jeffrey Darlington in TNA’s RecordKeeping magazine, Summer 2004. Nearly 4 years old now, of course.

Digital Archive

TNA established a Digital Preservation Dept to preserve the increasing number of born-digital records which UK government departments were creating, and to offer guidance on digipres issues to the wider community. In April 2003 TNA’s Digital Archive was launched. The Digital Archive “uses open standards and technologies wherever possible, including extensive use of Java and XML. The system stores electronic records with their associated preservation metadata.” The DA can store WP docs, emails, websites, sound, video and databases.


Backwards compatibility depends on software companies. “Microsoft doesn’t want to support all of the quirks of their legacy formats forever. That just leads to bloated, fragile code, more expensive development and support costs. They would rather have clean, structured markup, like ODF.” (http://www.robweir.com/blog/2007/06/file-format-timeline.html accessed 27.11.2007).

As generations go by, the legacy formats will drop. It’s not certain that modern applications are fully backwards compatible anyway. “In researching this article, I tried to open some of my notes which were written in an old version of Word for Windows. Word 2007 refused to open them for “security” reasons and pointed me on a wild-goose chase of knowledge base articles describing obscure registry settings I would have to set to open old files. It is extremely frustrating how much you have to run in place just to keep where you were before with Microsoft’s products, where every recent release requires hacks, workarounds, and patches just to get to where you were before.” (Joel Spolsky, http://www.joelonsoftware.com/items/2007/04/25.html, accessed 27.11.2007).

Microsoft have dropped VBA for Macintosh versions of MS Office (same blog entry). “Word 2007 can open files created in all previous versions of Word, 1.0 through 2003. Word will open older documents in compatibility mode. You know this because at the top of the document “(Compatibility Mode)” appears next to the name of the file.” (Microsoft’s website: http://office.microsoft.com/training/training.aspx?AssetID=RP100664731033 accessed 28.11.2007)

Deprecated features in Excel 2007 (from http://blogs.msdn.com/excel/archive/2006/08/24/718786.aspx, accessed 28.11.2007): “Before jumping to the list of which features are being deprecated or removed, let me say that we never make decisions lightly about removing functionality that has been in the product.  We rarely remove functionality and strive for backwards compatibility with every version.  When we do make changes to functionality that has been in the product we do so when we believe it will be a benefit to the majority of our customers by helping us to make forward progress… Historically Excel has supported many different data formats.  We have determined that a number of these older formats are seldom, if ever used.  We are removing support for some file types to allow us to devote more of our efforts towards the file formats that are being used.  Theses formats are being deprecated in 2 ways.  For the set of file formats with the lowest usage, we will be discontinuing support for opening and saving of these formats.  For the second set that has some minimal usage, we will support loading the files in Excel 2007 to allow you to save them in a newer format.”

The formats which Excel 2007 cannot even open (let alone save) include WK1 and WK4 (Lotus 1-2-3 formats) and Microsoft Excel Chart (.xlc). Files from older versions of Excel, such as Excel 2.0, can still be opened, but any editing results in the files being updated to more recent formats. Discussion on the blog suggests that there are a lot of WK* files out there, none of which are now easily openable.

Brief article about the project in TNA’s RecordKeeping for Autumn 2004.

Original project

The original project was only possible because of a Government programme which had put a BBC Micro into every school in the country by 1980-81, creating a user base of compatible computers. School children in 1986 entered their own data onto their school computers, which was copied onto floppy disks or tapes and sent to the BBC. All the text and images, together with analogue photographs of OS maps, were transferred to analogue videotape. The community data finally totalled 29,000 photographs and 27,000 maps. The whole database was then assembled on master videotapes, from which the final videodiscs were produced. The monitor was usually a TV, which imposed a limit on the level of detail visible at once: users needed to switch between maps, pictures and text.

Restoration project

There were a number of parallel rescue projects but the one which actually worked was a collaboration between TNA, BBC and others. It did not rescue data from the videodiscs, but from the master tapes.

Independently, LongLife Data Ltd had developed a new PC interface to the community data. It works in the same way as the original, but because a modern monitor has a higher resolution than a 1980s TV screen, pictures and text can be shown simultaneously. This is the version now available on the web.

Alan's thoughts

  • the data was restored from analogue videotapes, not from the videodiscs or from the submitted floppy disks. After 15 years the tapes were still readable. So in a sense it’s a straightforward media refreshing thing.
  • the new interface is not an exact emulation of the old interface. It is a wholly new app. The current browsing experience has therefore lost authenticity. (Though the data is the same.)
  • can we find out anything about the authenticity of the data itself?
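On the first point above: "straightforward media refreshing" still only counts as preservation if the copy is verified bit-for-bit against the original. A minimal sketch of checksum-verified refreshing (my illustration; the function names are invented):

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Checksum a file in chunks, so large media files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def refresh(src: Path, dst: Path) -> bool:
    """Copy src onto new media (dst) and verify the copy is identical."""
    dst.write_bytes(src.read_bytes())
    return sha256_file(src) == sha256_file(dst)
```

The Domesday rescue had to do the analogue equivalent of this by eye; with digital carriers the verification step is cheap and should never be skipped.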

Emulation theory says we should keep the manuals. But the assumption there is that manuals are enough to get systems working, when in reality manuals are atrocious things, badly written and unintelligible.

http://www.asktog.com/columns/017ManualWriting.html (accessed 24.12.07) has this to say about manuals:

“An amazing number of companies rationalize their way out of supplying a manual at all, then complain as loudly as anyone else about the stupid users calling customer support. A manual, since many people apparently don’t know, is made of ground-up dead tree. Those delightful little PDF files that people insist on including on their CD ROM don’t make it. First, my experience has been that Acrobat only opens around 40% of the time. (The rest of the time either it is distressed because it can’t find some infinitely important font it wants or I’ve already got as many windows open as the OS can handle.) Second, even when it does open, these electronic manuals are not only difficult to read, they are anything but portable… Some folks have found a clever way to drive people to piracy even while supplying a dead-tree manual. We now have the spectacle of major software houses, including Microsoft and Apple, turning out atrocious manuals in the full expectation that users will buy “real” manuals in the bookstore, so the users can actually figure out how to use the program. These manufacturer’s junk manuals typically display the characteristics of an edited feature spec, with no thought as to structure. (Sometimes the features are just listed in alphabetical order!) … A lot of bad manuals out there are actually good feature specs that companies have just translated from engineeringese into a human language, then shipped.”

Chris Rusbridge in Ariadne, February 2006

http://www.ariadne.ac.uk/issue46/rusbridge/intro.html, accessed 19 Dec 07.

Rusbridge thinks that the digital preservation case has been over-argued, which has led to a backlash, and has also been counterproductive in that it makes digital preservation look far more expensive than it actually is; so no one then pays for it.

File format change: Rusbridge challenges people to actually think of an old commercial file format which is genuinely unreadable today, rather than simply “obsolete”, which tends to be a euphemism for “difficult to retrieve.” Rusbridge defines unreadable as ‘total loss of information content,’ rather than just a partial loss. As far as I know, no one has met his challenge. (File formats created for specific problems, or for devices like cameras, do indeed become unreadable quickly.) There is a perception that files are unreadable, but that is just a perception. File formats have actually stabilised over the years, as the information revolution sorts itself out.

Migration: rather than every 3-5 years, this might only need to be done every 10-15 years. So it’s cheaper than we initially thought.

Fidelity: because there is no way of knowing what future designated communities will actually be interested in, there is pressure to keep all aspects of a record, just in case. This is very expensive, so it leads to less funding, and fewer things preserved. So Rusbridge is in favour of desiccated formats, which reduce documents to limited sets of significant properties that are much easier to preserve. But keep the original bitstream as well. So, anyone who is just after the data can see the desiccated format and be happy, while the scholars after more exact properties can put the effort in to recapture the full functionality (and they are the ones who pay for it). It means you still have to keep good documentation and metadata. AA: seems a bit like our CALS policy, though I need to add a bit about keeping the original bitstream.
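Rusbridge's scheme (a reduced, easy-to-preserve rendition kept alongside the untouched original bitstream, tied together by metadata) might look like this in outline. This is my sketch of the idea, not Rusbridge's code; the directory layout and metadata fields are assumptions.

```python
import hashlib
import json
from pathlib import Path

def ingest(original: Path, text_rendition: str, store: Path) -> None:
    """Store the untouched original bitstream, a 'desiccated'
    plain-text rendition, and metadata linking the two."""
    store.mkdir(parents=True, exist_ok=True)
    raw = original.read_bytes()
    # 1. Original bitstream, byte-for-byte, for future scholars.
    (store / original.name).write_bytes(raw)
    # 2. Desiccated rendition: reduced significant properties only.
    rendition_name = original.stem + ".txt"
    (store / rendition_name).write_text(text_rendition)
    # 3. Metadata to demonstrate the two belong together.
    meta = {
        "source": original.name,
        "sha256": hashlib.sha256(raw).hexdigest(),
        "rendition": rendition_name,
    }
    (store / "metadata.json").write_text(json.dumps(meta, indent=2))
```

Everyday users read the cheap text rendition; anyone needing exact fidelity starts from the checksummed original bytes, which is exactly the division of labour (and of cost) that Rusbridge proposes.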

Costs: digital preservation is cheaper than paper [this is like my Domesday book example]. All preservation is expensive, but digital only seems expensive because it is new, and is not yet costed into anything. Paper archives and libraries are costed, and we have grown used to the costs. The biggest single problem in digital preservation is money, and that’s partly because it is short-term project funded. Also, we need to spend the money wisely. If we think in terms of 1000 years, then we end up paying loads on a handful of documents, so we lose more. Perhaps we should just think about the next generation, instead.

Rusbridge's thought rather than mine, but very realistic:

“It seems to me that it makes more sense for most of us to view digital preservation as a series of holding positions, or perhaps as a relay. Make your dispositions on the basis of the timescale you can foresee and for which you have funding. Preserve your objects to the best of your ability, and hand them on to your successor in good order at the end of your lap of the relay. In good order here means that the digital objects are intact, and that you have sufficient metadata and documentation to be able to demonstrate authenticity, provenance, and to give future users a good chance to access or use those digital objects.”