
“Digging Up Bits of the Past: Hands-on With Obsolescence”, by Richard Entlich (Cornell University, rge1@cornell.edu) and Ellie Buckley (Cornell University, elb34@cornell.edu), RLG DigiNews, http://digitalarchive.oclc.org/da/ViewObject.jsp?objid=0000070519&reqid=4345 (accessed 24 Dec 2007).

“In fact, the paucity of good exemplars, the exposure of some popular anecdotes as apocryphal, and the use of near-loss scenarios as stand-ins for actual loss have led to something of a backlash, with claims that the urgency called for by digital preservation proponents is excessive. For example, in 2003, technology writer Simson Garfinkel, writing in the MIT Technology Review, ridiculed claims of wide-scale endangerment of digital content in a piece entitled “The Myth of Doomed Data.” Garfinkel cites the heroic rescue of the BBC Domesday videodisc project as evidence, not of the need for more rigorous attention to digital preservation issues, but as proof that when the content is valuable enough, a technological fix will be found. He then offers a simple formula for eliminating future problems—use widely supported file formats and avoid file compression schemes.

More recently, in February 2006, Chris Rusbridge, director of the UK Digital Curation Centre, published a provocative article in Ariadne entitled “Excuse Me… Some Digital Preservation Fallacies?” in which he expressed skepticism that truly obsolete commercial software actually exists and issued a challenge for readers to submit bona fide examples of older consumer-oriented commercial software products where the data files are “completely inaccessible” today.

Neither author claimed that digital preservation is a non-issue, and both acknowledged that certain types of obsolescence (e.g., media formats and non-standard file formats) present more significant problems. But both asserted that the sky may not be falling quite as severely or as imminently as often depicted, particularly for commonly used media and file formats.”

So, we might be OK: commercial market forces have done the standardisation for us.

PC Magazine, 1988

This is a cool image – it shows 55 word processors being reviewed in the late 1980s. Things have standardised a lot since then. Image from http://digitalarchive.oclc.org/da/ViewObject.jsp?objid=0000070519&reqid=4345

Did some sums. Let’s say we write down 110010100101… etc. on a sheet of A4. We could get about 2,400 bits on a page. A 3.5-inch floppy disk holds 1.4 MB × 8 bits, i.e. 11,200,000 bits. That works out at 4,667 sheets of A4.

If we can get 200 pages into a standard archive box, we would need 24 archive boxes (4,667 ÷ 200 ≈ 23.3, rounded up) to store one floppy disk.
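For anyone who wants to check the sums, here is a quick sketch in Python. The 2,400 bits per sheet and 200 sheets per box figures are just my assumptions from above, not measured values:

    import math

    # Assumptions from the sums above, not measured values.
    BITS_PER_A4_SHEET = 2400     # handwritten 0s and 1s per page
    FLOPPY_BITS = 1400000 * 8    # 1.4 MB floppy = 11,200,000 bits
    SHEETS_PER_BOX = 200         # assumed capacity of one archive box

    sheets = math.ceil(FLOPPY_BITS / BITS_PER_A4_SHEET)  # 4,667 sheets
    boxes = math.ceil(sheets / SHEETS_PER_BOX)           # 24 boxes

    print(f"{sheets:,} sheets of A4, or {boxes} archive boxes, per floppy disk")

Note that the rounding up matters: 23 boxes would only hold 4,600 sheets.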

The EEDPRPP report 2006 has an appendix containing the results of readability investigations into 12 instances of obsolete formats. The term “obsolete format” seems to refer largely to media obsolescence, e.g. old DAT tape or punch cards. The team had to involve a specialist data recovery firm, plus the UKDA and TNA. The results seem pretty positive to me:

  • 6 of the 12 were successful migrations.
  • 5 were cases where the media could not be read – a media/hardware failure, rather than a file format one.
  • Only 1 case was a failure to read the data’s file format, rather than dead media: a 9-track tape reel of environmental planning backup data from c. 1988. Even here the work was abandoned for cost/efficiency reasons: “Further analysis could be done on the file in order to try and interpret the contents and structure of the data. Although possible, this would be very time-consuming…”

Emulation theory says we should keep the manuals. But the assumption there is that manuals are enough to get systems working, when in reality manuals are atrocious things, badly written and unintelligible.

http://www.asktog.com/columns/017ManualWriting.html (accessed 24 Dec 2007) has this to say about manuals:

“An amazing number of companies rationalize their way out of supplying a manual at all, then complain as loudly as anyone else about the stupid users calling customer support. A manual, since many people apparently don’t know, is made of ground-up dead tree. Those delightful little PDF files that people insist on including on their CD ROM don’t make it. First, my experience has been that Acrobat only opens around 40% of the time. (The rest of the time either it is distressed because it can’t find some infinitely important font it wants or I’ve already got as many windows open as the OS can handle.) Second, even when it does open, these electronic manuals are not only difficult to read, they are anything but portable… Some folks have found a clever way to drive people to piracy even while supplying a dead-tree manual. We now have the spectacle of major software houses, including Microsoft and Apple, turning out atrocious manuals in the full expectation that users will buy “real” manuals in the bookstore, so the users can actually figure out how to use the program. These manufacturer’s junk manuals typically display the characteristics of an edited feature spec, with no thought as to structure. (Sometimes the features are just listed in alphabetical order!) … A lot of bad manuals out there are actually good feature specs that companies have just translated from engineeringese into a human language, then shipped.”

Helen Forde (see earlier posting) says the Lunar Orbiter images were lost. Urban myth, though:

http://astrogeology.usgs.gov/Projects/LunarOrbiterDigitization/: the web page of the Lunar Orbiter Digitization Project gives some techie details. “Five Lunar Orbiter missions were launched in 1966 and 1967 to study the Moon. The first three missions were devoted to mapping potential lunar landing sites. The fourth and fifth missions were intended for broader scientific goals. Lunar Orbiter 4 photographed the near-side and 95% of the far-side of the Moon. Lunar Orbiter 5 completed the photography of the far-side and collected medium- and high- resolution imagery of 36 preselected regions… The full LO (Lunar Orbiter) dataset consists of 967 medium resolution (MR) and 983 high resolution (HR) frames. Due to their large size, HR frames have traditionally been divided into three sections (referred to as sub-frames). Prior to being placed onboard the spacecraft, the photographic film was exposed with strip numbers, a nine-level grayscale bar, resolving power chart, and reseau marks. The original photograph was scanned into a series of strips onboard the spacecraft and then transmitted to Earth as analog data. Photographic prints from these film strips were hand mosaicked into sub-frame (for HR data) and full-frame (for MR data) views and widely distributed.”

This suggests that there were no Earth-bound digital images at all. The images were sent from the orbiter by radio as analog data, printed onto paper on Earth, and hand-mosaicked. Digitisation was not done until this project started.

http://www.lpi.usra.edu/resources/lunarorbiter/processing/: the Lunar and Planetary Institute has published the images on the web at http://www.lpi.usra.edu/resources/lunarorbiter/. Some images from (say) LO 2 of 1966 don’t survive – “Between November 18 and 25 it produced 211 photographs during 40 orbits, although some photographs were lost during transmission to Earth” – but this is a radio transmission failure, not a digital preservation one. All the successfully transmitted images still survive today (and are on the website).

Apparently this is the real thing for records management. Applies to all records, paper or digital. Probably need to find out a bit more about this.

By Stuart D. Lee, 2002. Reviewed by Richard M. Davis in JSA, vol. 23, no. 2, 2002.

Aimed at librarians and information science students, so it deals mainly with published material in electronic formats within a library context. Recommends using published, open standards for data storage and exchange, to best preserve data beyond the life of the host system. ‘But of course publishers have much the same reservations about giving us those sorts of freedoms as record companies do about us ripping and burning our own CDs!’