METS (the Metadata Encoding and Transmission Standard) is an XML Schema for encoding descriptive, administrative, and structural metadata about objects within a digital library. The METS format was designed to allow information management tools and services to be shared, and to facilitate the interoperable exchange of digital materials among institutions. The schema was first released in 2001.

What’s the point of METS?

The primary problem METS addresses is that of digital objects which are not a simple, single entity but are assembled from many components, or which exhibit behaviour. Examples include a word-processed document with embedded images, a digitised volume containing many individual page scans, and a web page made up of a variety of Internet file types. If the structural metadata is not captured, these objects simply collapse into their component files, and a future user will be unable to reconstruct the original experience. METS is designed to handle structures such as these, which belong to a single intellectual entity even though they are composed of different formats and even though their components may be stored in separate locations.

METS Objects

METS allows content either to be stored within the METS file itself or to be stored in an external file and referenced. Because METS supports wrapping, a METS file need not reference its files by (e.g.) URLs; it can host the digital content directly. Since content may be either wrapped or referenced, it is worth distinguishing between a METS document and a METS object.

The term “METS document” refers to the serialized XML document conforming to the METS schema. By contrast, the term “METS object” refers to the entire digital artifact represented by the METS document, including any externally referenced content or metadata needed to constitute a complete object.

Wrapping content together with its metadata in this way enables repositories to create self-contained ‘capsules’.
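To make the referencing/wrapping distinction concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the METS 1.x element names `<file>`, `<FLocat>` (a pointer to content held elsewhere) and `<FContent>`/`<binData>` (Base64 content carried inside the document). The helper names, IDs and example.org URL are hypothetical, and the output has not been validated against the METS schema.

```python
import base64
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (ElementTree's {uri}name form)."""
    return f"{{{METS_NS}}}{tag}"


def referenced_file(file_id: str, url: str, mimetype: str) -> ET.Element:
    """A <file> whose bytes live elsewhere; <FLocat> only points at them."""
    f = ET.Element(m("file"), ID=file_id, MIMETYPE=mimetype)
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
    return f


def wrapped_file(file_id: str, content: bytes, mimetype: str) -> ET.Element:
    """A <file> whose bytes travel inside the METS document, Base64-encoded."""
    f = ET.Element(m("file"), ID=file_id, MIMETYPE=mimetype)
    fcontent = ET.SubElement(f, m("FContent"))
    ET.SubElement(fcontent, m("binData")).text = base64.b64encode(content).decode("ascii")
    return f


group = ET.Element(m("fileGrp"))
group.append(referenced_file("FILE001", "http://example.org/scans/p001.tif", "image/tiff"))
group.append(wrapped_file("FILE002", b"Transcription of page 1", "text/plain"))
print(ET.tostring(group, encoding="unicode"))
```

A document built only from referenced files stays small but is not self-contained: the METS object also includes whatever lives at those URLs. Wrapping turns the document into a capsule, at the cost of size.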

Elements

A METS document can contain seven sections:

METS Header: this contains metadata describing the METS document itself (not the digital object), such as its creator, editor, and so on. The opening of a METS document will also typically state that it is XML version 1.0 with UTF-8 encoding, list the standards used in the record with their URLs, and give a human-readable label describing the work being encoded, although strictly these belong to the XML declaration and the root <mets> element rather than to the header itself.
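A minimal header sketch, again with Python's `xml.etree.ElementTree`. It assumes the METS `<metsHdr>` element with a `CREATEDATE` attribute and `<agent>` children whose `ROLE` (e.g. CREATOR, EDITOR) and `TYPE` attributes say who made the record; the label is an attribute of the root `<mets>` element, and the XML declaration is written by the serialiser. The helper name, agent names and label are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def mets_header(agents, created=None) -> ET.Element:
    """Build a <metsHdr> describing the METS document itself, not the object.

    agents: (role, name) pairs, e.g. ("CREATOR", "Digitisation Unit").
    """
    if created is None:
        created = datetime.now(timezone.utc)
    hdr = ET.Element(m("metsHdr"), CREATEDATE=created.strftime("%Y-%m-%dT%H:%M:%S"))
    for role, name in agents:
        agent = ET.SubElement(hdr, m("agent"), ROLE=role, TYPE="ORGANIZATION")
        ET.SubElement(agent, m("name")).text = name
    return hdr


# The human-readable label is an attribute of the root <mets> element.
root = ET.Element(m("mets"), LABEL="Example digitised volume")
root.append(mets_header([("CREATOR", "Example Library Digitisation Unit"),
                         ("EDITOR", "Example Metadata Team")]))
# encoding="UTF-8" with xml_declaration=True emits the XML 1.0 / UTF-8 declaration.
xml_text = ET.tostring(root, encoding="UTF-8", xml_declaration=True).decode("utf-8")
print(xml_text)
```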

Descriptive Metadata Section: this section contains descriptive metadata for all the items in the METS object, separately for each item if need be. The metadata may be external to the METS document, embedded internally, or both, and multiple instances of each may be included in the section. Descriptive metadata can be expressed according to many current content standards, such as MARC, Dublin Core, the TEI Header, EAD, or a schema you have made up yourself – METS defines no descriptive metadata for you.

An Internal Descriptive Metadata element provides a wrapper around metadata embedded within a METS document. Such metadata can take one of two forms: either XML-encoded metadata whose elements identify themselves as belonging to a namespace other than the METS namespace, or any other arbitrary binary or textual form, provided that the metadata is Base64 encoded and wrapped in a <binData> element within the internal descriptive metadata element.
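The two internal forms can be sketched as follows. This assumes the METS `<dmdSec>`/`<mdWrap>` elements, with `<xmlData>` holding XML from another namespace (Dublin Core here) and `<binData>` holding Base64 text. The `MDTYPE` values DC and MARC are taken from the METS vocabulary; the IDs, record contents, MIME type default and helper names are hypothetical placeholders.

```python
import base64
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def dmd_xml(dmd_id: str, title: str, creator: str) -> ET.Element:
    """Form 1: XML metadata from a non-METS namespace (Dublin Core) inside <xmlData>."""
    sec = ET.Element(m("dmdSec"), ID=dmd_id)
    wrap = ET.SubElement(sec, m("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    ET.SubElement(xml_data, f"{{{DC_NS}}}creator").text = creator
    return sec


def dmd_binary(dmd_id: str, record: bytes, mdtype: str = "MARC",
               mimetype: str = "application/marc") -> ET.Element:
    """Form 2: any other binary or textual metadata, Base64-encoded inside <binData>."""
    sec = ET.Element(m("dmdSec"), ID=dmd_id)
    wrap = ET.SubElement(sec, m("mdWrap"), MDTYPE=mdtype, MIMETYPE=mimetype)
    ET.SubElement(wrap, m("binData")).text = base64.b64encode(record).decode("ascii")
    return sec


xml_dmd = dmd_xml("DMD1", "Example volume", "Example author")
bin_dmd = dmd_binary("DMD2", b"placeholder ISO 2709 record bytes")
print(ET.tostring(xml_dmd, encoding="unicode"))
print(ET.tostring(bin_dmd, encoding="unicode"))
```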

Administrative Metadata Section: this comprises information about how the files were created and stored, intellectual property rights (IPR), the original source object from which the digital object was derived, the provenance of the files that make up the object, and so on. As with descriptive metadata, administrative metadata can be either external to the METS document or encoded internally.
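METS splits this section into four subsections that line up with the categories above: `<techMD>`, `<rightsMD>`, `<sourceMD>` and `<digiprovMD>`. The sketch below uses external references (`<mdRef>`) rather than wrapping, and emits the subsections in the order the schema expects whatever order the caller supplies. The category keys, helper name, `MDTYPE` values, IDs and URLs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


# Category in the prose -> METS subsection, in schema order.
AMD_SUBSECTIONS = {
    "technical": "techMD",       # how the files were created and stored
    "rights": "rightsMD",        # intellectual property rights
    "source": "sourceMD",        # the original source object
    "provenance": "digiprovMD",  # provenance of the files
}


def amd_section(amd_id: str, refs: dict) -> ET.Element:
    """Build an <amdSec> whose subsections point at external metadata.

    refs maps a category above to an (MDTYPE, URL) pair.
    """
    unknown = set(refs) - set(AMD_SUBSECTIONS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    sec = ET.Element(m("amdSec"), ID=amd_id)
    for category, tag in AMD_SUBSECTIONS.items():
        if category not in refs:
            continue
        mdtype, url = refs[category]
        sub = ET.SubElement(sec, m(tag), ID=f"{amd_id}_{tag}")
        ET.SubElement(sub, m("mdRef"),
                      {"LOCTYPE": "URL", "MDTYPE": mdtype, f"{{{XLINK_NS}}}href": url})
    return sec


amd = amd_section("AMD1", {
    "provenance": ("PREMIS", "http://example.org/md/provenance.xml"),
    "technical": ("NISOIMG", "http://example.org/md/tech.xml"),
    "rights": ("METSRIGHTS", "http://example.org/md/rights.xml"),
})
print(ET.tostring(amd, encoding="unicode"))
```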

File Section: a list of all the files containing content that make up the electronic versions of the digital object, together with their locations. File elements may be grouped within File Group elements, so that the files can be subdivided by object version or by other criteria such as file type or size.
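Grouping can be sketched like this: each hypothetical file record carries a use value such as master or thumbnail, and files sharing a value are placed together in one `<fileGrp USE="...">`. The `USE` strings, IDs, URLs and the helper name are illustrative; other grouping criteria (version, size) would work the same way.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def file_section(files) -> ET.Element:
    """Build a <fileSec> with one <fileGrp> per distinct USE value.

    files: iterable of (file_id, use, mimetype, url) tuples.
    """
    groups = defaultdict(list)  # dicts keep insertion order, so groups appear first-seen first
    for file_id, use, mimetype, url in files:
        groups[use].append((file_id, mimetype, url))
    sec = ET.Element(m("fileSec"))
    for use, members in groups.items():
        grp = ET.SubElement(sec, m("fileGrp"), USE=use)
        for file_id, mimetype, url in members:
            f = ET.SubElement(grp, m("file"), ID=file_id, MIMETYPE=mimetype)
            ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
    return sec


file_sec = file_section([
    ("MASTER001", "master", "image/tiff", "http://example.org/master/p001.tif"),
    ("THUMB001", "thumbnail", "image/jpeg", "http://example.org/thumb/p001.jpg"),
    ("MASTER002", "master", "image/tiff", "http://example.org/master/p002.tif"),
    ("THUMB002", "thumbnail", "image/jpeg", "http://example.org/thumb/p002.jpg"),
])
print(ET.tostring(file_sec, encoding="unicode"))
```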

Structural Map: the core of the METS document. It outlines a hierarchical structure for the digital object and links the elements of that structure to the content files and metadata that pertain to each element. The structural map is the only mandatory section of a METS document.
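A minimal sketch of a structural map for a digitised volume: a `<div>` for the volume, tied to its descriptive metadata through `DMDID`, containing one ordered `<div>` per page whose `<fptr FILEID="...">` points at a file declared in the file section. The `TYPE` and `LABEL` values, the IDs and the helper names are hypothetical; the final check only mirrors the rule that a structural map must be present.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def struct_map(label: str, dmd_id: str, page_file_ids) -> ET.Element:
    """A physical structMap: one volume <div> holding an ordered <div> per page."""
    smap = ET.Element(m("structMap"), TYPE="physical")
    volume = ET.SubElement(smap, m("div"), TYPE="volume", LABEL=label, DMDID=dmd_id)
    for order, file_id in enumerate(page_file_ids, start=1):
        page = ET.SubElement(volume, m("div"), TYPE="page", ORDER=str(order),
                             LABEL=f"Page {order}")
        ET.SubElement(page, m("fptr"), FILEID=file_id)  # link to a <file ID=...>
    return smap


def has_struct_map(mets_root: ET.Element) -> bool:
    """True if the <mets> root carries the one mandatory section."""
    return mets_root.find(m("structMap")) is not None


mets_root = ET.Element(m("mets"))
mets_root.append(struct_map("Example volume", "DMD1", ["MASTER001", "MASTER002"]))
for div in mets_root.iter(m("div")):
    if div.get("TYPE") == "page":
        print(div.get("LABEL"), "->", div.find(m("fptr")).get("FILEID"))
print("has structMap:", has_struct_map(mets_root))
```

Walking the `<div>` tree and following each `fptr` is what lets a future user reassemble the page order rather than facing a loose pile of image files.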

Structural Links: this section allows the creator of the METS document to record hyperlinks between nodes in the hierarchy outlined in the Structural Map.
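Structural links can be sketched as `<smLink>` elements inside `<structLink>`, each joining two `<div>` nodes by their `ID` values through `xlink:from` and `xlink:to`. The check that both ends exist in the structural map is an illustrative safeguard added here, not something the text prescribes; the IDs, the two-page site and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def xl(attr: str) -> str:
    """Qualify an attribute name with the XLink namespace."""
    return f"{{{XLINK_NS}}}{attr}"


def struct_links(smap: ET.Element, pairs) -> ET.Element:
    """Build <structLink>, refusing links whose ends are not <div> IDs in smap."""
    div_ids = {d.get("ID") for d in smap.iter(m("div"))}
    sec = ET.Element(m("structLink"))
    for source, target in pairs:
        missing = {source, target} - div_ids
        if missing:
            raise ValueError(f"no <div> with ID {sorted(missing)}")
        ET.SubElement(sec, m("smLink"), {xl("from"): source, xl("to"): target})
    return sec


# A hypothetical two-page web site: the home page links to an "about" page.
smap = ET.Element(m("structMap"))
site = ET.SubElement(smap, m("div"), ID="SITE", TYPE="website")
ET.SubElement(site, m("div"), ID="HOME", TYPE="page")
ET.SubElement(site, m("div"), ID="ABOUT", TYPE="page")

links = struct_links(smap, [("HOME", "ABOUT")])
print(ET.tostring(links, encoding="unicode"))
```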

Behavior Section: used to associate executable behaviors with the content of the object. Each behavior has an interface definition element, which gives an abstract definition of the behaviors it represents, and a mechanism element, which identifies a module of executable code that implements and runs the behaviors defined by the interface definition. METS allows us to specify relations for the order of processing, such as sequential or concurrent.
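A sketch of a single behaviour entry, assuming the METS `<behavior>` element with a `STRUCTID` attribute tying it to a `<div>` in the structural map, followed by an `<interfaceDef>` and a `<mechanism>`, both located by URL. The URLs, IDs, label and helper name are hypothetical; nothing here fetches or runs the referenced code, and ordering of processing is not shown.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def behavior_section(behavior_id: str, struct_id: str, interface_url: str,
                     mechanism_url: str, label: str) -> ET.Element:
    """Build a <behaviorSec> with one <behavior> bound to the <div> named by struct_id."""
    sec = ET.Element(m("behaviorSec"))
    beh = ET.SubElement(sec, m("behavior"), ID=behavior_id, STRUCTID=struct_id, LABEL=label)
    # Abstract definition of what the behaviour offers...
    ET.SubElement(beh, m("interfaceDef"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": interface_url})
    # ...then the executable module that implements it (interfaceDef comes first).
    ET.SubElement(beh, m("mechanism"),
                  {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": mechanism_url})
    return sec


bsec = behavior_section("BEH1", "VOLUME",
                        "http://example.org/interfaces/page-turner",
                        "http://example.org/code/page-turner",
                        "Page-turning display")
print(ET.tostring(bsec, encoding="unicode"))
```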

METS Profiles

METS Profiles are XML documents that describe a class of METS documents in enough detail to give authors and programmers the guidance they need to create and process METS documents of that class. A profile expresses the requirements that a METS document must satisfy, and a sufficiently explicit METS Profile may be considered a data standard in its own right.
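Profiles are themselves written against their own XML schema, which this sketch does not attempt to parse. It only illustrates the idea of checking a document against a profile's requirements, using a handful of hypothetical rules expressed as Python predicates over ElementTree's limited XPath support. The rule wording, helper name and sample document are all assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical requirements a profile might impose: (description, predicate on <mets> root).
EXAMPLE_RULES = [
    ("a structural map is present",
     lambda root: root.find("mets:structMap", NS) is not None),
    ("the header names a CREATOR agent",
     lambda root: root.find("mets:metsHdr/mets:agent[@ROLE='CREATOR']", NS) is not None),
    ("every fileGrp declares a USE attribute",
     lambda root: all(g.get("USE") for g in root.iterfind(".//mets:fileGrp", NS))),
    ("every fptr points at a declared file",
     lambda root: {p.get("FILEID") for p in root.iterfind(".//mets:fptr", NS)}
     <= {f.get("ID") for f in root.iterfind(".//mets:file", NS)}),
]


def check_against_profile(xml_text: str, rules=EXAMPLE_RULES) -> list:
    """Return the description of every rule the document fails."""
    root = ET.fromstring(xml_text)
    return [description for description, rule in rules if not rule(root)]


SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr><mets:agent ROLE="CREATOR"><mets:name>Example</mets:name></mets:agent></mets:metsHdr>
  <mets:fileSec><mets:fileGrp USE="master"><mets:file ID="F1"/></mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div><mets:fptr FILEID="F1"/></mets:div></mets:structMap>
</mets:mets>"""

print(check_against_profile(SAMPLE))  # → []
```

A real profile would state its requirements in prose and structured XML; turning them into executable checks like these is one way a programmer might act on that guidance.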

Thoughts

METS is a remarkable feat of XML, but it must result in absolutely enormous XML files, especially if you create METS documents with embedded content. Perhaps this is why some repositories zip them up, as Sous la poussiere commented here? The ZIP format creates issues of its own, of course.
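Part of the bloat can be put in numbers. Wrapped binary content is Base64 encoded, and standard Base64 turns every 3 bytes into 4 ASCII characters, so embedded content grows by about a third before any XML markup is added (ignoring any line breaks an encoder might insert). A quick check in Python, using a hypothetical 30 MB page scan:

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Characters of padded Base64 (no line breaks) needed for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity-check the formula against the standard library on small inputs.
for n in range(10):
    assert len(base64.b64encode(bytes(n))) == base64_length(n)

scan_bytes = 30_000_000  # a hypothetical 30 MB TIFF page scan
encoded = base64_length(scan_bytes)
print(f"{scan_bytes:,} bytes -> {encoded:,} characters ({encoded / scan_bytes:.0%} of original)")
# prints "30,000,000 bytes -> 40,000,000 characters (133% of original)"
```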

Sources: Borghoff et al., Long-Term Preservation of Digital Documents (which includes an example METS document); Deegan and Tanner, Digital Preservation; the online METS Primer and Reference Manual; Wikipedia.
