
2 Submission information package - Reference Documentation

Authors: Lucien van Wouw

Version: 1.4

2 Submission information package

The staging area is where it all begins. Here the content producer - possibly through a digital archivist acting as agent - assembles the digital files and metadata needed for the archival and dissemination packages. This content can be delivered via FTP, for example, or on disk, in which case an operator places the content onto the staging area.

The manifest for this submission package is called a processing instruction. It declares the files placed on the staging area and contains related metadata. The instruction can be custom made with the producer's own tools, or be partly or entirely produced in the administration panels of the object repository.

The assembly of a submission package - and the archival ingest - can be completely automated or performed manually, step by step.

2.1 Files and their delivery

2.1.1 Files for preservation

Digital data that is intended to be preserved is designated as a "master". These are files whose content is to be stored perpetually. Such data is kept in a way that allows for media migration and the automatic production of derivatives. The data can be of any content type: audio, video, image, text, software. Simply put: any named file that contains one or more bytes.

2.1.2 Files for presentation

Derivatives and other data are used for presentation, so that the master files themselves need not be emitted in the dissemination package. The content producer can choose to supply any type of custom data, for example PNG images that offer a preview for an audio master file. Files intended for presentation should be regarded as temporary in nature: they may be deleted or replaced with different content at any time. The object repository also has services to generate derivative content from the master. Whether custom derivatives are supplied or automatically generated, their use is entirely optional.

2.2 Metadata

Metadata concerns the files that are submitted for ingest, and also the operations that determine how these files are processed at ingest time in order to assemble the desired archival package. Not all metadata needs to be supplied: the ingest procedure adds new content metadata about the stored media, for example the image resolution or the frames per second of a movie. Metadata is expressed via a processing instruction, which the content producer can define via an XML document, through the administration panel, or both.

2.2.1 Technical metadata

Metadata about the files is purely technical, administrative and structural in nature. A minimum amount of it is required for assembling a submission package:
  • Access policies to indicate the degrees of availability to the content consumer.
  • A content type to identify the digital files.
  • Fixity: an MD5 checksum to verify that the file was transported intact from the producer into the archival package.
  • Reference: persistent identifiers to allow for access durability between information systems.
  • Optional: a content producer may provide context information such as file order to describe physical relations (the pages of a book, for example), together with an optional reference to a bundled set of files.
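
The fixity value in the list above can be computed on the producer's side before delivery. The following is a minimal sketch, assuming only that the checksum is the hexadecimal MD5 digest of the file's bytes; the function name md5_checksum is hypothetical and not part of the object repository.

```python
import hashlib
from pathlib import Path


def md5_checksum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file (hypothetical helper)."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so that large master files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use constant, which matters for large audio or video masters.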

2.2.2 Operational metadata

The submission package needs additional instructions to determine which services are invoked during ingest. These instructions specify, among other things:
  • which files to add, delete or update
  • whether derivatives should be created
  • whether persistent identifiers are to be created or bound

2.2.3 No descriptive metadata

Descriptive metadata such as title, subject, or provenance is not part of the architecture of the object repository. The object repository is about the delivery of digital content and does not facilitate discovery of content via a search-and-find solution. Descriptive metadata storage remains the domain of catalog, archival and library systems and related search solutions. The relationship between the two domains is maintained by the reference information - persistent identifiers - that is bound to the dissemination package.

2.3 The processing instruction services

Metadata is expressed with processing instructions. Various instruction services assist the content producer while assembling the submission package. These cover the creation and import of custom made instructions, as well as the export and validation of those instructions. The processing instruction takes its default values from a preset Profile. This Profile already has all global metadata properties set, such as access policies and the desired ingest services to use.
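
The fallback from instruction to Profile can be pictured as a simple merge. This is a sketch under assumptions: the setting names used in the test of this idea (such as "access") and the function effective_settings are hypothetical, and the actual resolution inside the object repository may differ.

```python
def effective_settings(profile: dict, instruction: dict) -> dict:
    """Merge settings: values set in the instruction win, the rest come from the profile."""
    # Treat None as "not set in the instruction" (an assumption of this sketch).
    explicit = {key: value for key, value in instruction.items() if value is not None}
    return {**profile, **explicit}
```

For example, an instruction that only sets an access policy would still pick up the Profile's choice of ingest services.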

2.3.1 Autocreate Instruction service

The autocreate instruction service enables the content producer to create a complete and valid processing instruction. The service calculates the checksum and, if desired, reserves a persistent identifier for each file. Any other settings are inherited from the Profile.
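
To make the idea concrete, here is a rough sketch of what such an automatic creation step could look like. Only the stagingfile element name comes from this documentation; the root element instruction, the child elements location, md5 and pid, the naming_authority parameter and the UUID-based identifier are all assumptions made for illustration, not the repository's actual schema or PID service.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path


def autocreate_instruction(folder: Path, profile: dict,
                           naming_authority: str = "00000",
                           reserve_pids: bool = True) -> ET.Element:
    """Build an instruction element declaring every file found under folder."""
    # Settings that are not given per file are carried over from the profile
    # as attributes of the (hypothetical) root element.
    root = ET.Element("instruction", {key: str(value) for key, value in profile.items()})
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        entry = ET.SubElement(root, "stagingfile")
        ET.SubElement(entry, "location").text = path.relative_to(folder).as_posix()
        ET.SubElement(entry, "md5").text = hashlib.md5(path.read_bytes()).hexdigest()
        if reserve_pids:
            # Placeholder identifier; a real system would call a PID service here.
            ET.SubElement(entry, "pid").text = f"{naming_authority}/{uuid.uuid4()}"
    return root
```

The resulting element can be serialized with ET.tostring(root, encoding="unicode") for inspection.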

2.3.2 Validate Instruction service

The validation service ensures data integrity over all stages of the submission phase. When master files are ingested, this procedure checks whether:
  • all files in the staging area are in fact declared in the instruction as stagingfile elements
  • all stagingfile elements in the instruction are indeed found in the main folder content
  • the MD5 checksum values match
  • a persistent identifier is present; and not used elsewhere in the instruction
  • the file has at least one byte

If the instruction is used for post-ingest operations (e.g. creation of derivatives, ingest of custom derivatives, re-creation of labels or access policy) whereby an existing archival package is updated, this validation will only check if the persistent identifier is registered.

Once this service has confirmed that the instruction is free of errors, it unlocks the ingest service.
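
The master-file checks described in this section can be sketched as follows. This is an illustration under assumptions, not the service's implementation: the StagingFile record, its field names and the error messages are hypothetical, and the sketch assumes the instruction document itself is not stored inside the folder being checked.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StagingFile:
    location: str  # path relative to the main folder
    md5: str       # declared hexadecimal MD5 checksum
    pid: str       # declared persistent identifier, "" if absent


def validate(folder: Path, declared: list[StagingFile]) -> list[str]:
    """Return a list of error messages; an empty list means the instruction is valid."""
    errors = []
    on_disk = {p.relative_to(folder).as_posix() for p in folder.rglob("*") if p.is_file()}
    in_instruction = {entry.location for entry in declared}
    # Every file in the staging area must be declared.
    for name in sorted(on_disk - in_instruction):
        errors.append(f"{name}: not declared in the instruction")
    pid_counts = Counter(entry.pid for entry in declared if entry.pid)
    for entry in declared:
        # Every declared file must exist in the main folder.
        if entry.location not in on_disk:
            errors.append(f"{entry.location}: declared but not found in the main folder")
            continue
        data = (folder / entry.location).read_bytes()
        if len(data) == 0:
            errors.append(f"{entry.location}: file is empty")
        if hashlib.md5(data).hexdigest() != entry.md5.lower():
            errors.append(f"{entry.location}: MD5 checksum mismatch")
        if not entry.pid:
            errors.append(f"{entry.location}: persistent identifier missing")
        elif pid_counts[entry.pid] > 1:
            errors.append(f"{entry.location}: persistent identifier used more than once")
    return errors
```

In this sketch, ingest would be unlocked only when validate returns an empty list.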

2.3.3 Import Instruction service

The import service reads in a custom made processing instruction that has been placed on the staging area, provided that:
  • it is a well-formed XML document;
  • the submission package does not already include an instruction. A new instruction is only accepted once the current processing instruction has been cleared from the system via the administration panel.

After an import, the service invokes the instruction validation service.
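
A producer can test the two import preconditions locally before placing an instruction on the staging area. The sketch below uses Python's standard XML parser to check well-formedness; the function names and the current_instruction_present flag are hypothetical stand-ins for state that the object repository tracks itself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def can_import(xml_text: str, current_instruction_present: bool) -> bool:
    """An import is accepted only if no instruction exists yet and the XML is well formed."""
    return not current_instruction_present and is_well_formed(xml_text)
```

Well-formedness only confirms that the document parses; whether its content is correct is left to the validation service.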

2.3.4 Export Instruction service

A processing instruction can be downloaded while staged and at post-ingest time, for example to modify it outside the administration panel, or to embed its referential metadata in the producer's local metadata system.

2.3.5 Recreate Instruction service

This service creates an instruction for files that were already ingested and are part of the archival package. The instruction covers either all files that share the same label, or an individual file referenced by its persistent identifier.
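
The two ways of scoping a recreated instruction, by label or by persistent identifier, can be illustrated with a small selection function. The StoredFile record and select_for_recreate are hypothetical names for this sketch and say nothing about how the object repository stores its archival packages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    pid: str    # persistent identifier of the ingested file
    label: str  # label shared by the files of one submission


def select_for_recreate(stored, label=None, pid=None):
    """Select ingested files either by shared label or by a single persistent identifier."""
    if (label is None) == (pid is None):
        raise ValueError("give exactly one of label or pid")
    if pid is not None:
        return [item for item in stored if item.pid == pid]
    return [item for item in stored if item.label == label]
```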