4 Archival information package - Reference Documentation

Authors: Lucien van Wouw

Version: 1.4

4 Archival information package

The archival package is complete once the ingest workflow has applied all its tasks. At a minimum, the package consists of two documents: the file to be preserved and its metadata.

4.1 Preservation Description Information

This document consists of the following archival elements:

Element                   Type    Description
md5                       string  The md5 checksum of the stored file
length                    number  The length of the file in bytes
filename                  string  The original file name
contentType               string  The mimetype
metadata.pid              string  The persistent identifier
metadata.pidType          string  The indicator of the type of pid resolver
metadata.resolverBaseUrl  string  The base URL to prefix the metadata.pid value with
metadata.objid            string  The object identifier or group identifier
metadata.seq              number  The physical order of the file relative to other files under the shared metadata.objid
metadata.content          json    The fingerprint of the content, such as width, length, resolution, etc.
metadata.access           string  Access status of this file and all
metadata.fileSet          string  The location of the file in the dissemination package
metadata.l                string  The relative location of the file in the dissemination package
metadata.label            string  The label for the dissemination package
metadata.firstUploadDate  date    The date the file with this metadata.pid value was first uploaded
uploadDate                date    The last date this file was uploaded
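
How a client might use these elements can be sketched in Python. This is a minimal illustration, not part of the repository software: the function names `persistent_url` and `matches_description` are hypothetical, the record is a plain dictionary built around made-up bytes, and it assumes the md5 element holds the hexadecimal digest of the whole file.

```python
import hashlib


def persistent_url(doc):
    """Prefix metadata.pid with metadata.resolverBaseUrl."""
    meta = doc["metadata"]
    return meta["resolverBaseUrl"] + meta["pid"]


def matches_description(doc, payload):
    """Return True when payload has the recorded length and md5 checksum."""
    if len(payload) != doc["length"]:
        return False
    return hashlib.md5(payload).hexdigest() == doc["md5"]


# Hypothetical record: the bytes are invented, only the element names
# follow the list above.
payload = b"example file content"
doc = {
    "length": len(payload),
    "md5": hashlib.md5(payload).hexdigest(),
    "metadata": {
        "pid": "12345/30051000131778",
        "resolverBaseUrl": "http://hdl.handle.net/",
    },
}

print(persistent_url(doc))                # http://hdl.handle.net/12345/30051000131778
print(matches_description(doc, payload))  # True
```

Checking the length before the checksum lets the sketch reject an obviously truncated file without hashing it.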

For example, an image:

{
	"_id" : NumberLong(12345),
	"contentType" : "image/jpeg",
	"filename" : "myfile.TIFF",
	"length" : NumberLong(10058),
	"md5" : "f1c8b344033c30f1670626b087b607bc",
	"metadata" : {
		"access" : "restricted",
		"content" : {
			"x-resolution" : 72,
			"y-resolution" : 72,
			"width" : 368,
			"height" : 313
		},
		"fileSet" : "/data/stagingarea/12345/12347/2007-08-27",
		"firstUploadDate" : ISODate("2012-06-27T15:15:57.957Z"),
		"l" : "/2007-08-27/30051/00/013",
		"label" : "2012-06-27 batch filer4",
		"pid" : "12345/30051000131778",
		"pidType" : "or",
		"resolverBaseUrl" : "http://hdl.handle.net/"
	},
	"uploadDate" : ISODate("2007-08-27T14:30:00Z")
}

4.2 Content Data Object

The file stream itself is stored in one or more chunked documents. Each document has a checksum of its own.

Element   Type    Description
files_id  number  The local identifier
n         number  The sequence number of the chunk
data      Base64  The file stream
md5       string  The checksum of the chunk
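
Reading a file back from its chunks can be sketched as follows. This is an illustration under stated assumptions rather than the repository's own code: the helpers `reassemble` and `make_chunk` are hypothetical, chunks are plain dictionaries whose data element is a Base64 string, the sequence n is assumed to start at 0, and each chunk's md5 is assumed to be the hexadecimal digest of its decoded bytes.

```python
import base64
import hashlib


def reassemble(chunks):
    """Join chunk payloads in order of n, checking each chunk's md5 first."""
    parts = []
    ordered = sorted(chunks, key=lambda c: c["n"])
    for expected_n, chunk in enumerate(ordered):
        if chunk["n"] != expected_n:
            raise ValueError(f"missing chunk {expected_n}")
        data = base64.b64decode(chunk["data"])
        if hashlib.md5(data).hexdigest() != chunk["md5"]:
            raise ValueError(f"checksum mismatch in chunk {chunk['n']}")
        parts.append(data)
    return b"".join(parts)


def make_chunk(files_id, n, data):
    """Build a hypothetical chunk document from raw bytes."""
    return {
        "files_id": files_id,
        "n": n,
        "data": base64.b64encode(data).decode("ascii"),
        "md5": hashlib.md5(data).hexdigest(),
    }


# Chunks are deliberately given out of order; n restores the sequence.
chunks = [make_chunk(12345, 1, b"world"), make_chunk(12345, 0, b"hello ")]
print(reassemble(chunks))  # b'hello world'
```

Verifying each chunk before concatenation means a corrupted chunk is reported by its sequence number instead of surfacing later as a mismatch on the whole file.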

For example, for the earlier image:

{
	"_id" : ObjectId(12345),
	"files_id" : NumberLong(12345),
	"n" : 0,
	"data" : BinData(0,"/9j/4AAQSkIdXbV (...snip...) Na8SalaRajGVUoC0ig=="),
	"md5" : "f1c8b344033c30f1670626b087b607bc"
}