A System and Method for Managing Representation Information in a Preservation Data System
Publication Date: 2009-May-19
In order for stored data in a Long Term Digital Preservation System to remain understandable for long periods of time (longer than 10 years), representation information that provides means to interpret the data must be stored together with the data. In this article we describe a novel Representation Information Management System designed for long-term digital preservation that is self-contained and efficient.
In a Long Term Digital Preservation (LTDP) system (e.g., one based on the OAIS standard), the saved data must remain understandable. To be able to interpret the raw Content Data Object (CDO), Representation Information (Content RepInfo) must be supplied and saved together with the CDO. However, the Content RepInfo must itself be interpretable; hence it has its own (next-level) Content RepInfos associated with it, and so on for further levels of RepInfos.
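The layered dependency described above can be illustrated with a minimal Python sketch. It is not part of the disclosed system: the `RepInfo` record, the `interpretation_chain` function, and all identifiers in the sample data are hypothetical names chosen for illustration. The sketch assumes each RepInfo lists the identifiers of its next-level RepInfos and that the chain ends at items that list none.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepInfo:
    # Hypothetical record: an identifier plus the identifiers of its
    # next-level RepInfos (the information needed to interpret it).
    rep_id: str
    next_level: List[str] = field(default_factory=list)


def interpretation_chain(start: str, store: Dict[str, RepInfo]) -> List[str]:
    """Collect every RepInfo reachable from `start`, breadth first.

    Raises KeyError if a referenced RepInfo is absent from the store,
    i.e. the store is not self-contained for this object.
    """
    ordered: List[str] = []
    pending = [start]
    while pending:
        rep_id = pending.pop(0)
        if rep_id in ordered:
            continue  # already visited; also guards against cycles
        rep = store[rep_id]  # KeyError here means a level is missing
        ordered.append(rep_id)
        pending.extend(rep.next_level)
    return ordered


# Illustrative data only; these identifiers are invented.
store = {
    "image-format-spec": RepInfo("image-format-spec", ["spec-document-format"]),
    "spec-document-format": RepInfo("spec-document-format", ["character-encoding"]),
    "character-encoding": RepInfo("character-encoding"),
}
chain = interpretation_chain("image-format-spec", store)
```

A lookup that raises `KeyError` signals that some level of RepInfo is missing from the store, which is the situation a self-contained system is meant to rule out.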
A RepInfo Management System (RIMS) is needed that is self-contained, that is, it includes within itself all the relevant interpretation information for the stored objects. Disclosed is a novel RIMS. We are not aware of any prior art describing an existing RIMS solution for an LTDP system.
A productive Representation Information Management System should be simple, efficient and scalable, to support large-scale preservation systems. In the following we describe our invention for a RIMS for a Long Term Digital Preservation system in terms of an OAIS-based preservation system, by way of example. The invention, however, is of a general nature and is equally applicable to other, non-OAIS-based preservation systems.
In a preservation system, a basic object is typically defined and designated to be stored in the archival storage of the preservation system. For example, in an OAIS system an Archival Information Package (AIP) basic object is defined to hold the CDO and additional metadata (in OAIS the latter is termed Preservation Descriptive Information, or PDI).
A designated part of the AIP serves to hold or point to the RepInfo of the CDO (termed here the RepInfo Section of the CDO). Similarly, another designated part of the AIP serves to hold or point to the RepInfo of the PDI. In what follows we will use the OAIS/AIP structures by way of example.
Consequently, a RepInfo item contains all the AIP sections, including, in the example of OAIS, its own RepInfo sections (for the CDO RepInfo and for the PDI RepInfo). Since many AIPs may have similar RepInfos, and since there is typically a large body of RepInfos, we wish to avoid RepInfo duplication and allow sharing. The simplest solution is to hold pointers within the RepInfo sections of AIPs that point to (and thus possibly share) RepInfos: when many AIPs point to the same RepInfos, sharing is achieved.
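As a rough illustration of this pointer-based sharing, the following Python sketch models an AIP with a CDO, PDI, and two RepInfo sections that hold pointers. The class layout, field names, the `referrers` helper, and the sample identifiers are all assumptions made for this sketch; they are not taken from OAIS or from the disclosed system. Following the text above, a RepInfo item is stored with the same AIP sections as any other object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AIP:
    # Field names are assumptions made for this sketch.
    aip_id: str
    cdo: bytes                # Content Data Object
    pdi: Dict[str, str]       # Preservation Descriptive Information
    cdo_repinfo: List[str] = field(default_factory=list)  # pointers (AIP ids)
    pdi_repinfo: List[str] = field(default_factory=list)  # pointers (AIP ids)


def referrers(archive: Dict[str, AIP], repinfo_id: str) -> List[str]:
    """Return the ids of AIPs whose RepInfo sections point at `repinfo_id`."""
    return sorted(
        aip.aip_id
        for aip in archive.values()
        if repinfo_id in aip.cdo_repinfo or repinfo_id in aip.pdi_repinfo
    )


# Illustrative data only: one RepInfo item stored once, shared by two AIPs.
archive: Dict[str, AIP] = {}
for aip in (
    AIP("rep-text", b"(text format description)", {"origin": "example"}),
    AIP("doc-1", b"first document", {"origin": "example"}, cdo_repinfo=["rep-text"]),
    AIP("doc-2", b"second document", {"origin": "example"}, cdo_repinfo=["rep-text"]),
):
    archive[aip.aip_id] = aip

shared_by = referrers(archive, "rep-text")
```

Here the RepInfo item `rep-text` exists once in the archive, and both document AIPs reach it through the pointer in their CDO RepInfo section rather than carrying their own copy.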
The main idea is to extend this simple solution to allow more efficient sharing and usage of RepInfos in the RIMS. To this end we organize all the Content RepInfos stored in the preservation system into groups (termed RepInfo Categories), each of which gathers references to many representation information items. The grouping is based on similarity of RepInfo contents and meaning, as determined by a set of predefined criteria. For example, all Content RepInfos pertaining to software editors that can handle ASCII text...