Long Term Preservation through an automatic and learning PLM Cycle.
Publication Date: 2014-Mar-05
The IP.com Prior Art Database
Long-Term Digital Preservation (LTDP) is defined as a set of methods and activities designed to keep digital information usable over an indefinite period of time. Since the 1970s, the LTDP problem has been the subject of substantial research, mostly by public institutions such as national libraries. This public line of research is very well documented in the report . The IBM Academy of Technology has provided another excellent report on LTDP , which refers to CASPAR but adds other industry and business perspectives.

This article addresses a sub-problem of LTDP, namely the Rendering Problem (RP). By rendering, we mean the display of a content object, i.e. a static kind of asset, on a computer system's surface in some usable form. For any given object a that is renderable on a certain software and hardware stack s, the RP is to ensure that there is a future stack s' such that a can be rendered on s'. The RP is difficult because the standards, formats, descriptive information, authentication information, software and hardware components required for s become obsolete over time, so that a new rendering stack s' is needed. There is a lattice of preservation difficulty in which simple ASCII documents are easy and things like video games are difficult. Moreover, in real-world LTDP situations such as libraries or large enterprises, the number of assets is very large, so the RP needs to be solved in a scalable and cost-effective manner.
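The definition of the RP can be made concrete with a small sketch. The Python below is an illustration only and is not taken from any of the reports mentioned: it models a stack as a set of component names and treats the RP for an asset as the question of whether some available future stack still provides everything the asset needs. All names (`Asset`, `renderable_on`, `solve_rp`) and the component strings are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Assumption: a stack s is modeled simply as a set of component names.
Stack = FrozenSet[str]


@dataclass(frozen=True)
class Asset:
    name: str
    requires: FrozenSet[str]  # components needed to render the asset


def renderable_on(asset: Asset, stack: Stack) -> bool:
    """An asset is renderable on a stack that provides every required component."""
    return asset.requires <= stack


def solve_rp(asset: Asset, future_stacks: Iterable[Stack]) -> Optional[Stack]:
    """Return some future stack s' on which the asset can be rendered, or None."""
    for stack in future_stacks:
        if renderable_on(asset, stack):
            return stack
    return None


# Hypothetical example data: an easy asset and a difficult one.
report = Asset("report.txt", frozenset({"ascii-viewer"}))
game = Asset("game.rom", frozenset({"console-emulator", "gamepad-driver"}))
future = [frozenset({"ascii-viewer", "pdf-viewer"})]

print(solve_rp(report, future))  # a stack that still renders the ASCII document
print(solve_rp(game, future))    # None: no future stack renders the game
```

In this simplified model the asset's requirements never change; real preservation work may also alter the asset itself, for example by converting it to another format.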
1. Prior art
The following approaches to the RP have been proposed in recent research. For a more in-depth presentation of these approaches, please refer to .
Conversion of objects means changing the format, format version or packaging of an asset so that it can be rendered with a new or additional application.
Virtualization means introducing a software layer that simulates hardware capabilities, so that some level of hardware obsolescence can be tolerated by maintaining the virtualization software specification.
Standards oriented approaches rely on widely used, standardized components in s, mostly asset formats such as PDF/A, assuming that these become obsolete later than proprietary components, or not at all.
Metadata oriented approaches add descriptive data to an asset including information on how to build the future stack s'.
Encapsulation approaches define a logical wrapper around the asset, with the intention to make it somewhat independent from a changing system environment.
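As an illustration of the conversion approach, the following Python sketch searches for a chain of conversions that leads from an asset's current format to a format that can still be rendered. It is a minimal sketch under stated assumptions, not a description of any existing tool: the function `find_conversion_path` and the converter table are hypothetical, and the format names only serve as example data.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def find_conversion_path(
    converters: Dict[str, List[str]],
    source_format: str,
    renderable_formats: Set[str],
) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of conversions
    from source_format to any format that can still be rendered."""
    queue = deque([[source_format]])
    seen = {source_format}
    while queue:
        path = queue.popleft()
        if path[-1] in renderable_formats:
            return path
        for target in converters.get(path[-1], []):
            if target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None


# Hypothetical table: which conversion tools are assumed to exist.
converters = {"wordstar": ["rtf"], "rtf": ["odt", "pdf"], "pdf": ["pdf/a"]}
path = find_conversion_path(converters, "wordstar", {"pdf/a"})
print(path)
```

Every edge in such a table stands for a conversion tool that is assumed to exist and to preserve the information in the asset, so a longer chain accumulates more of these assumptions.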
In spite of their diversity, all of these approaches to the RP assume that there is an invariant within stack s which resists change and serves as a fixed point for preservation. For conversion approaches, this fixed point is the assumption that an information-preserving, or at least low-error, conversion tool exists. For virtualization approaches, the fixed point is the machine specification of the virtualization software. For standards oriented approaches, the fixed point is the standard itself, although it is clear that standards (even sequences of standards) may become obsolete. For metadata oriented approaches, the fixed point is the applicability of the metadata to future rendering, i.e. the existence of an interpretation which allows this application. Encapsulation approaches have their fixed point in the assumption that the encapsulated asset, by virtue of some closure property, is independent of some technological change. Assuming fixed points within the software stack undermines all of these approaches to some extent, since technological change over decades may remove the fixed points. We may therefore claim that none of these approaches is effective on its own, i.e. none of them alone provides a safe foundation for an RP solution.
Existing real-world solutions to the RP are also not scalable, in that their resource consumption is typically linear in the number of assets being preserved. This is due to the facts that
Manual or intellectual preservation activities are applied to individual assets, with no or only very little automation occurring across assets and/or stack components,
Preservation knowledge is not systematically shared across different institutions / projects / collections, even if informal exchange may occur.
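The effect of these two factors on scalability can be illustrated with a simple cost model. The Python below is a sketch with arbitrary numbers chosen for illustration, not measurements. The functions `per_asset_cost` and `shared_cost` and all of their parameters are assumptions made for this example: the first models effort spent on each asset individually, the second a hypothetical alternative in which most effort is spent once per format and reused across assets.

```python
def per_asset_cost(num_assets: int, effort_per_asset: float) -> float:
    """Effort applied to each asset individually: linear in the number of assets."""
    return num_assets * effort_per_asset


def shared_cost(
    num_formats: int,
    effort_per_format: float,
    num_assets: int,
    automated_effort_per_asset: float,
) -> float:
    """One-time effort per format, plus a small automated cost per asset."""
    return num_formats * effort_per_format + num_assets * automated_effort_per_asset


# Arbitrary illustrative values in abstract effort units.
manual = per_asset_cost(1_000_000, 0.5)
automated = shared_cost(50, 200.0, 1_000_000, 0.001)
print(manual, automated)
```

With these made-up values the per-asset model is dominated by the number of assets, while the shared model is dominated by the much smaller number of formats.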
Note that these are not properties of the RP approaches mentioned above, but of their respective implementations. Nevertheless,...