The Document Architecture for the Cornell Digital Library (RFC1691)
Original Publication Date: 1994-Aug-01
Included in the Prior Art Database: 2000-Sep-12
Internet Society Requests For Comment (RFCs)
Network Working Group                                          W. Turner
Request for Comments: 1691                                           LTD
Category: Informational                                      August 1994
The Document Architecture for the Cornell Digital Library
Status of this Memo
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.
Abstract
This memo defines an architecture for the storage and retrieval of
the digital representations for books, journals, photographic images,
etc., which are collected in a large organized digital library.
Two unique features of this architecture are the ability to generate
reference documents and the ability to create multiple views of a
document.
Introduction
In 1989, Cornell University and Xerox Corporation, with support from
the Commission on Preservation and Access and later Sun Microsystems,
embarked on a collaborative project to study and to prototype the
application of digital technologies for the preservation of library
material. During this project, Xerox developed the College Library
Access and Storage System (CLASS), and Cornell developed software to
provide network access to the CLASS Digital Library.
Xerox and Cornell University Library staff worked closely together to
define requirements for storing both low- and high-resolution
versions of images, so that the low-resolution images could be used
for browsing over the network and the high-resolution images could be
used for printing. In addition, substantial work was done to define
documents with internal structures that could be navigated. Xerox
developed the software to create and store documents, while Cornell
developed complementary software to allow library users to browse the
documents and request printed copies over the network.
Cornell has defined a document architecture which builds on the
lessons learned in the CLASS project, and is maintaining digital
library materials in that form.
Document Architecture Overview
Just as a conventional library contains books rather than pages, so
the electronic library must contain documents rather than images.
During the scanning process, images are automatically linked into
documents by creating document structure files which order the image
files in the same way the binding of a book orders the pages. Thus,
the digital book as currently configured consists of two parts: a set
of individual pages stored as discrete bit map image files, and the
document structure files which "bind" the image files into a
document. In addition, a database entry is made for each digital
document which permits searching by author and title (i.e.,
bibliographic information). Beyond the order of the pages, the
arrangement of a physical book provides information to readers. The
title page and publication information co...
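The two-part structure described above (page image files, plus a
structure that "binds" them in order, plus a bibliographic database
entry) can be illustrated with a minimal sketch. This is not the
format defined by this memo: the class name, the function names, and
the image file names are all hypothetical, and an in-memory list
stands in for both the document structure files and the database.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalDocument:
    """Hypothetical stand-in for one digital document and its database entry."""
    author: str
    title: str
    # Plays the role of a document structure file: image file names
    # listed in binding (page) order. The names are illustrative only.
    page_images: list[str] = field(default_factory=list)


def bind_pages(author: str, title: str, scanned_files: list[str]) -> DigitalDocument:
    """Link scanned image files into one document, preserving scan order."""
    return DigitalDocument(author=author, title=title, page_images=list(scanned_files))


def search(catalog: list[DigitalDocument], author: str = "", title: str = "") -> list[DigitalDocument]:
    """Case-insensitive substring search on author and title.

    An empty search term matches every document.
    """
    return [
        doc for doc in catalog
        if author.lower() in doc.author.lower() and title.lower() in doc.title.lower()
    ]


# Two made-up documents, each bound from its own set of page images.
catalog = [
    bind_pages("A. Author", "An Example Book", ["0001.tif", "0002.tif", "0003.tif"]),
    bind_pages("B. Writer", "Another Volume", ["0001.tif", "0002.tif"]),
]

for doc in search(catalog, author="writer"):
    print(doc.title, doc.page_images)
```

Keeping the page order in the document record, rather than in the
image files themselves, mirrors the binding analogy: the images stay
discrete files, and only the structure decides how they are read.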