The Document Architecture for the Cornell Digital Library (RFC1691)
Original Publication Date: 1994-Aug-01
Included in the Prior Art Database: 2019-Feb-12
Publishing Venue
Internet Society Requests For Comment (RFCs)
Network Working Group                                          W. Turner
Request for Comments: 1691                                           LTD
Category: Informational                                      August 1994
The Document Architecture for the Cornell Digital Library
Status of this Memo
This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Abstract
This memo defines an architecture for the storage and retrieval of the digital representations for books, journals, photographic images, etc., which are collected in a large organized digital library.
Two unique features of this architecture are the ability to generate reference documents and the ability to create multiple views of a document.
Introduction
In 1989, Cornell University and Xerox Corporation, with support from the Commission on Preservation and Access and later Sun Microsystems, embarked on a collaborative project to study and to prototype the application of digital technologies for the preservation of library material. During this project, Xerox developed the College Library Access and Storage System (CLASS), and Cornell developed software to provide network access to the CLASS Digital Library.
Xerox and Cornell University Library staff worked closely together to define requirements for storing both low- and high-resolution versions of images, so that the low-resolution images could be used for browsing over the network and the high-resolution images could be used for printing. In addition, substantial work was done to define documents with internal structures that could be navigated. Xerox developed the software to create and store documents, while Cornell developed complementary software to allow library users to browse the documents and request printed copies over the network.
Cornell has defined a document architecture which builds on the lessons learned in the CLASS project, and is maintaining digital library materials in that form.
Document Architecture Overview
Just as a conventional library contains books rather than pages, so the electronic library must contain documents rather than images. During the scanning process, images are automatically linked into documents by creating document structure files which order the image files in the same way the binding of a book orders the pages. Thus, the digital book as currently configured consists of two parts: a set of individual pages stored as discrete bit map image files, and the document structure files which "bind" the image files into a document. In addition, a database entry is made for each digital document which permits searching by author and title (i.e., bibliographic information).
Beyond the order of the pages, the arrangement of a physical book provides information to readers. The title page and publication information come first; the table of contents usually precedes the text; the text is divided into sections or chapters; if there is an index, it follows the t...
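The two-part digital book described above, together with its database entry, can be illustrated with a minimal Python sketch. This is an illustration only and not the format defined by this memo: the class names, the `search` helper, the 1-based page numbering, and the image file names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageImage:
    """One scanned page stored as a discrete bitmap image file."""
    path: str  # hypothetical file name; no naming scheme is implied


@dataclass
class DocumentStructure:
    """Orders page images the way a binding orders the pages of a book."""
    pages: List[PageImage] = field(default_factory=list)

    def page(self, number: int) -> PageImage:
        # Assumption: pages are numbered from 1, like a physical book.
        if not 1 <= number <= len(self.pages):
            raise IndexError(f"no page {number}")
        return self.pages[number - 1]


@dataclass
class BibliographicEntry:
    """Database entry that permits searching by author and title."""
    author: str
    title: str
    structure: DocumentStructure


def search(catalog: List[BibliographicEntry],
           author: Optional[str] = None,
           title: Optional[str] = None) -> List[BibliographicEntry]:
    """Return entries whose author and title contain the given substrings."""
    results = []
    for entry in catalog:
        if author is not None and author.lower() not in entry.author.lower():
            continue
        if title is not None and title.lower() not in entry.title.lower():
            continue
        results.append(entry)
    return results


# Placeholder data: a three-page "book" bound by its structure object.
book = DocumentStructure([PageImage(f"page{n:04d}.img") for n in range(1, 4)])
catalog = [BibliographicEntry("Example Author", "Example Title", book)]
```

In this sketch the image files carry no ordering of their own. The order lives only in the structure object, which mirrors the role the memo assigns to the document structure files, while the bibliographic entry is kept separate so that searching never needs to touch the images.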
