Browse Prior Art Database

Compact Encodings of List Structure

IP.com Disclosure Number: IPCOM000128888D
Original Publication Date: 1979-Dec-31
Included in the Prior Art Database: 2005-Sep-20
Document File: 19 page(s) / 64K

Publishing Venue

Software Patent Institute

Related People

Daniel G. Bobrow: AUTHOR [+4]

Abstract

Most implementations of list structures use fields of uniform size to point to data and to link the structure. An empirical study of the use of list structure in Lisp [Cla77] indicates that certain values of these fields are used far more often than others. This non-uniformity can be exploited to obtain compact encodings of list structure without sacrificing their generality in representing structured data. In these encodings a small field is used to represent the most common values, and an escape mechanism provides access to full-size pointers when they are needed. In this paper we explore this idea in detail by constructing and evaluating several encoding schemes and escape mechanisms. Although we discuss the execution time costs of these designs, we concentrate in this paper on careful evaluations of their savings in space. The empirical data upon which these encoding designs and evaluations are based are reported in full in [Cla77, Cla79]. We summarize some of those results here. The list structures in existence at the end of a typical run of five large Lisp programs (about 50,000 cells each) were measured to detennine the frequency of occurrence of pointer values in car (usually the data field) and cdr (usually the link field) of each list cell. Some related dynamic measurements were also made, both to determine the relative frequency and arguments of basic operations, and to verify that list structures measured at the end of the run were typical of those existing at various points during the run. The structures examined were just those used as data for the programs, and not those used to represent the programs themselves. Deutsch has shown that great coding efficiency can be obtained for programs by compiling them into a carefully designed instruction set [Deu73b]; therefore, we will not address the problem of compacting program list structure in this paper. Our data came from the following five programs: CONGEN (called "STRGEN" in [Cla771), a chemical structure generator [Smi74]; NOAH, a planning program [Sac77]; PIVOT, a program verifier [Deu73a]; SPARSER, the parser in a speech understanding system [Bat75]; and WIRE, a wire-listing program used at Xerox PARC. All five are written in Interlisp [Tei74], a sophisticated Lisp system running on the PDP-10 computer under the Tenex operating system [Bob72]. Table I shows how pointers in car and cdr were distributed among the data types of Interlisp. Atoms are Lisp's symbols or identifiers, and small integers are those in the range 1536. NIL is a special atom normally used in cdr to indicate the end of a list. There is considerable agreement among the programs, especially in cdr. [Figure containing following caption omitted: TABLE I Distribution of data types in list cells (numbers are percentages)] List pointers most often pointed only a short distance away in the address space of list cells. This is shown in Table II. In both car and cdr, an offset of one was most common, and there was a strong decline as distance increased. This is a reflection of the fact that free lists tend to start out ordered, and cells that point to each other are most often created close in time [Cla77]. This very strong regularity is the reason that we will most often represent list pointers as cell-relative offsets in the encodings of this paper. Some encodings will use page-relative offsets, which also benefit from the regularity shown in Table II.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 6% of the total text.

Page 1 of 19

THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

Compact Encodings of List Structure

by Daniel G. Bobrow and Douglas W. Clark

CSL-79-7 JUNE1979

Abstract: List structures provide a general mechanism for representing easily changed structured data, but can introduce inefficiencies in the use of space when fields of uniform size are used to contain pointers to data and to linkthe structure. Empirically determined regularity can be exploited to provide more space efficient encodings without losing the flexibility inherent in list structures. The basic scheme is to provide compact pointer fields big enough to accommodate most values that. occur in them, and to provide "escape" mechanisms for exceptional cases. Several examples of encoding designs are presented and evaluated, including two designs currently used in Lisp machines. Alternative escape mechanisms are described, and various questions of cost and implementation are discussed. In order to extrapolate our results to larger systems than those measured, we propose a. model for the generation of list pointers, and test the model against data from two programs. We show that according to our model, list structures with compact cdr fields will, as address space grows, continue to be compacted well with a fixed width small field. Our conclusion is that with a microcodable processor, about a factor of two gain in. space efficiency for list structure can be had for little or no cost in processing time.

CR Categories: 3.69, 4.34, 4.6, 5.6

Key words and phrases: compact encoding, list structure, Lisp, list structure regularity, linearization, Lisp machine

To appear in ACM Transactions on Programming Languages and Systems

33-33 Coyote Hill Road / Palo Alto / California 94304

1. Introduction

Most implementations of list structures use fields of uniform size to point to data and to link the structure. An empirical study of the use of list structure in Lisp [Cla77] indicates that certain values of these fields are used far more often than others. This non-uniformity can be exploited to obtain compact encodings of list structure without sacrificing their generality in representing structured data. In these encodings a small field is used to represent the most common values, and an escape mechanism provides access to full-size pointers when they are needed. In this paper we explore this idea in detail by constructing and evaluating several encoding schemes and escape mechanisms. Although we discuss the execution time costs of these designs, we concentrate in this paper on careful evaluations of their savings in space.

The empirical data upon which these encoding designs and evaluations are based are reported in full in [Cla77, Cla79]. We summarize some of those results here. The list structures in existence at the end of a typical run of five large Lisp programs (about 50,000 cells each) were measured to detennine the frequency of occurrence of pointer values in car (usually the...