Data Compaction With Minimized Padding and Reduction of Lost Records

IP.com Disclosure Number: IPCOM000077787D
Original Publication Date: 1972-Sep-01
Included in the Prior Art Database: 2005-Feb-25
Document File: 5 page(s) / 78K

Publishing Venue

IBM

Related People

Herzog, A: AUTHOR [+3]

Abstract

The purpose of all data compacting schemes is to save storage space used for the normal representation of the stored data. Yet all of them must waste some small number of storage bytes, because of the unpredictable length of compacted data. The algorithm herein holds this waste to a negligible, irreducible amount. In addition, it provides controls for minimizing the number of lost records due to uncorrectable input/output errors on compacted data blocks.

This is the abbreviated version, containing approximately 35% of the total text.

Assumptions: For the scheme described herein, assume that there exist two CPU instructions, one for compacting and one for expanding a series of characters or bits. Let the two instructions be called COMPACT and EXPAND, and assume that they operate as follows.

For each instruction, the two buffers for compacted and for uncompacted data must start on a full-word boundary and must not overlap. Each instruction requires, further, the address of each buffer ("source" and "target"). COMPACT requires the length of the source and the target buffers, and EXPAND requires a target count of characters (bytes). Each of the two operations proceeds serially in ascending address sequence within source and target.
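The operand rules above can be illustrated with a small check. This is only a sketch: the function name `operands_valid`, the use of integer addresses and byte lengths, and the 4-byte full-word size are assumptions made for illustration and are not taken from the disclosure.

```python
WORD = 4  # assumed full-word size in bytes (not stated in the text)


def operands_valid(src_addr: int, src_len: int, tgt_addr: int, tgt_len: int) -> bool:
    """Check the two operand rules: word alignment and no overlap."""
    # Both buffers must start on a full-word boundary.
    aligned = src_addr % WORD == 0 and tgt_addr % WORD == 0
    # Two half-open byte ranges overlap when each starts before the other ends.
    overlap = src_addr < tgt_addr + tgt_len and tgt_addr < src_addr + src_len
    return aligned and not overlap
```

Buffers that merely touch, with one ending exactly where the other begins, count as non-overlapping in this sketch.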

The number of bits resulting from compaction of a series of bytes is unpredictable. When the source count of bytes to be compacted is depleted, compaction may well end in the middle of a byte. In this case, the COMPACT instruction pads to the next full word. If the COMPACT instruction reaches the end of the target buffer before having depleted the source count, it program checks with a nonzero condition code. Finally, each instruction returns some indication of the number of source bytes processed whenever it reaches its target limit.
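The assumed behavior of the two instructions can be modeled in software. The sketch below is illustrative only: the disclosure does not specify a compaction code, so `toy_code` (4 bits for an ASCII digit, a 12-bit escape for any other byte), the 4-byte word size, and the function names `compact` and `expand` are all assumptions. The model reproduces the three properties described above: padding to the next full word, a nonzero condition code when the target fills first, and a count of source bytes processed.

```python
WORD = 4  # assumed full-word size in bytes


def toy_code(byte: int) -> str:
    """Hypothetical variable-length code: 4 bits per ASCII digit, 12 bits otherwise."""
    if 0x30 <= byte <= 0x39:
        return format(byte - 0x30, "04b")
    return "1111" + format(byte, "08b")  # escape prefix + raw byte


def pad_to_word(bits: str) -> bytes:
    """Pad a bit string with zero bits to the next full word and pack it."""
    word_bits = WORD * 8
    padded_len = -(-len(bits) // word_bits) * word_bits  # round up
    bits = bits.ljust(padded_len, "0")
    return int(bits, 2).to_bytes(padded_len // 8, "big") if bits else b""


def compact(source: bytes, target_len: int):
    """Model of COMPACT: returns (condition_code, source_bytes_used, output)."""
    assert target_len % WORD == 0, "target length must be a word multiple"
    bits, used, cc = "", 0, 0
    for b in source:
        code = toy_code(b)
        if len(bits) + len(code) > target_len * 8:
            cc = 1  # target exhausted before the source count was depleted
            break
        bits += code
        used += 1
    return cc, used, pad_to_word(bits)


def expand(compacted: bytes, count: int) -> bytes:
    """Model of EXPAND: decode exactly `count` characters, ignoring padding."""
    bits = "".join(format(b, "08b") for b in compacted)
    out, pos = bytearray(), 0
    while len(out) < count:
        nibble = bits[pos:pos + 4]
        pos += 4
        if nibble == "1111":
            out.append(int(bits[pos:pos + 8], 2))
            pos += 8
        else:
            out.append(0x30 + int(nibble, 2))
    return bytes(out)
```

In this toy code the zero padding bits would decode as the digit "0", which is why the model's `expand` needs a target character count rather than a source length. Calling `expand` with a count larger than the number of characters actually stored is not handled here.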

For efficient storage and retrieval of compacted data, and for convenience in buffer handling, target buffers are of fixed size. Also, assume that uncompacted data is handled in fixed storage units (blocks).

Detailed Description of the Buffer Handling/Control Scheme:

Since most of the operation can be explained for compaction, "source" will in most cases denote uncompacted data, and "target" will denote compacted data or buffers that will hold compacted code.

Let each source buffer E have fixed length e (excluding the access method's storage management control fields), and let each target buffer C be c bytes long (excluding access method and compaction control fields). Source and target begin at word boundaries, and lengths e and c are word multiples.

When compacting data, one proceeds by directing the COMPACT instruction in a modified, segmented stream-to-stream fashion from all (as yet uncompacted) bytes of a source buffer to all remaining bytes (a word multiple) of the target buffer. The segmentation occurs because the number of target area bytes resulting from compaction of a known number of source bytes is unpredictable at any one time.
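One plausible reading of this segmented stream-to-stream procedure is sketched below. It is not the disclosure's control scheme, whose details are not available in this abbreviated text. The stand-in `toy_compact` only accounts for lengths (4 bits per ASCII digit, 12 bits per other byte, padded to an assumed 4-byte word), and the segment tuple layout and the name `compact_blocks` are hypothetical.

```python
WORD = 4  # assumed full-word size in bytes


def toy_compact(source: bytes, room: int):
    """Hypothetical stand-in for COMPACT that only tracks lengths.

    Returns (condition_code, source_bytes_used, target_bytes_written).
    """
    bits, used, cc = 0, 0, 0
    for b in source:
        cost = 4 if 0x30 <= b <= 0x39 else 12  # assumed toy code lengths
        if bits + cost > room * 8:
            cc = 1  # target limit reached before the source was depleted
            break
        bits += cost
        used += 1
    written = -(-bits // (WORD * 8)) * WORD  # pad to the next full word
    return cc, used, written


def compact_blocks(blocks, c):
    """Compact source blocks into fixed-size target buffers of c bytes.

    Returns segments as tuples:
    (block_no, src_offset, src_len, target_no, tgt_offset, tgt_len).
    """
    assert c > 0 and c % WORD == 0, "target length must be a word multiple"
    segments = []
    target_no, tgt_off = 0, 0
    for block_no, block in enumerate(blocks):
        src_off = 0
        while src_off < len(block):
            if tgt_off == c:  # current target is full: start a new one
                target_no, tgt_off = target_no + 1, 0
            # Offer all uncompacted source bytes and all remaining target bytes.
            cc, used, written = toy_compact(block[src_off:], c - tgt_off)
            segments.append((block_no, src_off, used, target_no, tgt_off, written))
            src_off += used
            tgt_off += written
            if cc != 0:  # target exhausted mid-block: continue in a new buffer
                target_no, tgt_off = target_no + 1, 0
    return segments
```

For example, with `c = 8` the single block `b"ABCDEFGH"` is split into two segments, five source bytes in the first target buffer and three in the second, because the sixth character no longer fits in the remaining 64 target bits.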

Cases of segmentation caused by te...