Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Data Compression Technique for the Elimination of Repeating Byte Strings

IP.com Disclosure Number: IPCOM000080549D
Original Publication Date: 1974-Jan-01
Included in the Prior Art Database: 2005-Feb-27
Document File: 4 page(s) / 57K

Publishing Venue

IBM

Related People

Ojalvo, M: AUTHOR

Abstract

A general purpose, data-independent data-compression technique is described which may be used for (1) storing data efficiently and (2) decreasing I/O time with respect to secondary storage devices and/or telecommunications. The technique eliminates repeating byte strings (RBS's) by using a minimum of executed instructions, and substitutes 3-byte deletion codes (DC's). An RBS is defined here as any string of 4 to 255 like, contiguous bytes. Although most RBS's consist of 0's or blanks, any one of the 256 different (8-bit) bytes are equal candidates for compression. No translation tables are needed. The same buffer is used as both the source and target buffer during both compression and expansion.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 42% of the total text.

Page 1 of 4

Data Compression Technique for the Elimination of Repeating Byte Strings

A general purpose, data-independent data-compression technique is described which may be used for (1) storing data efficiently and (2) decreasing I/O time with respect to secondary storage devices and/or telecommunications. The technique eliminates repeating byte strings (RBS's) by using a minimum of executed instructions, and substitutes 3-byte deletion codes (DC's). An RBS is defined here as any string of 4 to 255 like, contiguous bytes. Although most RBS's consist of 0's or blanks, any one of the 256 different (8-bit) bytes are equal candidates for compression. No translation tables are needed. The same buffer is used as both the source and target buffer during both compression and expansion.

The technique may be subroutinized and added to an existing program or access method. If added to an access method, it would be transparent to the user. The compression subroutine can be used to compress the original data just before writing or transmitting it; the expansion subroutine can be used to expand the compressed data back to its original form (bit-for-bit) just after reading or receiving it. Also, this technique may be combined with another compression technique (or a scrambling technique) to obtain a higher compression ratio than would be possible by either technique alone. Thus, e.g., technique #1, then technique #2 would be used for compression; technique #2, then technique #1 for expansion.

Let the original (physical) record be represented as in Fig. 1 where X,Y,F and 0 each represent a single byte. N(0) = byte count of the original record, including 4 bytes of HI. 0 = unused byte. Y......Y = original data (before compression). F =
1-byte compression flag; initially 0.

Let the compressed record be represented as in Fig. 2, where X,Y, F,D,C and B each represents a single byte.

Nc = byte count of the message after compression, including 4 bytes of HI.

F = compression flag, a 2-valued variable indicating whether this message has been compressed.

Y......Y = nonrepeating byte string (NRBS) whose length varies from 0 to 254, with the possible exception of the last NRBS,

which may contain more. By definition a NRBS does not

contain a RBS.

D(I) = the initial byte displacement; the number of bytes in the NRBS between D(I) and D(1)C(1)B(1).

D(i)C(i)B(i) = 3-byte deletion code (DC) representing an RBS that has been deleted, where i=1, 2, ... , representing

the 1st, 2nd ....last such DC.

D(i) = byte displacement; the number of bytes in the NRBS between D(i)B(i)C(i) and D(i+1)C(i+1)B(i+1). It varies from 0 to 254, except if this is the last deletion code, in which case it has a value of 255.

C(i) = byte count of the deleted repeating byte string; it varies from 4 to 255.

1

Page 2 of 4

B(i) = the actual byte deleted, which may be any one of 256 different bytes.

The description below assumes a byte-addressing facility, such as in the IBM system 360 and 370 series of com...