Browse Prior Art Database

Data Compression Method for Nonrandom Character Sequences

IP.com Disclosure Number: IPCOM000079284D
Original Publication Date: 1973-Jun-01
Included in the Prior Art Database: 2005-Feb-26
Document File: 3 page(s) / 64K

Publishing Venue

IBM

Related People

Sanborn, JL: AUTHOR

Abstract

Described is an efficient process for a one-byte encoding of repetitive characters, which includes error detection on the original input. The process is applicable to nonrandom character sequences where a set of valid characters not only have a high probability of occurring but also, given a first occurrence for one of the characters, there is a good probability that the same character will be immediately repeated. One example might be a logic diagram stored in printable form, where the application of the process greatly reduces the data space required. For this application only two characters might be encoded, blanks and vertical bars, but the process is extendable to additional characters.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 59% of the total text.

Page 1 of 3

Data Compression Method for Nonrandom Character Sequences

Described is an efficient process for a one-byte encoding of repetitive characters, which includes error detection on the original input. The process is applicable to nonrandom character sequences where a set of valid characters not only have a high probability of occurring but also, given a first occurrence for one of the characters, there is a good probability that the same character will be immediately repeated. One example might be a logic diagram stored in printable form, where the application of the process greatly reduces the data space required. For this application only two characters might be encoded, blanks and vertical bars, but the process is extendable to additional characters.

The underlying mechanism of the process is to separate the bytes, after packing, into three categories: an encoded character, implying exactly one repetition; a unique error condition character, used to identify an unexpected character in the input stream; and a set of encoding bytes which are easily distinguished from either of the other two categories. The method used for this separation is to map the set of valid characters not to be encoded for repetitions, together with all invalid characters into a high, contiguous range of values in the byte. For example, if there are 63 valid, nonencoded characters, they could be mapped into the hexadecimal range of `C1' to `FF', and all error conditions mapped into `C0'. The remaining range of `00' to `BF' could then be used for encoded repetitions.

Each byte used to encode repetitions is separated into two subfields. The first subfield identifies the character being repeated, and the second subfield gives the number of repetitions. The length of these subfields is dependent on the number of valid, nonencoded characters and the number of characters which are candidates for repetition encoding. These two factors also determine the maximum number of repetitions which can be contained in a single-encoding byte. In the table below, examples of several different combinations are given.

Number of Characters Subfield Length in Bits Max. Reps./ Valid, Non-Encoded First Second Encoding Bytes Encoded 61 3 2 6 63

61 6 3 5 31

91 2 2 6 63

91 4 3 5 31

127 1 1 7 127

127 4 3 5 31.

Since the encoding method utilizes a single byte to represent both the character being repeated and the number of repetitions, there is no loss in data space if a particular occurrence of the character is a single repetition. There is, therefore, no look-ahead associated with the method. If the number of repetitions in a sequence exceeds the maximum number which can be contained in a single-encoding byte, a second-encoding byte is used.

1

Page 2 of 3

An example of the process is shown in the attached flow charts, where the encoding is shown in Chart 1, and the decoding is shown in...