Browse Prior Art Database

String Compaction Algorithm

IP.com Disclosure Number: IPCOM000085720D
Original Publication Date: 1976-May-01
Included in the Prior Art Database: 2005-Mar-02
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Dishon, Y: AUTHOR

Abstract

A method is described for compaction encoding runs of fixed-length characters from a finite source alphabet. The method presumes that the length of character runs is Huffman ordered, that is, short runs are more frequent than long runs, and some characters occur more frequently in runs than others. The method comprises the following steps:

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 2

String Compaction Algorithm

A method is described for compaction encoding runs of fixed-length characters from a finite source alphabet. The method presumes that the length of character runs is Huffman ordered, that is, short runs are more frequent than long runs, and some characters occur more frequently in runs than others. The method comprises the following steps:

1) Encoding the source code such that the most frequent characters occurring in strings, comprising one half of the source code alphabet, into one set, A, where each reencoded character maintains the same number of bits as were in the source alphabet, but assigning to the most significant bit (MSB) of each character in the set the same binary digit of one kind, say 1; encoding the least frequent characters occurring in strings; comprising the remaining half of the source code alphabet, into another set, B, where each reencoded character maintains the same number of bits as were in the source alphabet, but assigning to the MSB of each character in the set the same binary digit of one kind, but opposite to that of set A, say 0.

2) Defining three code states, namely, Code A state, Code B state and Neutral state, Code A or Code B state to be entered only from the Neutral state upon detection in the source data stream k contiguous characters (k a small number) belonging, respectively, all to set A or all to set B. The Neutral state to be entered only at the beginning of the data stream, or else if a character belonging to set B while in state A or belonging to set A while in state B is detected in the data stream.

Entrance into either state A or B is implied by occurrence of the k characters without further emittance of a "switch" character, while entrance into the Neutral state (except at beginning of the data stream) is signalled by emittance of a switch character unique to each of set A and set B.

3) In state A or B encoding runs of characters of length one without further change, runs of length two by inverting the MSB of the first character of the run and not emitting the second character, and runs of length three or more by inverting the MSB of the first character, thus signalling the presence of the first two characters of the run, and substituting a character containing the character count for the remaining length of the run. This count character, too, has its MSB inverted relative to the MSB of the set (say, 0 for A, 1 for B) and is thus recognized as a count character, while the remaining bits contain the count information. As many count characters as necessary may be appended contiguously t...