Huffman Code Approximation for Run Length Compression of Information / Management Index Data Set Entries
Original Publication Date: 1994-Aug-01
Included in the Prior Art Database: 2005-Mar-27
A method is described for an approximation of Huffman encoding of binary data. The approximation streamlines the programming logic required for the compression/uncompression routines.
Information/Management Index Entries - These variable-length
entries are composed of ones and zeros and can be up to two million
bits long. With this particular data, it is not uncommon to have a
run of one million zeros or ones in a row. As a result, the state
space for run lengths is very large, i.e., the integers from one to
two million or more. This makes it infeasible to determine the
actual probability of each run length, and impractical to manage
such a table from a programming standpoint.
Approximation - First, the full range of run lengths, in
ascending order of magnitude, is partitioned into groups of size
2**n, where n can vary from group to group. In the compressed bit
stream, the group to which the current run belongs is determined by a
self-identifying Huffman code, followed by n bits that uniquely
represent the members of that group (these bits can be considered the
remainder of the Huffman code). The size of each group, as well as
the Huffman code used to flag the group, is dependent upon the
probability distribution of the runs. Usually, the shorter the run,
the greater the likelihood of its occurrence, the smaller its group,
and the shorter its Huffman code.
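The scheme above can be sketched in a few lines of Python. The
group table here is purely hypothetical (the article derives the real
prefixes and group sizes from the measured run-length distribution);
it only shows the mechanics of emitting a prefix-free group code
followed by an n-bit remainder.

```python
# Hypothetical group table: (prefix, n). Each group covers 2**n
# consecutive run lengths; prefixes are prefix-free so the decoder
# can identify the group unambiguously. Real values would come from
# the observed probability distribution of the runs.
GROUPS = [
    ("0",    1),   # runs 1..2
    ("10",   2),   # runs 3..6
    ("110",  4),   # runs 7..22
    ("111", 21),   # runs 23.. (covers the multi-million-bit runs)
]

def encode_run(length):
    """Return the group prefix plus an n-bit remainder for `length`."""
    base = 1
    for prefix, n in GROUPS:
        size = 1 << n
        if length < base + size:
            offset = length - base          # position within the group
            return prefix + format(offset, f"0{n}b")
        base += size
    raise ValueError("run length out of range")

def decode_run(bits):
    """Inverse of encode_run: return (run length, bits consumed)."""
    base = 1
    for prefix, n in GROUPS:
        if bits.startswith(prefix):
            offset = int(bits[len(prefix):len(prefix) + n], 2)
            return base + offset, len(prefix) + n
        base += 1 << n
    raise ValueError("no matching group prefix")
```

Note that short, frequent runs land in small groups with short
prefixes and few remainder bits, while the rare million-bit runs pay
a longer prefix plus 21 remainder bits, which is still far cheaper
than a code per individual length.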
Here is an illustration: