Huffman Coding without Bit-Manipulation
Original Publication Date: 1986-Mar-31
Included in the Prior Art Database: 2007-Mar-30
Software Patent Institute
Choueka, Y.: AUTHOR [+5]
Huffman Coding without Bit-Manipulat ion Y. Chouekal, A.S. Fraenke12, S.T. ~leinl*~,
Huffman Coding without Bit-Manipulat ion
Y. Chouekal, A.S. Fraenke12, S.T. ~leinl*~,
Dept. of Math. and Computer Science, Bar-Ilan University, Ramat Gan 52100, Israel and Inst. for Information Retrieval and Computational Linguistics (IRCOL) -
The Responsa Project
Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel Computer and Information Science Dept., N.J. Institute of Technology, Newark, NJ 07102, USA
Part of this work was done while the three last authors were partially affiliated with IRCOL
A first draft of this paper was presented at the Annual International ACM-SIGIR Conference (Montreal, Canada, 1985), and appeared in its proceedings, pp. 122-130.
Although it is well-known that Huffman Codes are optimal for text corn- pression in a character-per-character encoding scheme, they are seldom used in practical situations since they require a bit-per-bit decoding algorithm, which has to be written is some assembly language, and will perform rather slowly. A number of methods are presented that avoid these difficulties. The decoding algorithms efficiently process the encoded string on a byte-per-byte basis, are faster than the original algorithm, and can be programmed in any high level language. This is achieved at the cost of storing some tables in the internal memory, but with no loss in the compression savings of the optimal Huffman codes. The internal memory space needed can be reduced either at the cost of increased processing time, or by using non-binary Huffman codes, which give slightly sub-optimal compression. Experimental results for English and Hebrew texts are also presented.
Introduction and Motivation
In most on-line retrieval systems, large files which are stored in secondary mem- ory are frequently accessed. These files should be stored in compacted form, not only to save space, but also to reduce the response-time: the time spent on de- compressing is generally largely compensated for by the savings in the number of I/O-accesses, since more information can be read in a single input operation. A given file can be compressed by exploiting its redundancies. For example, if the file is a text written in any natural language, one can devise a special code for its characters, taking into account their distribution and assigning shorter codewords to frequently used letters than to rare ones.
An algorithm for the construction of an optimal code for a given distribution was devised by Huffman . An efficient implementation of the encoding algorithm in time O(N log N), where N is the size of the encoded alphabet, is given in...