Browse Prior Art Database

Huffman Coding without Bit-Manipulation

IP.com Disclosure Number: IPCOM000148869D
Original Publication Date: 1986-Mar-31
Included in the Prior Art Database: 2007-Mar-30
Document File: 22 page(s) / 1M

Publishing Venue

Software Patent Institute

Related People

Choueka, Y.: AUTHOR [+5]

Abstract

Huffman Coding without Bit-Manipulat ion Y. Chouekal, A.S. Fraenke12, S.T. ~leinl*~,

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 8% of the total text.

Page 1 of 22

[This page contains 1 picture or other non-text object]

Page 2 of 22

[This page contains 1 picture or other non-text object]

Page 3 of 22

   Huffman Coding without Bit-Manipulat ion

Y. Chouekal, A.S. Fraenke12, S.T. ~leinl*~,

Dept. of Math. and Computer Science, Bar-Ilan University, Ramat Gan 52100, Israel and Inst. for Information Retrieval and Computational Linguistics (IRCOL) -

The Responsa Project

Department of Applied Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel Computer and Information Science Dept., N.J. Institute of Technology, Newark, NJ 07102, USA

Part of this work was done while the three last authors were partially affiliated with IRCOL

A first draft of this paper was presented at the Annual International ACM-SIGIR Conference (Montreal, Canada, 1985), and appeared in its proceedings, pp. 122-130.

February 1986

ABSTRACT

   Although it is well-known that Huffman Codes are optimal for text corn- pression in a character-per-character encoding scheme, they are seldom used in practical situations since they require a bit-per-bit decoding algorithm, which has to be written is some assembly language, and will perform rather slowly. A number of methods are presented that avoid these difficulties. The decoding algorithms efficiently process the encoded string on a byte-per-byte basis, are faster than the original algorithm, and can be programmed in any high level language. This is achieved at the cost of storing some tables in the internal memory, but with no loss in the compression savings of the optimal Huffman codes. The internal memory space needed can be reduced either at the cost of increased processing time, or by using non-binary Huffman codes, which give slightly sub-optimal compression. Experimental results for English and Hebrew texts are also presented.


Y. Per11y3

[This page contains 1 picture or other non-text object]

Page 4 of 22

Introduction and Motivation

   In most on-line retrieval systems, large files which are stored in secondary mem- ory are frequently accessed. These files should be stored in compacted form, not only to save space, but also to reduce the response-time: the time spent on de- compressing is generally largely compensated for by the savings in the number of I/O-accesses, since more information can be read in a single input operation. A given file can be compressed by exploiting its redundancies. For example, if the file is a text written in any natural language, one can devise a special code for its characters, taking into account their distribution and assigning shorter codewords to frequently used letters than to rare ones.

   An algorithm for the construction of an optimal code for a given distribution was devised by Huffman [7]. An efficient implementation of the encoding algorithm in time O(N log N), where N is the size of the encoded alphabet, is given in...