Hardware compression method with early and late selection of multiple predefined Huffman code-trees
Publication Date: 2014-Aug-01
The IP.com Prior Art Database
Disclosed is a device and method to optimize the compression of arbitrary data using the deflate standard with a given number of preselected Huffman trees. The invention improves prior art with an optimized Huffman tree selection and optimized dynamic change of the Huffman tree during runtime.
Page 01 of 3
Disclosed is a device and method to optimize the compression of arbitrary data using the deflate  standard with a given number of preselected Huffman trees. The invention improves prior art with an optimized Huffman tree selection and optimized dynamic change of the Huffman tree during runtime.
The DEFLATE standard documented by RFC 1951 is a widely used data compression specification. It comprises two mechanisms
1. Lempel-Ziv LZ77 - Replaces repeating data by length/distance references within a 32 kilobyte rolling window.
2. Huffman Encoding - Allows for either a predefined encoding of data (Fixed Huffman) or a customer defined encoding of data (Dynamic Huffman) to optimize the symbol sizes. Not only data, but also length/distance pairs are encoded with variable length codes, such that frequently used data bytes ("literals") and length/distance pairs are encoded with shorter codes than less frequently used ones.
Commonly used software implementations, such as the well-known open-source zLib  library or the gzip program, analyze blocks of data to determine a suitable set of Huffman codes ("tree"), then compress data with this Huffman tree. The next block of data is then encoded again with another dynamically generated Huffman tree and so forth. The standard also specifies one mandatory predefined Huffman tree, called fixed Huffman. LZ77 matching and statistically evaluating the probability of literals and length/distance pairs is very time-consuming in software, but using fixed Huffman coding doesn't yield optimal compression results for most data.
Hardware logic can implement LZ77 matching by using hashes efficiently with much higher performance. Building a well-suited Huffman tree is more difficult cmpared to heuristics possible in software implementations. The data flow in high-bandwidth, low-latency hardware implementations can't stop long enough to explore the heuristics option space. However, an intelligent algorithm can find a limited set of Huffman trees for known classes of data to use with hardware compression.  and  publish such algorithms.
For dynamic Huffman encoding the RFC 1951 standard requires to embed the Huffman tree in the output data. The Huffman tree itself is compressed with a kind of run-length encoding plus fixed Huffman coding. Trees that encode subsequent literals or length/distance codes with the same number of bits can therefore be compressed very well to use little space in the compressed output data.
Details of the invention
The hardware implementation must have low latency, high throughput and small memory footprint. It is therefore not possible to gather statistics on incoming data for large blocks, and then, after deciding about the best tree, compressing the data with that tree.
This invention details the method used to encode data with one of the best suitable predefined trees under the above mentioned design targets.
Compared to prior Art, the method detailed doe...