
d-dimensional Arithmetic Coding

IP.com Disclosure Number: IPCOM000237514D
Publication Date: 2014-Jun-19
Document File: 7 page(s) / 301K

Publishing Venue

The IP.com Prior Art Database

Abstract

Arithmetic Coding is arguably the most important concept in modern lossless data compression. Its most basic version (with a non-adaptive model and symbols assumed i.i.d.) performs close to Shannon's limit, is fast (linear in the size of the document to be compressed), requires only constant memory, and can be implemented in a few lines of code. Perhaps more importantly, its appearance opened a whole new approach to data compression, decoupling the modeling of the documents from the actual encoding. The encoder (which is usually what is meant by "arithmetic coder") needs as input only a probability function over the symbols to be encoded (bits, bytes or characters). The responsibility of providing an accurate description of the data is deferred to the modeling part. Thanks to this clear separation, changes to the model do not have to concern themselves with the actual encoding, which sometimes requires good knowledge of low-level programming instructions. This has allowed improvements in compression, as measured over standard benchmarks1, and has made it possible to relate theoretical advances in the modeling of natural language (for example) to data compression [2]. Arithmetic coding was popularized by the widely cited article of Witten et al. [4], which includes a good implementation, and the presentation there is still one of the clearest available.



d-dimensional Arithmetic Coding

Matthias Gallé

November 4, 2013

1 Introduction

Arithmetic Coding is arguably the most important concept in modern lossless data compression. Its most basic version (with a non-adaptive model and symbols assumed i.i.d.) performs close to Shannon's limit, is fast (linear in the size of the document to be compressed), requires only constant memory, and can be implemented in a few lines of code. Perhaps more importantly, its appearance opened a whole new approach to data compression, decoupling the modeling of the documents from the actual encoding. The encoder (which is usually what is meant by "arithmetic coder") needs as input only a probability function over the symbols to be encoded (bits, bytes or characters). The responsibility of providing an accurate description of the data is deferred to the modeling part. Thanks to this clear separation, changes to the model do not have to concern themselves with the actual encoding, which sometimes requires good knowledge of low-level programming instructions. This has allowed improvements in compression, as measured over standard benchmarks1, and has made it possible to relate theoretical advances in the modeling of natural language (for example) to data compression [2]. Arithmetic coding was popularized by the widely cited article of Witten et al. [4], which includes a good implementation, and the presentation there is still one of the clearest available.
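The abbreviated text gives no code, so the following is a minimal sketch of the scheme this paragraph describes: a non-adaptive model supplies a fixed probability function, and the encoder does nothing but narrow an interval once per symbol. It is a toy, float-based illustration (it loses precision on long messages; practical coders such as the one in Witten et al. [4] use renormalized integer arithmetic), and all names in it are ours, not the disclosure's.

# Toy sketch of non-adaptive arithmetic coding with an i.i.d. model.
# Float intervals for clarity only; correct for short messages.

def cumulative(model):
    """Map each symbol to its half-open sub-interval of [0, 1)."""
    intervals, low = {}, 0.0
    for sym, p in model.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(message, model):
    """Narrow [low, high) once per symbol; any number inside the
    final interval identifies the whole message."""
    low, high = 0.0, 1.0
    intervals = cumulative(model)
    for sym in message:
        s_low, s_high = intervals[sym]
        span = high - low
        low, high = low + span * s_low, low + span * s_high
    return (low + high) / 2  # any value in [low, high) would do

def decode(code, length, model):
    """Invert the narrowing: find which sub-interval the code falls
    into, emit that symbol, and rescale the code back to [0, 1)."""
    intervals = cumulative(model)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)
                break
    return "".join(out)

model = {"a": 0.6, "b": 0.3, "c": 0.1}  # probability function from the model
c = encode("abac", model)
assert decode(c, 4, model) == "abac"

Note that encode and decode only ever see the probability function, never where it came from; that is exactly the separation of modeling from encoding described above.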

  However, the widespread adoption of arithmetic coding in industrial standards has been hampered by two important factors. One is that even modern implementations are slightly slower than state-of-the-art versions of string-based compressors like those from the LZ family. The bigger factor, however, is most probably the intellectual property situation. The Handbook of Image and Video Processing, for instance, says [1] (emphasis by the author):

  Although the JPEG lossless algorithm that uses Huffman coding has seen some adoption and a few public domain implementations of it are freely available, the JPEG lossless algorithm based on arithmetic coding has seen little use as of today despite the fact that it provides about 10% to 15% better compression. Perhaps this is due to the intellectual property issues surrounding arithmetic coding.

1 In particular the PAQ family of compressors, which has won several challenges and is generally considered the state of the art in general-purpose compression; see http://cs.fit.edu/~mmahoney/compression/paq.html







Figure 1: Simplified schema of our proposal. (a) Traditional arithmetic coding, separating modeling from encoding. (b) Our proposal: the model is kept as is, and the d-dimensional AC outputs d binary probability distributions as input to simple binary arithmetic coders.
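The abbreviated text does not spell out how the d binary distributions of Figure 1(b) are derived from the model. One natural reading, stated here purely as our assumption and not necessarily the disclosure's construction, is a bitwise decomposition: a symbol over an alphabet of size 2^d is written as d bits, and bit i is assigned the conditional probability P(b_i = 1 | b_1 ... b_{i-1}) computed from the model's distribution. The sketch below illustrates that decomposition; the function name and layout are ours.

# Hedged sketch: split a distribution over 2**d symbols into the d
# conditional bit probabilities that could feed d binary arithmetic
# coders, as Figure 1(b) suggests. Bitwise reading is our assumption.

def bit_probabilities(probs, symbol, d):
    """For one symbol (an integer < 2**d), return its d bits, most
    significant first, each paired with P(bit = 1 | earlier bits)."""
    result = []
    lo, hi = 0, len(probs)        # symbols still consistent with emitted bits
    for i in reversed(range(d)):
        mid = (lo + hi) // 2      # split: next bit 0 -> [lo, mid), 1 -> [mid, hi)
        total = sum(probs[lo:hi])
        p_one = sum(probs[mid:hi]) / total if total else 0.5
        bit = (symbol >> i) & 1
        result.append((bit, p_one))
        lo, hi = (mid, hi) if bit else (lo, mid)
    return result

# Example: d = 2, alphabet {0, 1, 2, 3} with a non-uniform model.
probs = [0.5, 0.2, 0.2, 0.1]
for sym in range(4):
    print(sym, bit_probabilities(probs, sym, 2))

Under this decomposition the product over the d steps of p_one (for a 1 bit) or 1 - p_one (for a 0 bit) recovers exactly the model's probability of the symbol, so in principle no compression is lost by routing the bits through d binary coders.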

  The data compression landscape in general, and arithmetic coding in particular, is heavily guarded by patents. The FAQ of comp.compress...