
Adaptive Context Generation For Data Compression

IP.com Disclosure Number: IPCOM000050372D
Original Publication Date: 1982-Oct-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 29K

Publishing Venue

IBM

Related People

Rissanen, J: AUTHOR

Abstract

This invention relates to a method for adaptive context generation for data compression and the like, comprising the steps of (1) suitably ordering past symbol strings to form a selective structure and (2) growing contexts within this structure in a nested manner, responsive to successive symbols, thereby confining memory use to contexts that actually occur. The main problem in designing a data compression system is to select, for each symbol in the string to be compressed, its context, i.e., a certain collection of its adjacent past symbols. This process, together with gathering the occurrence counts of the symbol at its context, is known as modeling an information source for the string.

This is the abbreviated version, containing approximately 49% of the total text.



This invention relates to a method for adaptive context generation for data compression and the like, comprising the steps of (1) suitably ordering past symbol strings to form a selective structure and (2) growing contexts within this structure in a nested manner, responsive to successive symbols, thereby confining memory use to contexts that actually occur. The main problem in designing a data compression system is to select, for each symbol in the string to be compressed, its context, i.e., a certain collection of its adjacent past symbols. This process, together with gathering the occurrence counts of the symbol at its context, is known as modeling an information source for the string. Ideally, the context would determine the symbol in question, so that once the context is observed the symbol can be calculated from it and no code for it is needed. This ideal is, of course, not normally achievable; the next best thing is a context that skews the distribution of the symbol so that it can be encoded with a short mean code length.
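To see why a skewed symbol distribution permits a shorter mean code length, the entropy bound can be computed directly (an illustrative sketch, not part of the disclosure):

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a binary source with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# A context that skews the symbol's distribution lowers the achievable
# mean code length toward the entropy bound:
print(entropy(0.5))  # no skew: 1.0 bit per symbol
print(entropy(0.9))  # skewed: about 0.469 bits per symbol
```

A coder driven by context-conditioned occurrence counts can approach these bounds in practice.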

The most familiar context is the preceding symbol, which gives a first-order Markov model of the information source. Another, more powerful context is obtained from the four past symbols x(1)...x(4) that are geometrically nearest to the current symbol y in a two-dimensional array of the string, as illustrated below (the dots indicate continuation of the string):

    .    .    .    .
    x(3) x(2) x(4) .
    x(1) y

FIG. 1. Example of a Context.
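The four-neighbour context of Fig. 1 can be read off a two-dimensional array as follows (a minimal sketch; treating cells outside the array as 0 is an assumed boundary convention, not stated in the disclosure):

```python
def context_at(grid, r, c):
    """Return (x1, x2, x3, x4): the left, above, above-left, and
    above-right neighbours of grid[r][c], as laid out in Fig. 1."""
    def at(i, j):
        # Assumed convention: cells outside the array read as 0.
        if 0 <= i < len(grid) and 0 <= j < len(grid[i]):
            return grid[i][j]
        return 0
    return (at(r, c - 1), at(r - 1, c), at(r - 1, c - 1), at(r - 1, c + 1))

grid = [[1, 0, 1, 1],
        [0, 1, 0, 0]]
print(context_at(grid, 1, 1))  # (0, 0, 1, 1)
```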

In general, the bigger the context, the better the compression, but also the more expensive the system is to implement. The problem in modeling is how to find a preselected number m of contexts such that good compression results. One cannot, in general, afford to collect all possible sequences as contexts; instead one ought to gather only the "relevant" contexts. Because one cannot tell beforehand how long these relevant sequences are, they must be allowed to vary in size. Further, it is desirable to determine the contexts adaptively, in one pass.
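A one-pass scheme that spends memory only on contexts actually observed can be sketched as follows (an illustrative simplification using a dictionary of variable-length contexts, not the disclosure's tree method; `max_len` is a hypothetical cap on context length):

```python
from collections import defaultdict

def gather_contexts(s, max_len=3):
    """Count, in one pass over s, each symbol's occurrences at every
    context of length 1..max_len that actually occurs in s."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(s):
        for k in range(1, max_len + 1):
            if i >= k:
                # Only contexts that appear in the string get an entry,
                # so memory grows with observed contexts, not all 2^k.
                counts[s[i - k:i]][sym] += 1
    return counts

c = gather_contexts("0110100110")
print(dict(c["01"]))  # symbol counts observed after context "01"
```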

The first step is to order the past symbols, i.e., the symbols that occur before the current symbol, so that the symbol judged to have the greatest influence on the current symbol is picked first, the next most influential second, and so on. In Fig. 1, for example, the four past symbols are arranged in an order reflecting their geometric nearness to the current symbol y. This order could be continued by following the same principle.
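As an illustration, candidate past-symbol positions can be ranked by nearness to the current symbol, here placed at offset (0, 0), with squared Euclidean distance as the assumed nearness measure and ties broken by listing order:

```python
# (row, col) offsets of candidate past symbols relative to the current one;
# negative row = earlier line, negative col = earlier position in the line.
offsets = [(0, -1), (-1, 0), (-1, -1), (-1, 1), (0, -2), (-2, 0)]

# Order by squared Euclidean distance; Python's sort is stable, so
# equally near symbols keep their listed order.
ordered = sorted(offsets, key=lambda rc: rc[0] ** 2 + rc[1] ** 2)
print(ordered)
```

The first four entries reproduce the ordering x(1)...x(4) of Fig. 1.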

The next step is to grow two binary trees, one for the case where the current symbol is 0 and the other for the case where it is 1. Each tree is grown by applying a prior-art method to the past string, reading the symbols in the chosen order. The focus is on the intersection of the two trees, which is generated directly. The steps include (1) starting with a 2-leaf tree T(1), where the root...
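Since the source text is abbreviated at this point, the following is only a hedged sketch of maintaining a pair of binary context trees, one per value of the current symbol, with counts incremented along the path of ordered past symbols; the disclosure's exact growth rule and the tree T(1) are not reproduced:

```python
def new_node():
    # Each node holds an occurrence count and two children, one per context bit.
    return {"count": 0, "children": [None, None]}

def grow(root, context_bits, depth):
    """Walk (and extend) a binary context tree along the ordered past
    symbols, incrementing the count at every node on the path."""
    node = root
    node["count"] += 1
    for b in context_bits[:depth]:
        if node["children"][b] is None:
            node["children"][b] = new_node()
        node = node["children"][b]
        node["count"] += 1
    return node

# One tree for current symbol 0, one for current symbol 1:
trees = [new_node(), new_node()]
# Current symbol is 1; its ordered context reads x(1)=0, x(2)=1, ...
grow(trees[1], [0, 1, 1], depth=2)
print(trees[1]["count"])  # 1
```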