Browse Prior Art Database

Method for Adaptively Initializing a Source Model for Symbol Encoding

IP.com Disclosure Number: IPCOM000042424D
Original Publication Date: 1984-May-01
Included in the Prior Art Database: 2005-Feb-03
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

Arps, RB: AUTHOR [+3]

Abstract

This invention relates to an adaptive string encoding method in which a priori statistics are used to start up and possibly modify the downstream encoding estimate. The method uses the binary events of ascertaining the ratio of new symbol occurrence to the total number of symbols in a startup string. With respect to the next encoding event, if the symbol is "new", a flat probability of occurrence is used, whereas if the symbol is "old", then it is encoded with a distribution based on counts. The method steps include (a) partitioning the symbols into two classes, "the observed" and "unobserved"; (b) assigning conditional probabilities to the symbols within each class; and (c) using an age-dependent scheme to assign probabilities in a code space to the observed characters.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 2

Method for Adaptively Initializing a Source Model for Symbol Encoding

This invention relates to an adaptive string encoding method in which a priori statistics are used to start up and possibly modify the downstream encoding estimate. The method uses the binary events of ascertaining the ratio of new symbol occurrence to the total number of symbols in a startup string. With respect to the next encoding event, if the symbol is "new", a flat probability of occurrence is used, whereas if the symbol is "old", then it is encoded with a distribution based on counts. The method steps include (a) partitioning the symbols into two classes, "the observed" and "unobserved"; (b) assigning conditional probabilities to the symbols within each class; and (c) using an age- dependent scheme to assign probabilities in a code space to the observed characters. It is recognized that the unobserved class of symbols may have a nonstationary statistical nature. When in a startup phase, that is few events having occurred in the context, it is anticipated that unobserved symbols may have great frequency. As the number of events increases, the frequency of occurrence of the unobserved symbols decreases rapidly. The method constitutes a rapidly adapting estimation scheme which assigns greater weight to more recent events. Let the alphabet consist of m symbols, 1,...,m. At any point in time, let variable k denote the number of symbols which have been seen. If all symbols have been seen, then k=m; otherwise, k<m. Without loss of generality, assign these k symbols the first k indices, 1,...,k. Adaptive code space assignment is often done based on counts. Let ni denote the count for symbol i in a string (or string prefix) of length n. Thus, the sum of ni's totals n. Call the set of seen symbols "s" and the set of unseen symbols "u". Algorithms A and B Prior- art algorithms A and B for assigning code space to unobserved symbols provide a basis for appreciating the invention. Algorithms A and B are characterized as "age-independent" because the influence of a symbol count is independent of how far in the past it occurred. The basic idea for assigning code space is to use count ratios. Algorithm A Initially, the code space assigned to each symbol is 1/m, since all of the counts ni are zero. For each symbol which has been seen: p(i) = ni/(n+1), 1<_i_<k. The unconditional code space for each unseen symbol is: p(j) = 1/((n+1)x(m-k)), k<j<_m. Algorithm B Algorithm B assigns code space by assigning a count of 1 to each symbol at initialization. Thus at initialization, each symbol has a code space of 1/m. When the first symbol 1 is seen, its code space p(1) becomes 2/(m+1). After "n" symbols have been se...