Recursive Self-Smoothing of Linguistic Contingency Tables
Original Publication Date: 1984-Dec-01
Included in the Prior Art Database: 2005-Feb-06
A contingency table of word n-gram counts has many zero entries for n > 1 and consists mostly of zeros for n > 2. Such a table may be smoothed by combining three ideas:
1. probabilities conditioned on the more recent past (short-term memory) are obtained as weighted averages of probabilities conditioned on the longer past (long-term memory);
2. short-term memory is regarded as prior information for estimating probabilities based on long-term memory;
3. the data required for estimating the parameters of the prior distribution are identified, using probabilities conditioned as in 1.
The resulting estimator is constructed and normalized so that it sums to unity.

In probabilistic modeling of natural language text, the numbers of occurrences of the possible n-grams $l^n = (l_1 l_2 \ldots l_n)$ of words can be arranged in a contingency table. For n > 1, such tables have many zeros; for n > 2, they are sparse.
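The three ideas can be illustrated with a small Python sketch. The disclosure does not state the form of the prior or how its parameters are estimated, so everything below is an assumption made for illustration. The sketch uses a Dirichlet prior centred on the next-lower-order smoothed distribution and a single hypothetical strength parameter `K` per order. That parameter is chosen from a fixed grid by maximizing the Dirichlet-multinomial marginal likelihood. Lower-order tables are obtained by summing the oldest word out of the n-gram table, which makes idea 1 hold exactly. All function names are hypothetical.

```python
from collections import Counter
import math

# A minimal sketch of recursive self-smoothing under stated assumptions:
# the Dirichlet prior, the single strength K per order and the grid search
# are illustrative choices, not taken from the disclosure.

def ngram_table(tokens, n):
    """Contingency table of n-gram counts, keyed by word tuples (l1, ..., ln)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def marginalize(table):
    """Sum out the oldest word: c(l2 .. ln) = sum over l1 of c(l1 l2 .. ln)."""
    lower = Counter()
    for gram, c in table.items():
        lower[gram[1:]] += c
    return lower

def history_totals(table):
    """c(h) = sum over w of c(h w), for every history h = gram[:-1]."""
    totals = Counter()
    for gram, c in table.items():
        totals[gram[:-1]] += c
    return totals

def weighted_average_of_longer(higher, h, w):
    """Idea 1: p_ML(w | h) as the average of p_ML(w | u h) over the older
    word u, weighted by the relative frequency c(u h) / c(h)."""
    totals = history_totals(higher)
    longer = {hist: c for hist, c in totals.items() if hist[1:] == h}
    c_h = sum(longer.values())
    if c_h == 0:
        return 0.0
    return sum((c_uh / c_h) * (higher[hist + (w,)] / c_uh)
               for hist, c_uh in longer.items())

def make_level(table, lower_prob, K):
    """Idea 2: posterior-mean estimate for one order, with the shorter-history
    (already smoothed) distribution as the mean of a Dirichlet prior of strength K."""
    totals = history_totals(table)
    def prob(h, w):
        # Counter returns 0 for unseen n-grams and unseen histories.
        return (table[h + (w,)] + K * lower_prob(h[1:], w)) / (totals[h] + K)
    return prob

def choose_K(table, lower_prob, grid):
    """Idea 3 (assumed form): pick K by maximizing the Dirichlet-multinomial
    marginal likelihood of the table, an empirical-Bayes choice."""
    totals = history_totals(table)
    def log_evidence(K):
        ll = sum(math.lgamma(K) - math.lgamma(c_h + K) for c_h in totals.values())
        for gram, c in table.items():
            a = K * lower_prob(gram[1:-1], gram[-1])
            ll += math.lgamma(c + a) - math.lgamma(a)
        return ll
    return max(grid, key=log_evidence)

def self_smooth(tokens, N, grid=(0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0)):
    """Smooth orders 2..N recursively, each using the order below as its prior.
    Returns (prob(history_tuple, word), vocabulary, chosen K per order)."""
    tables = {N: ngram_table(tokens, N)}
    for n in range(N - 1, 0, -1):
        tables[n] = marginalize(tables[n + 1])
    total = sum(tables[1].values())
    unigram = {gram[0]: c / total for gram, c in tables[1].items()}

    def unigram_prob(h, w):
        return unigram.get(w, 0.0)

    prob = unigram_prob
    chosen = {}
    for n in range(2, N + 1):
        chosen[n] = choose_K(tables[n], prob, grid)
        prob = make_level(tables[n], prob, chosen[n])
    return prob, sorted(unigram), chosen
```

Because each level's prior already sums to one over the vocabulary, the posterior mean (c(h w) + K p(w | h')) / (c(h) + K) also sums to one. This is one way to meet the normalization requirement. A history never seen in the table falls back entirely to the lower-order estimate, and an unseen n-gram after a seen history receives a positive share of probability through the prior term.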