Entropy Estimator for Sparse Data

IP.com Disclosure Number: IPCOM000060829D
Original Publication Date: 1986-May-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 3 page(s) / 38K

Publishing Venue

IBM

Related People

Mercer, RL: AUTHOR [+2]

Abstract

In a speech recognition environment in which the probability of a word is estimated from preceding text, a method is provided for estimating the entropy of a distribution of words in a vocabulary (i.e., the difficulty of the language to be recognized), especially where data are sparse. The method involves generating a Bayes estimator, i.e., a mean-squared-error-optimal estimate of entropy in a simple closed form, based on explicit smoothing assumptions. Suppose there is a sample of words (or strings of words) W1,...,WN, where the words (or strings) are regarded as statistically independent. Assuming a vocabulary of size k, let X = X1,...,Xk denote the respective number of occurrences of each vocabulary word (or bigram, if we are interested in bigram entropy, etc.) in this sample, so that $\sum_{i=1}^{k} X_i = N$.



Entropy Estimator for Sparse Data

In a speech recognition environment in which the probability of a word is estimated from preceding text, a method is provided for estimating the entropy of a distribution of words in a vocabulary (i.e., the difficulty of the language to be recognized), especially where data are sparse. The method involves generating a Bayes estimator, i.e., a mean-squared-error-optimal estimate of entropy in a simple closed form, based on explicit smoothing assumptions. Suppose there is a sample of words (or strings of words) W1,...,WN, where the words (or strings) are regarded as statistically independent.

Assuming a vocabulary of size k, let X = X1,...,Xk denote the respective number of occurrences of each vocabulary word (or bigram, if we are interested in bigram entropy, etc.) in this sample, so that $\sum_{i=1}^{k} X_i = N$. X has the multinomial distribution with probability element

$$\Pr(X = x) = \frac{N!}{x_1! \cdots x_k!} \; p_1^{x_1} \cdots p_k^{x_k},$$
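
As a point of reference (not part of the disclosure), the following Python sketch forms the count vector X from a sample and evaluates the multinomial probability element for a known p; the sample, vocabulary, and probabilities are hypothetical and purely illustrative.

```python
from collections import Counter
from scipy.stats import multinomial

# Hypothetical sample W1,...,WN over a vocabulary of size k = 4.
sample = ["a", "b", "a", "a", "c"]
vocab = ["a", "b", "c", "d"]

counts = Counter(sample)
x = [counts[w] for w in vocab]   # X = X1,...,Xk (note the zero count for "d")
N = sum(x)                       # the Xi sum to N

# Multinomial probability element of observing x, given a known p.
p = [0.5, 0.2, 0.2, 0.1]
print(multinomial.pmf(x, n=N, p=p))
```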

where $p = (p_1, \dots, p_k)$ is a vector of probabilities satisfying

$$\sum_{i=1}^{k} p_i = 1, \qquad p_i \ge 0 .$$

Based on data X = x, we wish to estimate the entropy of the distribution p.

In this regard, the entropy is identified as the unknown parameter

$$H(p) = -\sum_{i=1}^{k} p_i \log p_i . \tag{1}$$
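
For contrast (and not as the disclosure's method), here is a minimal sketch of the naive plug-in estimate of (1), which substitutes Xi/N for pi. It gives zero weight to unseen words and is known to underestimate entropy on sparse data, which is exactly the failure mode the Bayes estimator is meant to address.

```python
import math

def plugin_entropy(x):
    """Plug-in estimate of H(p) = -sum_i p_i log p_i with p_i ~= x_i / N.

    Zero counts are skipped, so unseen vocabulary words are treated as
    impossible; with sparse data this negatively biases the estimate.
    """
    N = sum(x)
    return -sum((xi / N) * math.log(xi / N) for xi in x if xi > 0)

print(plugin_entropy([3, 1, 0, 0, 1]))  # entropy in nats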

The present Bayes estimator requires that prior information about the unknown probabilities of words, excluding the current training data, be specified as a probability distribution over those unknown probabilities. A convenient way of doing this is to require only that, in analogy with the real counts X = X1,...,Xk, the prior information be summarized as the virtual counts $\alpha = \alpha_1, \dots, \alpha_k$. Let p have a prior...
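
The abbreviated text breaks off before the closed form is stated. Under the standard reading of the setup above — a Dirichlet prior with parameters equal to the virtual counts α, so that the posterior given counts x is Dirichlet(x + α) — the posterior mean of H(p) has a well-known closed form, sketched below in Python. This is the standard Dirichlet-posterior result, and it may differ in detail from the authors' own expression.

```python
import numpy as np
from scipy.special import digamma

def bayes_entropy_estimate(x, alpha):
    """Posterior-mean (squared-error-optimal) estimate of H(p) in nats.

    Assumes p ~ Dirichlet(alpha) a priori, so the posterior given the
    counts x is Dirichlet(b) with b_i = x_i + alpha_i.  The posterior
    mean of H(p) = -sum_i p_i log p_i is then

        E[H | x] = psi(B + 1) - sum_i (b_i / B) psi(b_i + 1),

    where B = sum_i b_i and psi is the digamma function.
    """
    b = np.asarray(x, dtype=float) + np.asarray(alpha, dtype=float)
    B = b.sum()
    return digamma(B + 1.0) - float(np.sum((b / B) * digamma(b + 1.0)))

# Hypothetical sparse counts over k = 5 words, with uniform virtual
# counts alpha_i = 1 (an illustrative choice of prior).
print(bayes_entropy_estimate([3, 1, 0, 0, 1], [1.0] * 5))
```

Unlike the plug-in estimate, this estimator assigns nonzero posterior probability to unseen words through the virtual counts, which is how the smoothing assumptions enter the closed form.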