Probability Distribution Estimation From Sparse Data

IP.com Disclosure Number: IPCOM000065267D
Original Publication Date: 1985-Nov-01
Included in the Prior Art Database: 2005-Feb-19

Publishing Venue

IBM

Related People

Authors:
Jelinek, F; Mercer, RL

Abstract

The present invention relates generally to estimating the probability of appearance of species in a population, a problem that arises in fields such as insurance, genetics, and pattern or speech recognition. In particular, where parameters (Gk) define the estimated probability of a selected species being the kth species in a sequence of species, several approaches for determining the parameters, based on a cross-validation principle, are set forth. In the discussion below, the species will be words in a text (the population). Let W = {1, 2, ..., L} be the vocabulary, and let T = w_1, w_2, ..., w_M (w_i ∈ W) be a sample of text from some much larger corpus C (T ⊂ C). We want to estimate the probability p_j = P(W = j) that a randomly selected word from the corpus C will be the jth word of the vocabulary W.
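
The cross-validation-based approaches themselves are not spelled out in this abstract. As a point of reference only, the following minimal sketch shows the naive relative-frequency estimate p_j = (count of j in T) / M, which is an assumed baseline and not the method of the disclosure. The function name `relative_frequency_estimate` and the toy sample are hypothetical. The sketch illustrates why sparse data is a problem: any vocabulary word absent from T receives probability zero.

```python
from collections import Counter


def relative_frequency_estimate(sample, vocabulary_size):
    """Return [p_1, ..., p_L], where p_j = (count of word j in sample) / M.

    Assumed baseline for illustration; not the disclosure's estimator.
    Words are integers 1..L, matching W = {1, 2, ..., L} in the abstract.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    m = len(sample)  # M, the number of words in the sample T
    return [counts[j] / m for j in range(1, vocabulary_size + 1)]


# Hypothetical toy sample T over the vocabulary W = {1, ..., 5}.
T = [1, 2, 2, 3, 1, 2]
p = relative_frequency_estimate(T, vocabulary_size=5)

# Words 4 and 5 never occur in T, so their estimates are exactly zero,
# even though they may well occur elsewhere in the larger corpus C.
unseen = [j for j, p_j in enumerate(p, start=1) if p_j == 0.0]
print(p, unseen)
```

In this toy case words 4 and 5 are assigned zero probability purely because the sample is small, which is the kind of outcome an estimator designed for sparse data aims to avoid.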