
Estimation of Next Word Probabilities from Insufficient Text

IP.com Disclosure Number: IPCOM000050768D
Original Publication Date: 1982-Dec-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 34K

Publishing Venue

IBM

Related People

Bahl, L.R.: AUTHOR (and 4 others)

Abstract

This publication describes a speech recognition technique which, given a sequence of words that has already been recognized, computes an estimate of the probability distribution of the next word from insufficient text by storing frequency-of-occurrence counts of groups of words and their corresponding parts of speech. The probability distribution is computed as a weighted linear combination of several different estimators. The method for estimating the next-word probabilities can also take into account the part-of-speech class of the preceding word.


The goal of continuous speech recognition is to find the word sequence $\tilde{w}$ which, for an observed signal $\tilde{a}$, maximizes the likelihood

$L(\tilde{a}, \tilde{w}) = P(\tilde{w})\, P(\tilde{a} \mid \tilde{w})$   (1)

where $P(\tilde{w})$ is the a priori probability of the word sequence $\tilde{w}$, and $P(\tilde{a} \mid \tilde{w})$ is the conditional probability of the observed signal $\tilde{a}$ given that the word sequence $\tilde{w}$ was spoken. $P(\tilde{a} \mid \tilde{w})$ is computed with the help of a model of the performance of the acoustic channel, consisting of the interaction of the speaker with the acoustic processor. This publication concerns the estimation of

$P(\tilde{w}) = P(w_1) \prod_{i=2}^{n} P(w_i \mid \tilde{w}_1^{i-1})$   (2)

where we assume that the string $\tilde{w}$ is of length $n$, and we use the string notation

$\tilde{w}_i^j = w_i, w_{i+1}, \ldots, w_j$, $j \ge i$.   (3)
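As a minimal sketch of equation (2), the following shows how the a priori probability of a word string decomposes into a product of conditional next-word probabilities. The names sentence_probability and cond_prob are illustrative stand-ins, not part of the disclosure; cond_prob represents whatever estimator supplies $P(w_i \mid \tilde{w}_1^{i-1})$.

    from typing import Callable, Sequence

    def sentence_probability(
        words: Sequence[str],
        cond_prob: Callable[[Sequence[str], str], float],
    ) -> float:
        # Equation (2): P(w~) = P(w_1) * prod_{i=2}^{n} P(w_i | w_1^{i-1})
        p = cond_prob([], words[0])  # P(w_1): empty history
        for i in range(1, len(words)):
            p *= cond_prob(words[:i], words[i])  # history w_1^{i-1}, next word w_i
        return p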

The problem is to estimate the probabilities $P(w_i \mid \tilde{w}_1^{i-1})$ from some finite text believed to contain sentences typical of those to be spoken. It turns out that even for moderate values of $i$ (e.g., $i = 3$) it is not possible to approximate the desired quantity by the relative frequency count

$f(w_i \mid \tilde{w}_1^{i-1}) \triangleq N(w_i, \tilde{w}_1^{i-1}) / N(\tilde{w}_1^{i-1})$

where $N(x)$ stands for the number of occurrences of $x$ in the text. $f$ is an inadequate predictor because its value is, e.g., zero for many words $w_i$ that can follow a word string $\tilde{w}_1^{i-1}$ but have not been observed to do so in the text.
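To make the inadequacy concrete, this sketch computes $f$ for the case $i = 3$ (a two-word history) over a toy corpus; the corpus and names are illustrative, not from the disclosure. A continuation that is perfectly plausible but unseen in the text receives probability zero:

    def relative_frequency(text, history, word):
        # f(w_i | w_1^{i-1}) = N(w_i, w_1^{i-1}) / N(w_1^{i-1}), with the
        # history truncated to the last two words, i.e. the case i = 3.
        h = tuple(history[-2:])
        n_hist = sum(1 for j in range(len(text) - 2)
                     if tuple(text[j:j + 2]) == h)
        n_both = sum(1 for j in range(len(text) - 2)
                     if tuple(text[j:j + 3]) == h + (word,))
        return n_both / n_hist if n_hist else 0.0

    corpus = "the cat sat on the mat the cat ran".split()
    print(relative_frequency(corpus, ["the", "cat"], "sat"))  # 0.5 -- observed in the text
    print(relative_frequency(corpus, ["the", "cat"], "ate"))  # 0.0 -- plausible, never observed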

We propose using $P(w_i \mid \tilde{w}_1^{i-1}) = \sum_{j=1}^{k} \lambda_j(\tilde{w}_1^{i-1})\, f(w_i \mid$ ...
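The formula is cut off in this abbreviated version, but its visible part is a weighted linear combination $\sum_{j=1}^{k} \lambda_j(\tilde{w}_1^{i-1})\, f_j(\cdot)$ of several different estimators, as the opening paragraph states. Below is a minimal sketch of such an interpolation, assuming fixed illustrative weights (in the disclosure the weights $\lambda_j$ depend on the history $\tilde{w}_1^{i-1}$) and component estimators given as callables taking (history, word):

    def interpolated_probability(history, word, estimators, weights):
        # P(w_i | w_1^{i-1}) ~= sum_j lambda_j * f_j(w_i | w_1^{i-1}).
        # The weights are nonnegative and sum to 1, so the result is a
        # probability, and continuations unseen by one estimator inherit
        # mass from coarser ones (e.g., ones with shorter histories).
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(lam * f(history, word)
                   for lam, f in zip(weights, estimators))

Combining, say, a trigram relative frequency with bigram and unigram ones under such weights assigns a nonzero probability to words like "ate" in the earlier sketch, which the raw count $f$ alone sets to zero.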