Enhanced Polling: An Improvement to a Technique for Obtaining a Short List of Candidate Words in Speech Recognition

IP.com Disclosure Number: IPCOM000119272D
Original Publication Date: 1991-Jan-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 5 page(s) / 191K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

In the Poisson-based polling fast match (1), word scores are computed incrementally over the duration of the utterance to be identified. Implicit in this method is the assumption that the end-points of the unknown utterance are identifiable. In isolated speech, end-point detection is not usually a problem, but in continuous speech, where pauses do not generally occur between words, there is no known way to determine the end of an utterance prior to recognition. For this reason, a more sophisticated polling algorithm is required for continuous speech.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 37% of the total text.

      This invention consists of a method for computing word scores
incrementally such that the score of the correct word rises
throughout the utterance and falls thereafter, peaking in the
vicinity of the end of the utterance.  The peak, therefore,
identifies the end of the utterance, and indicates at what point
polling should terminate.  Like the earlier polling fast match (1),
the present enhanced version is based on a Poisson model.  It differs
from the earlier version in that two additions per word are required
at each time frame, instead of one.

      Let F_i denote the frequency of label f_i in an utterance of word
W, and assume that F_i has a Poisson distribution with mean m_i.  Then

      $$ \Pr(F_i = k \mid W) = \frac{e^{-m_i}\, m_i^{k}}{k!}, \qquad k = 0, 1, 2, \ldots \qquad (1) $$
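
      As a small illustration of the Poisson assumption, the following
Python sketch evaluates the log-probability that a label occurs k times
in an utterance of W when its mean frequency is m_i, using the standard
Poisson form exp(-m_i) m_i^k / k!.  The function name log_poisson_pmf
and the mean value 3.0 are illustrative assumptions, not taken from the
disclosure.  Working in the log domain is a choice made here so that
products of probabilities become sums.

```python
import math


def log_poisson_pmf(k: int, mean: float) -> float:
    """Log of the Poisson probability of observing count k given the mean."""
    if k < 0:
        raise ValueError("count must be non-negative")
    if mean <= 0.0:
        raise ValueError("mean must be positive")
    # log(e^-mean * mean^k / k!); math.lgamma(k + 1) equals log(k!)
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


if __name__ == "__main__":
    # Hypothetical label whose expected frequency in word W is 3.0.
    for k in range(6):
        print(k, math.exp(log_poisson_pmf(k, 3.0)))
```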

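      Let H = {h_i} denote the histogram of observed label frequencies
in an unidentified utterance.  Assuming that the h_i are independent, we
have

      $$ \Pr(H \mid W) = \prod_i \frac{e^{-m_i}\, m_i^{h_i}}{h_i!} \qquad (2) $$

which can be rewritten as

      $$ \Pr(H \mid W) = e^{-L_W} \prod_{t=1}^{n} \frac{m(y_t)}{F_t(y_t)} \qquad (3) $$

where y_t denotes the label occurring at time t, m(y_t) denotes the
mean frequency of that label in W, F_t(y_t) denotes the total frequency
of y_t up to time t inclusive, n denotes the actual length of the
unidentified utterance in frames, and L_W denotes the expected length
of an utterance of W, that is, the sum of the m_i.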
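
      The step from a product over labels to a product over time frames
can be checked numerically.  The Python sketch below is an illustration
under stated assumptions rather than the disclosure's implementation: it
computes the log-probability of a label sequence once from the finished
histogram (a product of independent Poisson terms) and once frame by
frame (one term per observed label, divided by that label's running
count), and the two results should agree.  The function names, the toy
means, and the label sequence are hypothetical.  The sketch assumes that
every label has a positive mean and that the expected length L_W equals
the sum of the mean label frequencies.

```python
import math
from collections import Counter


def log_prob_histogram(labels, means):
    """log Pr(H | W) from the complete histogram: independent Poisson terms."""
    hist = Counter(labels)
    total = 0.0
    for label, mean in means.items():
        h = hist.get(label, 0)
        total += -mean + h * math.log(mean) - math.lgamma(h + 1)
    return total


def log_prob_incremental(labels, means):
    """The same quantity accumulated one frame at a time."""
    expected_length = sum(means.values())  # L_W, assumed to be the sum of the means
    running = Counter()                    # running[y] plays the role of F_t(y)
    score = -expected_length               # value before any frame is observed
    for y in labels:
        running[y] += 1
        score += math.log(means[y]) - math.log(running[y])
    return score


if __name__ == "__main__":
    # Hypothetical word model with three labels, and a six-frame utterance.
    means = {"a": 2.0, "b": 1.5, "c": 0.5}
    labels = ["a", "b", "a", "c", "a", "b"]
    print(log_prob_histogram(labels, means))
    print(log_prob_incremental(labels, means))
```

      The frame-by-frame form needs only one word-dependent addition per
frame, the log of the mean of the observed label, because the running
count of a label does not depend on the candidate word.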

      Let H_L denote the observed histogram at time t = L in the
unidentified utterance, and let E(L | W) denote the expected value of
Pr(H_L | W) when W is the correct word.  An efficient means of
computing E(L | W) is given in reference (2).  Let the scoring function
S_1(L,W) be defined as

      $$ S_1(L, W) = \log \Pr(H_L \mid W) - \log E(L \mid W) \qquad (4) $$

      When W is incorrect, S_1 tends to become increasingly negative
as L increases.  When W is correct, S_1 tends to hover around zero
until L exceeds the actual length of the utterance, and then it tends
to become increasingly negative.  Thus, S_1 has the necessary property
that the correct word will tend to have the highest score, but it
lacks the property of peaking at the end of the utterance.  We can
rectify matters by defining an amended scoring function S_2(L,W):

      $$ S_2(L, W) = S_1(L, W) + \log \Pr(L \mid W) \qquad (5) $$

where Pr(L | W) denotes the probability that an utterance of W will
endure for precisely L frames.  It can be computed under the
assumption that the duration of W has a Poisson distribution with
mean L_W:

      $$ \Pr(L \mid W) = \frac{e^{-L_W}\, L_W^{L}}{L!} \qquad (6) $$
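
      A minimal Python sketch of how such a score might be accumulated
and its peak located is given below.  It rests on several assumptions
that are not confirmed by the abbreviated text: that the first score is
the log of Pr(H_L | W) minus the log of E(L | W), that the amended score
adds the log of the Poisson duration probability Pr(L | W), and that the
expected length is the sum of the mean label frequencies.  The callable
log_expected is a hypothetical stand-in for the computation of
E(L | W), which is described only in a cited reference; the constant
placeholder in the usage lines is not a realistic value.  All function
names and toy data are illustrative.

```python
import math
from collections import Counter


def log_poisson_pmf(k, mean):
    """Log of the Poisson probability of count k given the mean."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def s2_scores(labels, means, log_expected):
    """Return the assumed amended score for L = 1 .. n for one candidate word.

    labels       -- observed label sequence, one label per frame
    means        -- dict mapping label -> mean frequency for the word (all > 0)
    log_expected -- hypothetical callable L -> log E(L | W), supplied by the caller
    """
    expected_length = sum(means.values())  # assumed L_W
    running = Counter()                    # running label counts up to frame L
    log_prob = -expected_length            # log Pr(H_0 | W): empty histogram
    scores = []
    for L, y in enumerate(labels, start=1):
        running[y] += 1
        log_prob += math.log(means[y]) - math.log(running[y])
        s1 = log_prob - log_expected(L)                       # assumed first score
        scores.append(s1 + log_poisson_pmf(L, expected_length))  # duration term
    return scores


def peak_frame(scores):
    """1-based frame index at which the score is highest."""
    return max(range(len(scores)), key=scores.__getitem__) + 1


if __name__ == "__main__":
    means = {"a": 2.0, "b": 1.0}                 # hypothetical word model
    labels = ["a", "b", "a", "a", "b", "a"]      # hypothetical label stream
    scores = s2_scores(labels, means, log_expected=lambda L: 0.0)  # placeholder
    print(scores)
    print(peak_frame(scores))
```

      In a system with many candidate words, the terms that depend only
on L and W could be precomputed into one table per word, so that each
frame costs one addition for the observed label and one for the table
entry.  This is one possible reading of the two additions per word
mentioned earlier, not a confirmed description of the original method.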

      The additional term in S_2 ensures that when W is correct, S_2
rises to a peak in the vicinity of the...