Browse Prior Art Database

Dynamic Thresholding Scheme for Fast Match

IP.com Disclosure Number: IPCOM000099823D
Original Publication Date: 1990-Feb-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 2 page(s) / 73K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+2]

Abstract

The Tangora speech recognition system currently uses an initial matching step, referred to as the fast match (*). For any utterance, the fast match finds a small number of words, typically in the range of 30 to 100, that are likely to contain the word uttered. The fast match is organized as a search of a dictionary tree made up from phonetic spelling of all words in the vocabulary. Extensive pruning of the branches during search is required in order to execute the search within a short time. In this article we propose an improved scheme for pruning the search paths. Whereas the earlier schemes used a fixed threshold to achieve pruning, here we devise a scheme for computing the thresholds dynamically, in the spirit of branch-and-bound techniques.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Dynamic Thresholding Scheme for Fast Match

       The Tangora speech recognition system currently uses an
initial matching step, referred to as the fast match (*). For any
utterance, the fast match finds a small number of words, typically in
the range of 30 to 100, that are likely to contain the word uttered.
The fast match is organized as a search of a dictionary tree made up
from phonetic spelling of all words in the vocabulary.  Extensive
pruning of the branches during search is required in order to execute
the search within a short time.  In this article we propose an
improved scheme for pruning the search paths.  Whereas the earlier
schemes used a fixed threshold to achieve pruning, here we devise a
scheme for computing the thresholds dynamically, in the spirit of
branch-and-bound techniques.

      Let n denote non-leaf nodes of the dictionary tree, and let l
denote the leaf nodes.  Each leaf node is associated with a word from
the vocabulary (a particular pronunciation of a word, to be exact)
and each edge in the path from the root to the leaf corresponds to a
phone in the phonetic transcription of the word.  During the search,
visiting a node in the tree involves computing the acoustic match
value for some segment of the acoustic label stream given the phone
corresponding to the arc leading to that node from its parent.  This
match returns a value m(n) and a frame number b(n) at which the
labels corresponding to this phone are hypothesized to be ending.
For a leaf node l the value m(l) is the score for the word associated
with that leaf and b(l) is the hypothesized end-point of that word.

      The proposed pruning scheme is the following.  Let n1,n2,...,np
be the path from the root node to a leaf l in the tree (np=l).
Let us assume that the match value m(l) obtained for this leaf is the
best match value at any lea...