Good Way to Provide Speed-Accuracy Trade-Offs in the Tangora

IP.com Disclosure Number: IPCOM000112708D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 79K

Publishing Venue

IBM

Related People

Daggett, G: AUTHOR [+5]

Abstract

The Tangora Automatic Speech Recognizer uses a search procedure [1] to find the most likely word string given the speech for the utterance. As with any search procedure, it is desirable to provide the user with parameters that trade speed against accuracy. Finding a good parameter is not always easy. This article discloses that smoothing the output statistics of the Hidden Markov Model parameters is a good mechanism for controlling speed vs. accuracy trade-offs.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Good Way to Provide Speed-Accuracy Trade-Offs in the Tangora

      The Tangora Automatic Speech Recognizer uses a search procedure
[1] to find the most likely word string given the speech for the
utterance.  As with any search procedure, it is desirable to provide
the user with parameters that trade speed against accuracy.  Finding
a good parameter is not always easy.  One would like the accuracy to
remain near its best; a large gain in speed should be attained for
only a small loss in accuracy.  Fine tuning direct parameters, such
as the size of the search stack or the minimum score a word must
achieve, is likely to be difficult, since such parameters interact
and must be tuned together.

      The arc output statistics, that is, the probabilities of seeing
a particular acoustic label given an arc in the Hidden Markov Model,
are computed at training time for each speaker [1,2].  These
statistics are computed for the models used in both the Tangora fast
and detailed matches.  With the current z-label algorithm [3], the
labels are clustered in a supervised manner, so that only data seen
for a particular feneme is used to construct the prototype for that
feneme.  Consequently, the arc statistics have very sharp
distributions.  This actually hurts accuracy, because the model fits
the training data too closely.  In the original z-label training, the
statistics are smoothed by taking the square root of each value and
then renormalizing over all the labels for an arc.  This reduces the
dynamic range of the probabilities for an arc, essentially smoothing
them.
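      The square-root-and-renormalize step described above can be
sketched as follows (the function name and the example distribution
are illustrative, not taken from the Tangora code):

```python
def smooth_arc_outputs(probs, power=0.5):
    # Raise each arc output probability to the given power
    # (0.5 = square root), then renormalize so the label
    # distribution for the arc again sums to 1.  Powers below
    # 1.0 compress the dynamic range, flattening a sharp
    # distribution toward uniform.
    raised = [p ** power for p in probs]
    total = sum(raised)
    return [r / total for r in raised]

# A sharply peaked arc distribution becomes noticeably flatter:
# the max/min ratio drops from 81:1 to 9:1 under square-rooting,
# since normalization does not change ratios between labels.
sharp = [0.81, 0.16, 0.02, 0.01]
print(smooth_arc_outputs(sharp))
```

Note that renormalization leaves the ordering of the labels intact;
only the relative gap between likely and unlikely labels shrinks.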

      Changing the square-rooting to raising each probability to a
power between 0.5 and 1.0 affects both speed and accuracy.  For 7
speakers, 1 had...
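      The generalized smoothing can be sketched by sweeping the
exponent and observing the entropy of the resulting distribution
(the intermediate exponent values and the example distribution are
illustrative; only the 0.5 and 1.0 endpoints come from the text):

```python
import math

def smooth(probs, power):
    # Generalized smoothing: exponentiate, then renormalize.
    raised = [p ** power for p in probs]
    total = sum(raised)
    return [r / total for r in raised]

def entropy(probs):
    # Shannon entropy in bits; higher entropy = flatter distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# As the power drops from 1.0 (no smoothing) toward 0.5
# (square-rooting), the distribution flattens and its
# entropy rises -- the speed/accuracy knob the article proposes.
sharp = [0.81, 0.16, 0.02, 0.01]
for power in (1.0, 0.9, 0.7, 0.5):
    print(power, round(entropy(smooth(sharp, power)), 3))
```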