Removal of Outlying Label Strings From a Database Prepared For the Construction of Phonological Rules

IP.com Disclosure Number: IPCOM000099814D
Original Publication Date: 1990-Feb-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 3 page(s) / 117K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

Automatic construction of phonological rules for continuous speech recognition requires a database of acoustic-label sequences. This database is usually created by Viterbi alignment (*) and contains errors where misalignments occur. These errors, called outliers, can have serious effects, especially when excessively long label sequences result, and therefore the outlying strings should be removed from the database wherever possible. The invention below performs this function.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      A database of label sequences is required in which each
sequence is associated with the Markov model assumed to have
generated the sequence.  The sequences and models may represent a
single phone, a complete word, or any other unit of speech.  The
procedure is as follows.
Step  1. Perform Steps 2-8 for each label sequence and its associated
Markov model.
Step  2. Perform the conventional forward trellis calculation [*] for
determining the Viterbi alignment between the label sequence and the
model.
Step  3. Let V1 denote the log probability of the Viterbi path
computed in Step 2, let S1 denote the state on the final trellis
slice which has maximum probability, and let L1 denote its log
probability.  Define P1 = V1 - L1.  A large difference between V1 and
L1 is an indication that the end of the utterance is missing from the
label sequence.
Step  4. Perform the Viterbi trellis calculation backwards, i.e.,
starting with the final label and final state of the model, and
working backwards to the beginning.
Step  5. Let V2 denote the log probability of the Viterbi path
computed in Step 4, let S2 denote the state on the leftmost trellis
slice which has maximum probability, and let L2 denote its log
probability.  Define P2 = V2 - L2.  A large difference between V2 and
L2 is an indication that the beginning of the utterance is missing
from the label sequence.
Step  6. Define P3 = -SQRT(-min(V1, V2)).  V1 and V2 will usually be
identical; however, they may differ with outlying label sequences due
to the effects of thresholding.  The square root is taken in order to
improve the fit of P3 to a Gaussian distribution.  A low value of P3
is an indication that the feneme sequence is an outlier.
Step  7. Define P4 = SQRT(length of the label sequence). Again, the
square root is taken in order to improve the fit to a Gaussian
distribution.  Unusually high or low values for P4 are indications
that the feneme sequence is an outlier.
Step  8. Define P5 = P3/P4; ignoring signs, this is the square root
of the average Viterbi log probability per label. As usual, the
square root is taken in order to improve the fit to a Gaussian
distribution.  A low va...
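
As an illustration only, the quantities defined in Steps 2, 3 and 5
through 8, as far as the text above gives them, might be sketched in
Python as follows.  The function names viterbi_forward and
outlier_features, the state-emitting model layout (log_init,
log_trans, log_emit) and the integer coding of labels are assumptions
made for this sketch and are not taken from the disclosure.  The
backward pass of Step 4 is assumed to supply V2 and L2 in the same
manner, and no thresholding is applied.

```python
import math

NEG_INF = -math.inf  # log probability of an impossible event


def viterbi_forward(log_init, log_trans, log_emit, labels, final_state):
    """Log-domain Viterbi forward pass over a label sequence (Step 2).

    Assumed layout (hypothetical, for illustration):
      log_init[s]     log probability of starting in state s
      log_trans[p][s] log probability of moving from state p to state s
      log_emit[s][k]  log probability that state s emits label k
    Returns (V, L) as used in Step 3: V is the best-path log probability
    ending in final_state, L is the largest log probability of any state
    on the final trellis slice.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[s][labels[0]] for s in range(n_states)]
    for label in labels[1:]:
        # Each new slice is computed from the previous slice only.
        score = [
            max(score[p] + log_trans[p][s] for p in range(n_states))
            + log_emit[s][label]
            for s in range(n_states)
        ]
    return score[final_state], max(score)


def outlier_features(v1, l1, v2, l2, n_labels):
    """Compute P1..P5 from Steps 3 and 5-8 for one label sequence."""
    p1 = v1 - l1                     # Step 3: end of utterance missing?
    p2 = v2 - l2                     # Step 5: beginning missing?
    p3 = -math.sqrt(-min(v1, v2))    # Step 6: overall path score
    p4 = math.sqrt(n_labels)         # Step 7: sequence length
    p5 = p3 / p4                     # Step 8: per-label path score
    return p1, p2, p3, p4, p5


if __name__ == "__main__":
    half = math.log(0.5)
    # Toy two-state model: state 0 emits label 0, state 1 emits label 1.
    log_init = [0.0, NEG_INF]
    log_trans = [[half, half], [NEG_INF, 0.0]]
    log_emit = [[0.0, NEG_INF], [NEG_INF, 0.0]]
    v1, l1 = viterbi_forward(log_init, log_trans, log_emit, [0, 0, 1], 1)
    # The backward values are simply reused here; a real run would
    # obtain them from the backward trellis of Step 4.
    print(outlier_features(v1, l1, v1, l1, n_labels=3))
```

In this toy model, dropping the last label from the sequence leaves
the final state unreachable, so V1 falls far below L1 and P1 becomes
strongly negative, which is the symptom Step 3 describes.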