Improved Procedure for Outlier Detection in a Database of Label Sequences Derived from Spoken Utterances

IP.com Disclosure Number: IPCOM000109636D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 2 page(s) / 136K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

In one prominent approach to speech recognition, the pronunciation of a word is represented by a Markov source model which consists of one or more allophonic models. These allophonic models typically each model the pronunciation of one phoneme and vary as the phonetic context varies. The rules which determine which model of a phoneme is appropriate in a given context are called phonological rules. The phonological rules and the allophonic models are both determined automatically from a database of label sequences tagged with their phonetic context (1,2).

This is the abbreviated version, containing approximately 52% of the total text.

       In one prominent approach to speech recognition, the
pronunciation of a word is represented by a Markov source model which
consists of one or more allophonic models.  These allophonic models
typically each model the pronunciation of one phoneme and vary as the
phonetic context varies.  The rules which determine which model of a
phoneme is appropriate in a given context are called phonological
rules.  The phonological rules and the allophonic models are both
determined automatically from a database of label sequences tagged
with their phonetic context (1,2).

      The database of label sequences is usually created
automatically from a large body of training speech and may contain
errors.  For this reason algorithms have been devised for detecting
and removing outliers from the database, e.g., (3).  Although
helpful, these algorithms are imperfect.  This article details an
improved procedure for detecting outlying label sequences.

      It is assumed that there exists a database of label sequences,
each tagged with its phonetic context, which may contain errors.  The
database should be drawn from a large body of speech consisting of at
least 20,000 sentences.  The following steps are performed.
(1)  Using the database of label sequences, construct a phonological
tree as described in (2).  Since it is desirable that the sequences
associated with any given leaf are homogeneous, the tree should be
grown beyond the normal limits, until no leaf has more than about 80
sequences.  To achieve "clean" splits to this level, it is advisable
to use at least 10 phones of left context and 10 phones of right
context instead of the usual 5.
(2)  Perform steps 3-10 for each leaf L of the tree constructed in
step 1.
(3)  If leaf L has fewer than N sequences, discard all of them.  A
reasonable value for N is 5.  Otherwise, perform steps 4-10.
(4)  Perform step 5 for each sequence S at leaf L.
(5)  Separate sequence S from the rest of the sequences at leaf L.
Compute and store the value of this split using the same objective
function as was used to construct the tree in step 1.
(6)  Perform steps 7-8 for each sequence S at leaf L.
(7)  Compute the mean and standard deviation of all the split values
computed in step 5 excluding the value associated with sequence S.
(8)  Using a t test (4), and the mean and standard deviation computed
in step 7, test the significance of the difference between...
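
      The per-leaf bookkeeping of steps 3 through 7 can be sketched in
Python as follows.  This is a minimal illustration, not the authors'
implementation.  The objective function of step 1 is defined in (2) and
is not reproduced here, so the sketch takes it as a parameter;
toy_split_value is a purely hypothetical stand-in used only to make the
example runnable.  The names leave_one_out_stats and min_sequences, and
the use of the sample standard deviation, are likewise assumptions.

```python
from statistics import mean, stdev
from typing import Callable, List, Optional, Sequence, Tuple

LabelSeq = Tuple[int, ...]
# Objective function: value of splitting one sequence off from the rest.
SplitFn = Callable[[LabelSeq, Sequence[LabelSeq]], float]


def toy_split_value(seq: LabelSeq, rest: Sequence[LabelSeq]) -> float:
    """HYPOTHETICAL stand-in for the tree-growing objective of reference (2).

    It only measures how far the length of seq is from the mean length of
    the remaining sequences; the real objective is not given in the text.
    """
    return abs(len(seq) - mean([len(r) for r in rest]))


def leave_one_out_stats(
    leaf: Sequence[LabelSeq],
    split_value: SplitFn,
    min_sequences: int = 5,  # N in step 3; the text suggests 5
) -> Optional[List[Tuple[float, float, float]]]:
    """Return (split value, mean of others, std of others) per sequence.

    Returns None when the leaf is discarded under step 3.
    """
    if min_sequences < 3:
        # With fewer than 3 sequences, the "others" in step 7 would have
        # fewer than 2 values and a sample standard deviation is undefined.
        raise ValueError("min_sequences must be at least 3")
    if len(leaf) < min_sequences:
        return None  # step 3: discard every sequence at this leaf

    seqs = list(leaf)
    # Steps 4-5: separate each sequence S from the rest and score the split.
    values = [
        split_value(s, seqs[:i] + seqs[i + 1:]) for i, s in enumerate(seqs)
    ]

    # Steps 6-7: for each S, statistics over all split values except its own.
    results = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]
        # Assumption: sample standard deviation (n - 1 denominator).
        results.append((value, mean(others), stdev(others)))
    return results


# Toy leaf: four short sequences and one much longer one.
leaf = [(1, 2, 3), (1, 2, 4), (1, 3, 3), (2, 2, 3), (1, 2, 3, 4, 5, 6, 7, 8, 9)]
stats = leave_one_out_stats(leaf, toy_split_value)
```

Each returned triple pairs a sequence's own split value with the mean and
standard deviation of the other split values at the leaf, which are the
quantities step 8 takes as input.  Step 8 itself is not sketched because
its description is incomplete in this abbreviated text.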