
Diagonal-Gaussian, Quantized Mixture Labeler

IP.com Disclosure Number: IPCOM000111898D
Original Publication Date: 1994-Apr-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 4 page(s) / 102K

Publishing Venue

IBM

Related People

Bellegarda, JR: AUTHOR [+4]

Abstract

The American English version of the Tangora Automatic Speech Recognizer utilizes a sophisticated labeling scheme called Zuelogical Labeling [1]. The older versions of the American English Tangora, as well as the current European versions of the Tangora, utilize a K-means labeling algorithm. This paper presents an algorithm that reaps many of the benefits of Zuelogical Labeling, but does not require the lengthy data collection, tree growing, and bootstrapping process. In addition, the algorithm contained herein can be used to facilitate bootstrapping the Zuelogical label system.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Diagonal-Gaussian, Quantized Mixture Labeler

      The American English version of the Tangora Automatic Speech
Recognizer utilizes a sophisticated labeling scheme called Zuelogical
Labeling [1].  The older versions of the American English Tangora, as
well as the current European versions of the Tangora, utilize a
K-means labeling algorithm.  This paper presents an algorithm that
reaps many of the benefits of Zuelogical Labeling, but does not
require the lengthy data collection, tree growing, and bootstrapping
process.  In addition, the algorithm contained herein can be used to
facilitate bootstrapping the Zuelogical label system.

      The K-means labeling algorithm [2] is a very simple,
Euclidean-metric labeling algorithm.  The Z-labeling algorithm [1]
offers many advantages, but it has two major disadvantages.  First, a
large amount of data must be collected to build the baseforms and
decision trees.  Second, the process of building the baseforms and
decision trees is time consuming.  Yet there are many advantages to
Z-labeling that are independent of the problems mentioned above.  For
example, one can utilize the following techniques from the Z-labeling
algorithm without growing new decision trees or building new
baseforms:

o   Splicing and rotating vectors, and utilizing 50 dimensional
    vectors instead of 20 dimensional ones.

o   Supervising the clustering, by binning vectors that align to the
    same phones.

o   Using a diagonal Gaussian metric instead of a Euclidean one.

o   Using a mixture of prototypes for a particular label.
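As a rough illustration of the metric change in the third bullet, the sketch below contrasts Euclidean labeling with diagonal-Gaussian labeling.  The prototype means and variances here are made up for illustration; the disclosure does not specify them:

```python
import numpy as np

def euclidean_label(x, prototypes):
    """Assign x to the nearest prototype under the Euclidean metric."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    return int(np.argmin(d))

def diag_gaussian_label(x, means, variances):
    """Assign x to the prototype with the highest diagonal-Gaussian
    log-likelihood: a variance-weighted distance plus a
    log-determinant penalty, rather than a plain squared distance."""
    # log N(x; mu, diag(var)) up to a constant shared by all prototypes
    ll = -0.5 * (np.sum(np.log(variances), axis=1)
                 + np.sum((x - means) ** 2 / variances, axis=1))
    return int(np.argmax(ll))
```

With anisotropic variances the two metrics can disagree: a vector that is marginally closer to one prototype in Euclidean distance may be far more likely under another prototype whose variance is large along the offending dimension.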

      Applying these techniques, without growing decision trees or
building new baseforms, captures much of the improvement of the
Z-label algorithm, without requiring the effort of building a
complete Z-label system.
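The splice-and-rotate technique from the first bullet can be sketched as follows.  The exact splice width and the rotation used by the Tangora systems are not given in this text, so the context width and the PCA-style eigenvector rotation below are stand-in assumptions:

```python
import numpy as np

def splice(frames, context=1):
    """Concatenate each 20-dim frame with `context` frames on either
    side (edges padded by repetition), yielding higher-dim vectors."""
    padded = np.concatenate([frames[:1].repeat(context, axis=0),
                             frames,
                             frames[-1:].repeat(context, axis=0)])
    n, _ = frames.shape
    return np.stack([padded[t:t + 2 * context + 1].ravel()
                     for t in range(n)])

def rotate(spliced, out_dim=50):
    """Rotate onto the leading eigenvectors of the covariance and keep
    out_dim components (a PCA-style stand-in for the rotation step)."""
    centered = spliced - spliced.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    w, v = np.linalg.eigh(cov)           # eigenvalues in ascending order
    basis = v[:, ::-1][:, :out_dim]      # top out_dim eigenvectors
    return centered @ basis
```

Splicing three 20-dimensional frames yields 60-dimensional vectors, which the rotation then truncates to the 50 dimensions the document mentions.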

      The algorithm creates supervised diagonal-Gaussian, quantized
mixture prototypes, using both fenemic and phonetic Viterbi
alignments to perform the supervision.  The training procedure for
this invention consists of the following steps:

1.  Signal process the recordings into 20-dimensional vectors.

2.  Cluster the vectors into 200 prototypes using the K-means
    algorithm.

3.  Label the training data.

4.  Perform fenemic training.

5.  Viterbi align each recording to its lexemes, to determine how
    each word with multiple lexemes was pronounced.

6.  Perform lexemic training.

7.  Viterbi align each label/vector to its fenemic phone.

8.  Determine the eigenvectors of the class centers attained by
    binning each 20-dimensional vector to the fenemic phone to which
    it Viterbi aligned.

9.  Do phonetic training.

10. Viterbi align each label/vector to its phonetic phone
    distribution.

11. Splice and rotate each vector up to 50 dimensions.

12. Cluster the vectors under supervision, so that each vector goes
    into a file with oth...
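The clustering steps above can be sketched as follows.  This is an illustrative toy, not the disclosure's procedure: the real system clusters the full training set into 200 prototypes and draws its supervision from the Viterbi alignments of steps 7 and 10, and the `per_phone` mixture size below is an assumption:

```python
import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means: returns k prototype centroids."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean).
        d = ((vectors[:, None, :] - centers[None]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def supervised_prototypes(vectors, phone_ids, per_phone=4):
    """Bin vectors by the phone they aligned to, then cluster each bin
    separately, yielding a mixture of prototypes per phone label."""
    return {p: kmeans(vectors[phone_ids == p], per_phone)
            for p in np.unique(phone_ids)}
```

Binning by aligned phone before clustering is what makes the clustering "supervised": vectors from different phones can never share a prototype, and each phone ends up with its own small mixture.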