Determination of Gaussian Seed Clusters in the Construction of a Diagonal Gaussian Vector Quantiser for Speech Recognition

IP.com Disclosure Number: IPCOM000106847D
Original Publication Date: 1993-Dec-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 2 page(s) / 76K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

In [1] a procedure is given for constructing a vector quantiser for speech recognition purposes. The vector quantiser is based on a mixture of diagonal Gaussian distributions which is created in two stages. First, seed distributions are derived using an expensive Euclidean clustering process. Second, the seed distributions are refined via K-means diagonal Gaussian clustering.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      In [1]  a procedure is given for constructing a vector
quantiser for speech recognition purposes.  The vector quantiser is
based on a mixture of diagonal Gaussian distributions which is
created in two stages.  First, seed distributions are derived using
an expensive Euclidean clustering process.  Second, the seed
distributions are refined via K-means diagonal Gaussian clustering.

      In [2]  the expensive first stage is replaced by a faster
supervised algorithm which relies on the existence of a small
alphabet of sub-word models, such as leafforms.  These sub-word
models are usually obtained by an iterative algorithm, and therefore
may not yet exist (before the iteration begins) or may be too crude
to rely on (during the early iterations).  In these cases the
methods of [2]  may be unworkable or inappropriate.

      The purpose of the invention below is to provide a fast
algorithm for obtaining Gaussian seed distributions, when supervision
as in [2]  is inappropriate.

      Assume the existence of some acoustic parameter vectors tagged
with the identity of the label intended to be output by the vector
quantiser.  The following steps are performed for each label in the
label alphabet.

1.  Let N denote the number of seed clusters required.  A reasonable
    value for N is 20.  Let S > N be the number of initial clusters
    formed in the following step.  A reasonable value for S is 5N.
    Select S seed vectors at random from the training data.

2.  Perform K-means Euclidean clustering on a subset of the training
    data starting from the S seeds selected in Step (1).  The subset
    may include all the training data or may be limited to no more
    than M vectors.  A reasonable value for M is 10,000.

3.  For each cluster obtained in Step (2), compute and store the sums
    and sums of squares of each element in the acoustic vector.
    Store also the cluster sizes.

4.  Using the data stored in Step (3), compute the square root of the
    determinant of the diagonal ...
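
Steps 1 to 3 above can be sketched in Python as follows.  This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans_euclidean`, `cluster_statistics`, `initial_cluster_statistics`), the fixed iteration count, the handling of empty clusters, and the capping of S at the amount of available data are all hypothetical choices that the text does not specify.  The defaults N = 20, S = 5N and M = 10,000 are the values the text calls reasonable.  The function is meant to be called once per label, on the vectors tagged with that label.

```python
import random


def kmeans_euclidean(vectors, seeds, iterations=10):
    """Plain K-means with squared Euclidean distance.

    The iteration count is an assumed stopping rule; the source gives none.
    Returns the cluster index assigned to each vector.
    """
    centroids = [list(s) for s in seeds]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])),
            )
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_statistics(vectors, assignment, num_clusters):
    """Step 3: per-cluster sums, sums of squares, and sizes."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    sumsq = [[0.0] * dim for _ in range(num_clusters)]
    sizes = [0] * num_clusters
    for v, k in zip(vectors, assignment):
        sizes[k] += 1
        for d, x in enumerate(v):
            sums[k][d] += x
            sumsq[k][d] += x * x
    return sums, sumsq, sizes


def initial_cluster_statistics(label_vectors, n_seed=20, s_factor=5,
                               max_vectors=10000, rng=None):
    """Steps 1 to 3 for the training vectors of a single label."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only to make the sketch repeatable
    # Step 1: pick S = s_factor * N seed vectors at random.
    # Assumption: S is capped when fewer vectors are available.
    s = min(s_factor * n_seed, len(label_vectors))
    seeds = rng.sample(label_vectors, s)
    # Step 2: cluster all the data, or a random subset of at most M vectors.
    if len(label_vectors) > max_vectors:
        subset = rng.sample(label_vectors, max_vectors)
    else:
        subset = label_vectors
    assignment = kmeans_euclidean(subset, seeds)
    # Step 3: store the statistics of each of the S clusters.
    return cluster_statistics(subset, assignment, s)
```

The stored sums, sums of squares and sizes are enough to recover a per-element mean and variance for any cluster, or for any union of clusters, without revisiting the training vectors.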