Braided Labels - A Vector Quantisation Algorithm for Automatic Speech Recognition

IP.com Disclosure Number: IPCOM000111038D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 87K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

Acoustic labels for speech recognition systems are sometimes created via the algorithms of [1,2]. In these methods, a spliced acoustic parameter vector is formed which typically contains about 189 elements. This number of elements cannot generally be handled effectively, and therefore the spliced vectors are usually projected into a space of smaller dimension using the leading principal discriminating eigenvectors [3]. Normally, training data is seriously limited, and so in practice as few as 30 eigenvectors may be used. Although the smaller vectors can then be modelled accurately with limited training data, the reduction from 189 to 30 dimensions is so great that a substantial amount of useful acoustic information is irretrievably lost.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      One successful attempt to recover this lost information is
described in [4], where multiple label streams are created from small
successive blocks of eigenvectors.

      The procedure below specifies a more efficient method of
information recovery than that of [4], with the additional advantage
that only one stream of labels is generated.  A single stream greatly
simplifies the training and recognition processes and speeds them up.
Like [4], the present method requires no training data beyond what is
typically available.

      Take the principal discriminating eigenvectors in subsets of N,
starting with the N leading eigenvectors, and create diagonal Gaussian
prototypes for each subset independently using the methods of [1,2].
Reasonable values for N are in the range 15-30.  That is, for each
subset of eigenvectors, the training-data spliced parameter vectors
are projected into N-dimensions, and clustered.  This leads to
multiple streams of N-dimensional parameter vectors, each with its
own set of prototypes.  Because the vectors from each subset of
eigenvectors are clustered independently, idiosyncrasies in one set
of prototypes will not generally be reflected in any of the others.
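The per-subset training step above might be sketched as follows, with a plain k-means pass standing in for the prototype-building methods of [1,2] (which are not reproduced here). The function and parameter names are illustrative, and `eigvecs` is assumed to hold the leading principal discriminating eigenvectors as columns, ordered by importance:

```python
import numpy as np

def train_stream_prototypes(spliced, eigvecs, n_sub, n_protos,
                            n_iters=10, seed=0):
    """For each subset of N = n_sub eigenvectors, project the spliced
    training vectors into N dimensions and cluster them into diagonal-
    Gaussian prototypes (k-means stands in for the methods of [1,2])."""
    rng = np.random.default_rng(seed)
    streams = []
    n_streams = eigvecs.shape[1] // n_sub
    for s in range(n_streams):
        basis = eigvecs[:, s * n_sub:(s + 1) * n_sub]  # next N eigenvectors
        proj = spliced @ basis                         # (frames, N) stream
        # initialise prototype means from randomly chosen frames
        means = proj[rng.choice(len(proj), n_protos, replace=False)]
        for _ in range(n_iters):
            # assign each frame to its nearest prototype mean
            d = ((proj[:, None, :] - means[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for k in range(n_protos):
                if (assign == k).any():
                    means[k] = proj[assign == k].mean(0)
        # diagonal variances per prototype, floored for stability
        vars_ = np.stack([
            proj[assign == k].var(0) + 1e-6 if (assign == k).any()
            else np.ones(n_sub) for k in range(n_protos)])
        streams.append((basis, means, vars_))
    return streams
```

Because each subset of eigenvectors is clustered on its own projection, a poor cluster in one stream has no influence on the prototypes of any other, matching the independence property noted above.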

      Test data is processed in the same way:  the spliced parameter
vectors are projected into multiple streams of N-dimensional vectors
using the same subsets of eigenvectors as before.  The label
associated with any given time frame is then chosen to be the one
which maximizes the product of the likelihoods of that frame's
N-dimensional vectors...
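The labelling rule above (maximise the product of the per-stream likelihoods, or equivalently the sum of their logs) might be sketched as below. It assumes, for illustration, that prototype j in every stream is associated with the same label j; the names are hypothetical:

```python
import numpy as np

def label_frame(spliced_vec, streams):
    """Pick the single label maximising the product of diagonal-Gaussian
    likelihoods of the frame's projections across all eigenvector
    subsets.  `streams` holds (basis, means, vars) per subset, with
    prototype j in every stream assumed to belong to label j."""
    total_log_lik = 0.0
    for basis, means, vars_ in streams:
        x = spliced_vec @ basis                 # project into this subset
        # log N(x; mean_j, diag(var_j)) for every prototype j
        ll = -0.5 * (np.log(2 * np.pi * vars_)
                     + (x - means) ** 2 / vars_).sum(axis=1)
        total_log_lik = total_log_lik + ll      # product -> sum of logs
    return int(np.argmax(total_log_lik))
```

Summing log-likelihoods rather than multiplying raw likelihoods avoids numerical underflow while selecting the same label.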