
Construction of Acoustic Labels for Automatic Speech Recognition From Large Numbers of Principal Discriminating Eigenvectors

IP.com Disclosure Number: IPCOM000111219D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 50K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

Acoustic labels for speech recognition systems are sometimes created via the algorithms of [1,2]. In these methods, a spliced acoustic parameter vector is formed, typically containing about 189 elements. Vectors of this size cannot generally be handled effectively, and therefore the spliced vectors are usually projected into a space of smaller dimension using the leading principal discriminating eigenvectors [3]. Normally, training data is seriously limited, and so in practice as few as 30 eigenvectors may be used. Although the smaller vectors can then be modelled accurately with limited training data, the reduction from 189 to 30 dimensions is so severe that much useful acoustic information is irretrievably lost.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

Construction of Acoustic Labels for Automatic Speech Recognition
From Large Numbers of Principal Discriminating Eigenvectors

      Acoustic labels for speech recognition systems are sometimes
created via the algorithms of [1,2].  In these methods, a spliced
acoustic parameter vector is formed, typically containing about
189 elements.  Vectors of this size cannot generally be handled
effectively, and therefore the spliced vectors are usually projected
into a space of smaller dimension using the leading principal
discriminating eigenvectors [3].  Normally, training data is
seriously limited, and so in practice as few as 30 eigenvectors may
be used.  Although the smaller vectors can then be modelled
accurately with limited training data, the reduction from 189 to 30
dimensions is so severe that much useful acoustic information is
irretrievably lost.  The procedure below recovers this lost
information without recourse to additional training data.
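
      The projection step described above can be illustrated with a
minimal sketch.  It assumes the eigenvectors are available as the
columns of a matrix, ordered from most to least discriminating.  The
function name project_onto_leading and the random stand-in data are
hypothetical; the computation of the principal discriminating
eigenvectors themselves is given in [3] and is not reproduced here.

```python
import numpy as np

def project_onto_leading(spliced, eigvecs, k):
    """Project spliced vectors (one per row) onto the first k eigenvector columns."""
    return spliced @ eigvecs[:, :k]

# Stand-in data (assumption): 1000 spliced vectors of 189 elements each.
rng = np.random.default_rng(0)
spliced = rng.standard_normal((1000, 189))

# Stand-in orthonormal basis (assumption); a real system would use the
# principal discriminating eigenvectors of [3] in its place.
eigvecs, _ = np.linalg.qr(rng.standard_normal((189, 189)))

# Keep only the 30 leading directions; the other 159 are discarded.
reduced = project_onto_leading(spliced, eigvecs, 30)
print(reduced.shape)
```

Everything carried by the 159 discarded directions is unavailable to
any later stage that sees only the reduced vectors.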

      Taking the principal discriminating eigenvectors in blocks of
N, starting with the N leading eigenvectors, create a label stream
for each block using the methods of [1,2].  Reasonable values for N
are in the range 15-30.  Note that the vectors projected onto each
block of eigenvectors are clustered and labelled completely
independently, so that idiosyncrasies in one set of prototypes will
not generally be reflected in any of the others.
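
      The block-wise labelling can be sketched as follows.  This is
an illustration under stated assumptions, not the procedure of [1,2]:
plain k-means stands in for the clustering and labelling methods of
[1,2], and the names kmeans_labels and block_label_streams, the
number of blocks, the number of prototypes and the random data are
all hypothetical.

```python
import numpy as np

def kmeans_labels(vectors, n_prototypes, n_iter=10, seed=0):
    """Plain k-means; only a stand-in (assumption) for the methods of [1,2]."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=n_prototypes, replace=False)
    protos = vectors[start].copy()
    for _ in range(n_iter):
        # Squared distance from every vector to every prototype.
        dist = ((vectors[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_prototypes):
            members = vectors[labels == j]
            if len(members) > 0:
                protos[j] = members.mean(axis=0)
    # Final labelling against the updated prototypes.
    dist = ((vectors[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def block_label_streams(spliced, eigvecs, block_size, n_blocks, n_prototypes):
    """Return one label stream per block of eigenvectors, as columns."""
    streams = []
    for b in range(n_blocks):
        # Block b holds eigenvectors b*N .. (b+1)*N - 1, leading block first.
        block = eigvecs[:, b * block_size:(b + 1) * block_size]
        projected = spliced @ block
        # Each block is clustered on its own; no prototypes are shared.
        streams.append(kmeans_labels(projected, n_prototypes, seed=b))
    return np.stack(streams, axis=1)

# Stand-in data (assumption): 500 spliced vectors of 189 elements.
rng = np.random.default_rng(0)
spliced = rng.standard_normal((500, 189))
eigvecs, _ = np.linalg.qr(rng.standard_normal((189, 189)))

# N = 20 lies in the suggested 15-30 range; 3 blocks is an arbitrary choice.
streams = block_label_streams(spliced, eigvecs, block_size=20,
                              n_blocks=3, n_prototypes=8)
print(streams.shape)
```

Each column of the result is an independent label stream, so every
frame carries one label per block rather than a single label.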

      Decode the resulting multiple label streams in parallel as
described in [4]....