
Method for Constructing a Sparse Projection Matrix Spanning a Large Time Window for use in a Speech Recognition System

IP.com Disclosure Number: IPCOM000105731D
Original Publication Date: 1993-Sep-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 2 page(s) / 112K

Publishing Venue

IBM

Related People

de Souza, P: AUTHOR [+4]

Abstract

In one prominent approach to speech recognition[1], the following acoustic processing is performed. An acoustic parameter vector of about 21 elements is computed at regular intervals of about 10ms. A spliced parameter vector of about 189 elements is then associated with each time frame and is obtained by concatenating together about 9 of the 21-dimensional vectors from a window centered on the associated frame. These spliced vectors of about 189 elements are then projected down to about 50 dimensions using discriminating eigenvectors [2]. Thus the final parameter vector associated with any given frame t reflects both the instantaneous character of the signal at time t and the dynamic properties over a window of about 90ms centered at time t.

Method for Constructing a Sparse Projection Matrix Spanning a Large Time Window for use in a Speech Recognition System

      In one prominent approach to speech recognition[1], the
following acoustic processing is performed.  An acoustic parameter
vector of about 21 elements is computed at regular intervals of about
10ms.  A spliced parameter vector of about 189 elements is then
associated with each time frame and is obtained by concatenating
together about 9 of the 21-dimensional vectors from a window centered
on the associated frame.  These spliced vectors of about 189 elements
are then projected down to about 50 dimensions using discriminating
eigenvectors [2].  Thus the final parameter vector associated with
any given frame t reflects both the instantaneous character of the
signal at time t and the dynamic properties over a window of about
90ms centered at time t.
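
     As a concrete illustration of this baseline processing, the
following sketch (in Python with NumPy, an illustrative choice that
is not part of the original disclosure) splices nine adjacent
21-dimensional frames into a 189-dimensional vector and projects it
with a precomputed 50 x 189 matrix; the function and variable names
are assumptions made for the example.

import numpy as np

def splice_and_project(frames, projection, context=4):
    """Splice each frame with its +/- context neighbours, then project.

    frames     : (T, P) array of acoustic parameter vectors (e.g. P = 21)
    projection : (D, (2*context+1)*P) projection matrix (e.g. D = 50)
    Returns a (T, D) array; frames near the ends reuse the edge frame.
    """
    T, P = frames.shape
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    # Concatenate the 2*context+1 neighbouring frames around each time t.
    spliced = np.stack(
        [padded[t:t + 2 * context + 1].ravel() for t in range(T)]
    )                                      # shape (T, (2*context+1)*P)
    return spliced @ projection.T          # shape (T, D)

# Example with the sizes quoted above: P = 21, 9-frame window, D = 50.
reduced = splice_and_project(np.random.randn(100, 21),
                             np.random.randn(50, 9 * 21))
print(reduced.shape)                       # (100, 50)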

     For some acoustic events, it appears to be advantageous to
increase the window length to a value greater than about 90ms [3].
Yet the above technology cannot easily be extended to windows of this
length: the covariance matrices involved in the calculation of the
discriminating eigenvectors become so large that (1) there is
generally insufficient data available to estimate them reliably, and
(2) they become unmanageable algorithmically, since the required
matrix operations grow increasingly prone to serious numerical errors
or instabilities.
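
     To make the scaling concrete: with Q = 189, a full symmetric
covariance matrix already has 189 x 190 / 2 = 17,955 distinct entries
to estimate; merely doubling the window to 18 frames gives Q = 378
and roughly quadruples this to 378 x 379 / 2 = 71,631 entries, while
the amount of training data available per entry is unchanged.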

     The invention described here specifies an iterative algorithm
for choosing a way of concatenating together about 9 parameter
vectors, not necessarily adjacent, so that the effective window
length is increased without increasing the sizes of any of the
covariance matrices or parameter reduction matrices used in the
calculations.  The algorithm employs a measure of the relative
importance of the parameter vectors used in the concatenation.

     It will be assumed that some training data has been recorded and
signal processed, and that a P-dimensional acoustic parameter vector
has been associated with each time frame.  Typically, P would be
about 21, and the time frame would be about 10ms.

The following steps are performed:

(1)  Choose a set of N indices, with small, distinct, integer values.
A reasonable value for N is 9; a reasonable set of indices is the set
-20, -15, ..., +15, +20.  At each time frame T, concatenate together
the N P-dimensional vectors that are offset from frame T by the
respective indices.  Let Q = N x P denote the dimension of the
resulting spliced vectors.
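
     A minimal sketch of step (1), under the same illustrative Python
conventions as above; the offsets and array shapes are examples, and
offsets that fall outside the utterance are simply clamped to the
nearest valid frame.

import numpy as np

def splice_with_offsets(frames, offsets):
    """Concatenate the frames at the given offsets around each time T.

    frames  : (T, P) array of acoustic parameter vectors (e.g. P = 21)
    offsets : list of N distinct integer offsets, e.g. -20, -15, ..., +20
    Returns a (T, N*P) array of spliced vectors.
    """
    T, P = frames.shape
    columns = []
    for d in offsets:
        idx = np.clip(np.arange(T) + d, 0, T - 1)
        columns.append(frames[idx])          # (T, P) slice at offset d
    return np.concatenate(columns, axis=1)   # (T, N*P)

offsets = list(range(-20, 21, 5))            # 9 offsets spanning about 400ms
spliced = splice_with_offsets(np.random.randn(500, 21), offsets)
print(spliced.shape)                         # (500, 189)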

(2) Using the Q-dimensional vectors just created, compute
Q-dimensional discriminating eigenvectors, as described in [1,2].
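
     References [1,2] define the discriminating eigenvectors
precisely; the sketch below assumes the usual linear-discriminant
formulation, in which they are generalized eigenvectors of the
between-class and within-class covariance matrices of the spliced
vectors, and it assumes that a class label (for example, a phone
identity taken from the training alignment) is available for each
frame.

import numpy as np
from scipy.linalg import eigh

def discriminating_eigenvectors(spliced, labels):
    """Generalized eigenvectors of between- vs. within-class covariance.

    spliced : (T, Q) spliced training vectors
    labels  : (T,) integer class label for each frame
    Returns the eigenvalues in descending order and a (Q, Q) matrix
    whose columns are the corresponding eigenvectors.
    """
    overall_mean = spliced.mean(axis=0)
    Q = spliced.shape[1]
    Sw = np.zeros((Q, Q))                      # within-class covariance
    Sb = np.zeros((Q, Q))                      # between-class covariance
    for c in np.unique(labels):
        x = spliced[labels == c]
        mc = x.mean(axis=0)
        Sw += (x - mc).T @ (x - mc)
        Sb += len(x) * np.outer(mc - overall_mean, mc - overall_mean)
    # Solve Sb v = lambda Sw v; a small ridge keeps Sw positive definite.
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(Q))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# e.g. vals, vecs = discriminating_eigenvectors(spliced, phone_labels)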

(3)  Restricting attention to the eigenvectors associated with the
top M eigenvalues, compute, for each of the N input frames, the
average magnitude of the corresponding M x P coefficients in these
eigenvectors.  A typical value for M is 50.  The eigenvectors should
be normalized, for example, to unit length.
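
     The importance measure of step (3) can be sketched as follows,
again under illustrative assumptions: for each of the N input frames
it averages the magnitudes of the P coefficients that the frame
contributes to each of the top M eigenvectors, i.e. M x P
coefficients per frame.

import numpy as np

def frame_importance(eigvecs, N, P, M=50):
    """Average coefficient magnitude contributed by each of the N frames.

    eigvecs : (Q, Q) matrix whose columns are discriminating
              eigenvectors, with Q = N * P, ordered by decreasing
              eigenvalue
    Returns an (N,) vector; larger values mark offsets that carry more
    of the discriminating information.
    """
    top = eigvecs[:, :M]                     # (N*P, M): top-M eigenvectors
    top = top / np.linalg.norm(top, axis=0)  # normalize each to unit length
    blocks = top.reshape(N, P, M)            # group coefficients by frame
    return np.abs(blocks).mean(axis=(1, 2))

# e.g. importance = frame_importance(vecs, N=9, P=21), with vecs taken
# from the eigenvector sketch above.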