
Speaker-Independent Band Quantization for Rapid Estimation of Acoustic Parameters

IP.com Disclosure Number: IPCOM000114058D
Original Publication Date: 1994-Nov-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 77K

Publishing Venue

IBM

Related People

Lucassen, JM: AUTHOR [+2]

Abstract

In a common approach to speech recognition, output from an acoustic processor is matched against a set of diagonal Gaussian prototypes that have been expressed in terms of a small number of one-dimensional Gaussians, called 'atoms', for each band. Previously, the correspondence between the prototypes and the one-dimensional atoms was established separately for each speaker.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


      In a common approach to speech recognition, output from an
acoustic processor is matched against a set of diagonal Gaussian
prototypes that have been expressed in terms of a small number of
one-dimensional Gaussians, called 'atoms', for each band.  Previously,
the correspondence between the prototypes and the one-dimensional
atoms was established separately for each speaker.

      In the approach described here, the correspondence between the
prototypes and these atoms is established in a way that is
speaker-independent.  The prototypes for a new speaker can then be
estimated by estimating only the mean and variance of each atom.

      Since each atom is shared by a large number of prototypes, this
invention greatly reduces the amount of data that is required to
estimate the parameters with the required accuracy.
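
      As an illustration only, the following minimal Python sketch shows
how atom parameters for a new speaker might be estimated once a
speaker-independent prototype-to-atom table is available.  The function
names, the array layout, the frame-to-prototype alignment input, and the
variance floor are all assumptions made for this sketch; none of them
come from the disclosure.

```python
import numpy as np

def estimate_atoms(frames, frame_to_proto, proto_to_atom, n_atoms, var_floor=1e-4):
    """Estimate per-band atom means and variances for one speaker.

    frames:         (T, B) acoustic vectors, one row per frame (assumed layout)
    frame_to_proto: (T,) prototype index for each aligned frame (assumed input)
    proto_to_atom:  (P, B) speaker-independent atom index per prototype and band
    var_floor:      hypothetical lower bound that keeps variances positive
    """
    n_bands = frames.shape[1]
    means = np.zeros((n_bands, n_atoms))
    variances = np.ones((n_bands, n_atoms))
    for b in range(n_bands):
        # Atom that each frame falls into for this band.
        atom_ids = proto_to_atom[frame_to_proto, b]
        for a in range(n_atoms):
            x = frames[atom_ids == a, b]
            if x.size > 1:  # atoms with too little data keep the defaults
                means[b, a] = x.mean()
                variances[b, a] = max(x.var(), var_floor)
    return means, variances

def prototypes_from_atoms(means, variances, proto_to_atom):
    """Rebuild (P, B) prototype means and variances by table lookup."""
    bands = np.arange(means.shape[0])
    return means[bands, proto_to_atom], variances[bands, proto_to_atom]
```

      Because every frame assigned to any prototype that shares an atom
contributes to that atom's statistics, each atom in this sketch is
estimated from pooled data rather than from the frames of a single
prototype.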

      Implementation - Assume a speech recognition process in which
output from an acoustic processor is matched against a set of
diagonal Gaussian prototypes that have been expressed in terms of a
small number of one-dimensional Gaussians, called 'atoms', for each
band (1).
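
      Under this assumption the likelihood of a frame can be written
band by band.  The notation below is introduced here for illustration
and is not taken from the disclosure: x_b is the component of frame x
in band b, B is the number of bands, and a(k,b) is the index of the
atom assigned to prototype k in band b.  It further assumes that each
band corresponds to one dimension of the diagonal Gaussian.

```latex
\log p(x \mid k) \;=\; \sum_{b=1}^{B}
  \log \mathcal{N}\!\bigl(x_b;\ \mu_{b,a(k,b)},\ \sigma^2_{b,a(k,b)}\bigr)
```

      In this form a prototype is fully specified by its row of atom
indices a(k,1), ..., a(k,B) together with the atom means and variances.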

      Further assume that there is a fixed number of prototypes, the
same for each talker; that the definition of each prototype is the
same for each talker; and that the prototypes are obtained in such a
way that given some new, aligned speech, it is straightforward to
determine to which prototype each frame of speech should ideally
correspond (2).

      In a previous approach, the correspondence between the
prototypes and the one-dimensional 'atoms' was established separately
for each speaker.

      In the present approach, this correspondence is established in
a way that is speaker-independent, as follows:
  1.  Compute the prototypes for several speakers
  2.  Perform steps 3-7 for each band of the prototypes
  3.  Initialize a mapping from prototypes to atoms as the identity
       mapping
  4.  For each speaker, compute the log likelihood penalty associated
       with merging each pair of atoms
  5.  Find the pair of atoms that minimizes the maximum of this
       penalty over the speakers.
...
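
      Steps 4 and 5 can be sketched for a single band as follows.  This
is a minimal illustration, not the disclosed implementation: the text
does not define the penalty, so the sketch assumes the usual
maximum-likelihood loss incurred when two one-dimensional Gaussians
are replaced by their moment-matched merge.  The per-atom frame counts
and all function names are likewise assumptions.  The elided steps
that follow step 5 are not reproduced.

```python
import itertools
import numpy as np

def merge_penalty(n_a, mu_a, var_a, n_b, mu_b, var_b):
    """Log-likelihood lost by merging two 1-D Gaussians (assumed definition)."""
    n = n_a + n_b
    mu = (n_a * mu_a + n_b * mu_b) / n
    # Moment-matched variance of the pooled data.
    var = (n_a * (var_a + (mu_a - mu) ** 2) + n_b * (var_b + (mu_b - mu) ** 2)) / n
    return 0.5 * (n * np.log(var) - n_a * np.log(var_a) - n_b * np.log(var_b))

def best_pair(counts, means, variances):
    """Pick the atom pair whose worst-case penalty over speakers is smallest.

    counts, means, variances: (S, A) arrays, one row per speaker and one
    column per atom of the band being processed (assumed layout).
    """
    n_speakers, n_atoms = means.shape
    best, best_cost = None, np.inf
    for i, j in itertools.combinations(range(n_atoms), 2):
        # Step 4: penalty for this pair, computed separately for each speaker.
        worst = max(
            merge_penalty(counts[s, i], means[s, i], variances[s, i],
                          counts[s, j], means[s, j], variances[s, j])
            for s in range(n_speakers)
        )
        # Step 5: keep the pair with the smallest maximum penalty.
        if worst < best_cost:
            best, best_cost = (i, j), worst
    return best, best_cost
```

      Taking the maximum over speakers, rather than a sum or average,
means a pair is only selected if merging it is cheap for every speaker
in the set.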