Speaker-Independent Acoustic Features for Continuous Speech Recognition

IP.com Disclosure Number: IPCOM000103838D
Original Publication Date: 1993-Feb-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 58K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

When processing a new speaker in speech recognition, it is desirable to make use of information derived from previous speakers. This cuts down on the amount of training data and training time required before the new speaker can be recognized. To facilitate the creation of acoustic models for a new speaker, it is desirable that the new speaker and the previous (reference) speakers share common acoustic features. In many current systems, this commonality is ensured by defining acoustic features manually, but manually defined acoustic features are not generally optimal.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Better acoustic features can be obtained via an eigensystem analysis of a speaker's total covariance matrix and average within-class covariance matrix, where the "classes" are the speech sounds that have to be distinguished. The problem with this approach is that no two speakers share the same acoustic features (eigenvectors), because no two speakers have identical covariance matrices. Sometimes the data of several speakers are combined and an eigensystem analysis is performed as though the data all came from a single speaker. The resulting eigenvectors, however, are poor because this procedure does not handle inter-speaker differences adequately. In the invention below, an algorithm is described which leads to much improved speaker-independent eigenvectors.
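As a rough sketch of the per-speaker analysis described above (not the disclosure's own implementation), the eigensystem analysis can be posed as a generalized eigenvalue problem between the total covariance matrix and the average within-class covariance matrix, where classes are speech sounds. The function name, data shapes, and the use of `scipy.linalg.eigh` are assumptions for illustration:

```python
import numpy as np
from scipy.linalg import eigh


def discriminant_eigenvectors(frames, labels):
    """Eigensystem analysis of a speaker's total covariance matrix
    against the average within-class covariance matrix.

    frames : (n_frames, n_dims) acoustic feature vectors for one speaker
    labels : (n_frames,) integer speech-sound (class) labels
    Returns eigenvalues and eigenvectors sorted by decreasing
    discriminative power.
    """
    n, d = frames.shape
    total_cov = np.cov(frames, rowvar=False)

    # Average within-class covariance: each class covariance is
    # weighted by the class's relative frequency in the data.
    within_cov = np.zeros((d, d))
    for c in np.unique(labels):
        cls = frames[labels == c]
        if len(cls) > 1:
            within_cov += (len(cls) / n) * np.cov(cls, rowvar=False)

    # Generalized symmetric eigenproblem: total_cov v = w * within_cov v.
    # Large eigenvalues mark directions along which total (hence
    # between-class) variation dominates within-class variation.
    eigvals, eigvecs = eigh(total_cov, within_cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]
```

Run on two different speakers' data, this routine returns two different eigenvector sets, which is exactly the incompatibility the disclosure sets out to remove.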

      It will be assumed that some training data from several different refe...