Metamorphic Transformations for Speech Recognition

IP.com Disclosure Number: IPCOM000100865D
Original Publication Date: 1990-Jun-01
Included in the Prior Art Database: 2005-Mar-16
Document File: 2 page(s) / 68K

Publishing Venue

IBM

Related People

Bahl, L: AUTHOR [+6]

Abstract

An algorithm is described for metamorphosing the acoustic identity of a talker into that of another, and, hence, for talker normalization of the speech signal.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Let X denote a spectral vector from talker A, and let Y denote
that of talker B.  We seek a transformation T = T(X) so that a speech
recognizer trained to recognize talker B may also be used to
recognize talker A.  Such a transformation ought to make the
distribution of X similar to the distribution of Y as these appear to
the recognizer. A major difficulty in constructing such a
transformation is that the variability between talkers and the
variability between linguistically distinct speech sounds may be
comparable in size and difficult to separate.  Consequently
unsupervised normalization will often remove speech information along
with the idiosyncratic talker characteristics.  For example, the
transformation

        T(X) = m_B + S_B^{1/2} S_A^{-1/2} (X - m_A)                (1)

ensures common second-order statistics for the distributions of Y and
T(X) (here m and S denote mean vectors and covariance matrices,
respectively). Nevertheless, the B-recognizer typically will not work
well when presented with T(X).
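
      As an illustration of transformation (1), the following is a minimal
sketch in Python with NumPy. It assumes the means and covariances are
estimated from sample matrices whose rows are spectral vectors, and that the
covariance of talker A is positive definite. The function names sym_power and
match_second_order are hypothetical and do not come from the disclosure.

```python
import numpy as np

def sym_power(S, p):
    """Return S**p for a symmetric positive-definite matrix S,
    computed through its eigendecomposition S = V diag(w) V^T."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T  # scales column i of V by w[i]**p

def match_second_order(XA, YB):
    """Build T(X) = m_B + S_B^{1/2} S_A^{-1/2} (X - m_A) from samples.

    XA, YB: arrays of shape (n_samples, dim), one spectral vector per row
    (an assumed data layout, not specified in the source text).
    """
    mA, mB = XA.mean(axis=0), YB.mean(axis=0)
    SA = np.cov(XA, rowvar=False)
    SB = np.cov(YB, rowvar=False)
    M = sym_power(SB, 0.5) @ sym_power(SA, -0.5)

    def T(X):
        # Row-vector form of m_B + M (x - m_A)
        return mB + (X - mA) @ M.T

    return T
```

Applying T to the talker-A samples yields data whose sample mean and
covariance equal those of talker B. As the text notes, matching these
statistics alone does not guarantee that the B-recognizer performs well.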

      Let x1,...,xk be prototypical values of X, and let y1,...,yk be
prototypical values of Y.  The index of these prototypes is to be
linguistically meaningful (referring to the same sort of sound).
Then the pairs (xj, yj), j = 1,...,k, represent examples of the sort of
transformation we seek. It is possible to approximate the desired
transform with the best linear mapping (e.g., a least sq...
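
      One way to fit a linear mapping to prototype pairs can be sketched as
follows, in Python with NumPy. The sketch assumes an ordinary least-squares
criterion and includes a constant offset term; both choices are assumptions
made for illustration, since the source text is truncated at this point. The
names fit_affine_map and apply_map are hypothetical.

```python
import numpy as np

def fit_affine_map(x_protos, y_protos):
    """Fit y ~ W x + b to k prototype pairs by ordinary least squares.

    x_protos, y_protos: arrays of shape (k, dim); row j holds x_j and y_j.
    Returns W of shape (dim_y, dim_x) and b of shape (dim_y,).
    """
    k = x_protos.shape[0]
    # Append a column of ones so the offset b is estimated with W.
    X1 = np.hstack([x_protos, np.ones((k, 1))])
    coef, *_ = np.linalg.lstsq(X1, y_protos, rcond=None)
    W, b = coef[:-1].T, coef[-1]
    return W, b

def apply_map(W, b, X):
    """Apply the fitted mapping to spectral vectors stored as rows of X."""
    return X @ W.T + b
```

With fewer prototype pairs than unknowns the fit is underdetermined, and
np.linalg.lstsq then returns the minimum-norm solution.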