Projection Matrix for a New Talker Based On a Sentence Or Two

IP.com Disclosure Number: IPCOM000102693D
Original Publication Date: 1990-Dec-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 4 page(s) / 97K

Publishing Venue

IBM

Related People

Nadas, A: AUTHOR [+4]

Abstract

The required projection matrix is based only on low (50) dimensional phone centroids in reference speech and on high (189) dimensional phone centroids in new speech. The solution is efficiently computed from the singular value decomposition of the rectangular (low by high) cross covariance matrix of the two sets of centroids.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Our present algorithms avoid explicitly modeling the statistical
dependence among frames along a path in state space by attempting to
capture the information in the time context in a simpler way.  A
block of time-adjacent frames is regarded as a high dimensional
super-frame and, for practical reasons, the super-frame is projected
back into a lower dimensional space.  In this invention we consider
the situation wherein there is available only a small amount of
training speech from the new talker together with a collection of
reference prototypes. Each prototype is a probability density
constructed as a finite mixture of Gaussian probability densities.
For illustrative purposes, fix the number of prototypes at 210, the
number of mixture components in any one phone at 20, the dimension of
new speech at 189 and the dimension of reference speech at 50.
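
As a rough illustration of the super-frame idea described above, the following Python/NumPy sketch stacks time-adjacent frames into one long vector and applies a linear projection. The text fixes only the dimensions 189 and 50. The split of 189 into 9 frames of 21 parameters, the function names, and the random placeholder matrix W are assumptions made for this example only.

```python
import numpy as np

def make_superframes(frames, context=4):
    """Stack each frame with `context` neighbours on either side (assumed layout)."""
    width = 2 * context + 1
    n_out = frames.shape[0] - width + 1
    # Row t holds frames t .. t+width-1 concatenated into one long vector.
    return np.stack([frames[t:t + width].ravel() for t in range(n_out)])

def project(superframes, W):
    """Map high dimensional super-frames into the low dimensional space."""
    return superframes @ W.T

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 21))      # hypothetical: 21 parameters per frame
superframes = make_superframes(frames)   # shape (92, 189)
W = rng.normal(size=(50, 189))           # placeholder, not a trained projection
low = project(superframes, W)            # shape (92, 50)
```

Here W only marks where a projection matrix would enter; it carries no information about any talker.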

                            (Image Omitted)

STEP 1. Obtain the 50 dimensional reference centroids
(1) from the reference prototypes as
(2)
for              Using the phone probabilities q1, ..., q210,
compute the centroid of all of speech as
(3) and use it to compute the centered centroids
(4)
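
Equations (1) to (4) are not reproduced in this text, so the following Python/NumPy sketch is only one plausible reading of STEP 1. It assumes that each reference centroid is the mean of its Gaussian mixture (the weighted sum of the component means), that the centroid of all of speech is the q-weighted average of those centroids, and that the data layout and names (weights, means, q) are as shown. All inputs are synthetic.

```python
import numpy as np

def reference_centered_centroids(weights, means, q):
    """weights: (P, K) mixture weights, each row summing to 1.
    means:   (P, K, d) component means.
    q:       (P,) phone probabilities summing to 1."""
    # The mean of a Gaussian mixture is the weighted sum of its component means.
    centroids = np.einsum("pk,pkd->pd", weights, means)
    # Centroid of all of speech: phone centroids weighted by q (assumed form).
    overall = q @ centroids
    return centroids - overall, overall

rng = np.random.default_rng(1)
P, K, d = 210, 20, 50                        # illustrative sizes from the text
weights = rng.dirichlet(np.ones(K), size=P)  # synthetic mixture weights
means = rng.normal(size=(P, K, d))           # synthetic component means
q = rng.dirichlet(np.ones(P))                # synthetic phone probabilities
y_centered, y_bar = reference_centered_centroids(weights, means, q)
```

With this form of centering, the q-weighted average of the centered centroids is zero.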
STEP 2. Align new data using speaker-independent statistics and
compute the 189 dimensional centroids
(5). Compute the grand mean x of the new data and use it to compute
the centered centroids
(6)
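
STEP 2 can be sketched in the same hedged way. The alignment itself is not shown here: the sketch assumes it has already produced one phone label per 189 dimensional frame (the array labels is hypothetical), and it takes each centroid to be the plain average of the frames aligned to that phone. With only a sentence or two of speech, many of the 210 phones will not occur, so the sketch also returns a mask of the phones actually seen. That handling is an assumption, not something the text specifies.

```python
import numpy as np

def new_talker_centered_centroids(X, labels, n_phones):
    """X: (N, d) new-talker frames; labels: (N,) phone index per frame."""
    counts = np.bincount(labels, minlength=n_phones)
    sums = np.zeros((n_phones, X.shape[1]))
    np.add.at(sums, labels, X)               # accumulate the frames of each phone
    seen = counts > 0                        # phones that occur in the new data
    grand_mean = X.mean(axis=0)              # grand mean of the new data
    centered = np.zeros_like(sums)           # unseen phones stay at zero
    centered[seen] = sums[seen] / counts[seen, None] - grand_mean
    return centered, seen, grand_mean

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 189))              # synthetic new-talker frames
labels = rng.integers(0, 40, size=300)       # only some of the 210 phones occur
x_centered, seen, x_bar = new_talker_centered_centroids(X, labels, 210)
```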
STEP 3.

      Let C denote the cross covariance matrix between the two sets
of centroids
(7) and compute its singular va...
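
This abbreviated version ends before the solution is stated, so the Python/NumPy sketch below covers only what is described: forming a low by high (50 by 189) cross covariance matrix from the two sets of centered centroids and taking its singular value decomposition. Weighting the sum by the phone probabilities q is an assumption, as are all names and the synthetic inputs. The final line shows one standard way to turn such a decomposition into a matrix with orthonormal rows (an orthogonal Procrustes type solution). It is included purely as an assumed illustration and should not be read as the formula of the disclosure.

```python
import numpy as np

def cross_covariance(y_centered, x_centered, q):
    """C = sum_i q_i * y_i x_i^T, with shape (low, high)."""
    return (y_centered * q[:, None]).T @ x_centered

rng = np.random.default_rng(3)
P, low, high = 210, 50, 189
y = rng.normal(size=(P, low))        # synthetic centered reference centroids
x = rng.normal(size=(P, high))       # synthetic centered new-talker centroids
q = np.full(P, 1.0 / P)              # uniform phone probabilities (assumed)

C = cross_covariance(y, x, q)                      # shape (50, 189)
U, s, Vt = np.linalg.svd(C, full_matrices=False)   # U (50, 50), s (50,), Vt (50, 189)

# Assumed illustration only: among matrices with orthonormal rows,
# W = U @ Vt maximizes trace(W @ C.T).
W = U @ Vt                                         # shape (50, 189)
```

Because C is only 50 by 189, the reduced decomposition (full_matrices=False) is small, which is consistent with the remark that the solution is efficiently computed.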