
Vector Tagging Procedure for Speaker-Normalisation in Speech Recognition Systems Using Sub-Word Fenomic Acoustic Markov Models

IP.com Disclosure Number: IPCOM000111198D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 4 page(s) / 121K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR (and 2 other authors)

Abstract

It is desirable that speech recognition systems be able to make use of previously collected speech data when encountering new speakers. For this reason, procedures are described in [1,2] which attempt to map a test speaker's acoustic parameter vectors as closely as possible to corresponding vectors from a reference speaker. Central to these mappings are Viterbi alignments [3], from which a tag is derived for each acoustic vector; these tags determine which vectors from one speaker are equivalent to which vectors from the other. In one prominent approach [2], the Viterbi alignments are performed using allophonic leafform acoustic models, but experimental evidence suggests that alignments obtained in this way are often inaccurate.

This is the abbreviated version, containing approximately 48% of the total text.


      It is desirable that speech recognition systems be able to make
use of previously collected speech data when encountering new
speakers.  For this reason, procedures are described in [1,2] which
attempt to map a test speaker's acoustic parameter vectors as closely
as possible to corresponding vectors from a reference speaker.
Central to these mappings are Viterbi alignments [3], from which a tag
is derived for each acoustic vector; these tags determine which
vectors from one speaker are equivalent to which vectors from the
other.  In one prominent approach [2], the Viterbi alignments are
performed using allophonic leafform acoustic models, but experimental
evidence suggests that alignments obtained in this way are often
inaccurate.  The invention described below gives a method of obtaining
more accurate alignments, and also specifies an improved method of
defining vector tags.
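The role of the tags, deciding which test-speaker vectors correspond to which reference-speaker vectors, can be sketched as a simple grouping step. This is a minimal illustration under stated assumptions: the tag values (hypothetical `(leafform, fenone position)` tuples) and the functions `group_by_tag` and `equivalent_vectors` are invented for the example, and the sketch stops at pairing; how the paired vectors are turned into a mapping is not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def group_by_tag(vectors: Sequence[Vector], tags: Sequence[Hashable]) -> Dict[Hashable, List[Vector]]:
    """Collect one speaker's acoustic vectors under the tag attached to each."""
    if len(vectors) != len(tags):
        raise ValueError("each acoustic vector needs exactly one tag")
    groups: Dict[Hashable, List[Vector]] = defaultdict(list)
    for vector, tag in zip(vectors, tags):
        groups[tag].append(vector)
    return dict(groups)


def equivalent_vectors(test_vectors, test_tags, ref_vectors, ref_tags):
    """For every tag seen for both speakers, return (test vectors, reference vectors)."""
    test_groups = group_by_tag(test_vectors, test_tags)
    ref_groups = group_by_tag(ref_vectors, ref_tags)
    shared = test_groups.keys() & ref_groups.keys()  # tags present for both speakers
    return {tag: (test_groups[tag], ref_groups[tag]) for tag in shared}
```

For example, with test tags `("AA", 0), ("AA", 1)` and reference tags `("AA", 0), ("AA", 0), ("B", 0)`, only `("AA", 0)` is shared, so the first test vector is paired with the first two reference vectors. The quality of such pairings depends entirely on how accurate the underlying alignments are, which is the concern of the rest of this disclosure.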

      Experimental evidence shows that when Viterbi alignments are
made using allophonic leafforms composed of loopy fenones (fenones
with a self-loop, which can therefore produce any number of outputs),
the resulting alignments tend to have accurate leafform end-points,
but inconsistent fenone end-points.  The inconsistency stems from each
fenone's freedom to produce any number of outputs, from zero upward,
with no upper limit.  Experimental evidence also shows that when
Viterbi alignments are made using allophonic leafforms composed of
loopless fenones (each of which can produce at most two outputs), the
resulting alignments have much more consistent fenone end-points, but
sometimes have inaccurate leafform end-points.  Inaccurate leafform
end-points arise because no leafform can produce more outputs than
twice the number of fenones in the leafform: when a sound lasts longer
than that, the alignment must hand the surplus outputs to a
neighbouring leafform, which shifts the boundary between them.
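The length limit of a loopless leafform can be checked numerically. The sketch below assumes, as Step 4 suggests, that each loopless fenone produces 0, 1 or 2 outputs with probabilities Pr(0), Pr(1), Pr(2), and that the fenones in a leafform choose their output counts independently; only transition probabilities are considered, and the function name `leafform_output_distribution` is hypothetical.

```python
from typing import List, Sequence


def leafform_output_distribution(fenones: Sequence[Sequence[float]]) -> List[float]:
    """Distribution of the total output count of a leafform of loopless fenones.

    Each entry of ``fenones`` is (Pr(0 outputs), Pr(1 output), Pr(2 outputs))
    for one fenone. Because the fenones are treated as independent, the
    leafform's distribution is the convolution of the per-fenone ones.
    """
    dist = [1.0]  # an empty leafform produces 0 outputs with certainty
    for probs in fenones:
        if len(probs) != 3:
            raise ValueError("a loopless fenone produces 0, 1 or 2 outputs")
        new = [0.0] * (len(dist) + 2)  # each fenone adds at most two outputs
        for n, q in enumerate(dist):
            for extra, r in enumerate(probs):
                new[n + extra] += q * r
        dist = new
    return dist
```

For a leafform of three fenones, each with probabilities `(0.1, 0.8, 0.1)`, the returned list has entries only for 0 to 6 outputs, so a segment of seven or more outputs cannot be absorbed by that leafform alone. With loopy fenones there is no such cut-off, which is why they leave leafform boundaries free but fenone boundaries loose.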

Based on this evidence, an improved alignment is obtained by
performing the following steps.

Step 1.

Create sub-word allophonic leafforms [4] to cover the vocabulary.
The sub-words may be phonemes, for example.  A model is then built for
each word by concatenating the appropriate leafforms.  Note that no
word should be given its own special whole-word model (as is sometimes
done).
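Step 1 amounts to building every word model from one shared inventory of sub-word leafforms. In this minimal sketch the phoneme symbols, fenone identifiers, and the names `LEAFFORMS`, `LEXICON` and `word_model` are all hypothetical; a real allophonic system would select each leafform according to its phonetic context [4], which this context-independent lookup ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical leafform inventory: each sub-word unit (a phoneme here) maps
# to a leafform, written as a sequence of fenone identifiers.
LEAFFORMS: Dict[str, List[str]] = {
    "K": ["F12", "F07"],
    "AE": ["F33", "F33", "F41"],
    "T": ["F05", "F19"],
}

# Hypothetical lexicon: every word is spelled in sub-word units only, so no
# word has a special model of its own.
LEXICON: Dict[str, List[str]] = {
    "CAT": ["K", "AE", "T"],
    "TACK": ["T", "AE", "K"],
}


def word_model(word: str) -> List[Tuple[str, str]]:
    """Concatenate the word's leafforms, keeping the owning unit for each fenone."""
    model: List[Tuple[str, str]] = []
    for unit in LEXICON[word]:
        model.extend((unit, fenone) for fenone in LEAFFORMS[unit])
    return model
```

Keeping the owning unit next to each fenone lets an alignment against the word model report leafform end-points as well as fenone end-points, and because "CAT" and "TACK" reuse the same `AE` leafform, both words contribute training data to it.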

Step 2.

Train the models of Step 1 on some training data using the loopy
fenone described above, and Viterbi-align [3] that training data
against the trained models.
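A Viterbi alignment of discrete acoustic labels against a chain of loopy fenones (Step 2) can be sketched as below. Assumptions: each fenone has a two-state topology with a self-loop and a forward arc that each emit one label, plus a null arc that emits nothing; the parameters are supplied rather than trained; and `LoopyFenone` and `viterbi_align` are hypothetical names. The function returns, for each label, the index of the fenone it is aligned to.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


@dataclass
class LoopyFenone:
    self_loop: float          # emits one label and stays at this fenone
    forward: float            # emits one label and moves past this fenone
    null: float               # moves past this fenone without emitting
    output: Dict[int, float]  # Pr(label | fenone)


def viterbi_align(fenones: Sequence[LoopyFenone], labels: Sequence[int]) -> List[int]:
    """Return, for each label, the index of the fenone it is aligned to."""
    T, N = len(labels), len(fenones)
    # score[t][i]: best log-probability of having emitted labels[:t] and
    # reached the state in front of fenone i (i == N is the final state).
    score = [[NEG_INF] * (N + 1) for _ in range(T + 1)]
    back: List[List[Optional[Tuple[int, int, Optional[int]]]]] = [
        [None] * (N + 1) for _ in range(T + 1)
    ]
    score[0][0] = 0.0

    def relax(t: int, i: int, cand: float, prev: Tuple[int, int, Optional[int]]) -> None:
        if cand > score[t][i]:
            score[t][i] = cand
            back[t][i] = prev

    for t in range(T + 1):
        # Ascending i: every arc into (t, i) is relaxed before (t, i) is expanded.
        for i in range(N):
            here = score[t][i]
            if here == NEG_INF:
                continue
            f = fenones[i]
            relax(t, i + 1, here + _log(f.null), (t, i, None))
            if t < T:
                emit = _log(f.output.get(labels[t], 0.0))
                relax(t + 1, i, here + _log(f.self_loop) + emit, (t, i, i))
                relax(t + 1, i + 1, here + _log(f.forward) + emit, (t, i, i))

    if score[T][N] == NEG_INF:
        raise ValueError("labels cannot be aligned to these fenones")
    tags = [0] * T
    t, i = T, N
    while (t, i) != (0, 0):
        prev_t, prev_i, fenone = back[t][i]
        if fenone is not None:
            tags[prev_t] = fenone  # label prev_t was emitted by this fenone
        t, i = prev_t, prev_i
    return tags
```

Because the self-loop can be taken any number of times and the null arc lets a fenone take no labels at all, nothing confines the number of labels given to one fenone to a fixed range; this is the freedom the text identifies as the source of inconsistent fenone end-points.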

Step 3.

Using the same training data, train the models of Step 1 again, this
time using the loopless fenone described above.
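The disclosure does not say how the loopless models of Step 3 are trained. Purely as an illustration, the sketch below swaps in hard-count (Viterbi-style) re-estimation: given an alignment of the training data against the loopless models, it estimates each fenone's Pr(0), Pr(1) and Pr(2) from how many outputs each occurrence of that fenone received, with an additive pseudo-count `alpha`. The function name is hypothetical, and forward-backward training would use expected rather than hard counts.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def estimate_output_counts(
    observations: Iterable[Tuple[str, int]], alpha: float = 0.5
) -> Dict[str, Tuple[float, float, float]]:
    """Hard-count estimate of (Pr(0), Pr(1), Pr(2)) for each loopless fenone.

    ``observations`` holds one (fenone_id, outputs_received) pair per fenone
    occurrence in an alignment; ``alpha`` is an additive pseudo-count.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for fenone, n in observations:
        if n not in (0, 1, 2):
            raise ValueError(f"a loopless fenone cannot receive {n} outputs")
        counts[fenone][n] += 1
    estimates: Dict[str, Tuple[float, float, float]] = {}
    for fenone, c in counts.items():
        total = sum(c.values()) + 3 * alpha
        p0, p1, p2 = ((c[n] + alpha) / total for n in (0, 1, 2))
        estimates[fenone] = (p0, p1, p2)
    return estimates
```

The resulting triples are the quantities that Step 4 compares and adjusts.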

Step 4.

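Adjust the loopless transition probabilities as follows.  If

         $\Pr(1\ \text{output})^2 < \Pr(2\ \text{outputs}) \cdot \Pr(0\ \text{outputs})$

then set

         $\Pr(1\ \text{output}) = \sqrt{\Pr(2\ \text{outputs}) \cdot \Pr(0\ \text{outputs})} + k$

where $k$ is a small constant, and renormalise the transition
probabilities.  A typical value for $k$ is 0.001.  This step is
performed to discourage the alignment from placing two outputs against
a single fenone and none against an adjacent fenone, when it could just
as easily have placed one output against each.  Considering transition
probabilities alone, and treating both fenones alike, the first
arrangement has probability $\Pr(2\ \text{outputs}) \cdot \Pr(0\ \text{outputs})$
and the second $\Pr(1\ \text{output})^2$; the adjustment makes the
second the larger.  This improves consistency.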
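Step 4 can be sketched for a single loopless fenone as below. The test is read as Pr(1 output)² < Pr(2 outputs)·Pr(0 outputs), the form implied by the square root in the update; renormalisation is assumed to mean dividing the three output-count probabilities by their sum, and `adjust_loopless` is a hypothetical name.

```python
import math
from typing import Tuple


def adjust_loopless(p0: float, p1: float, p2: float, k: float = 0.001) -> Tuple[float, float, float]:
    """Apply the Step 4 adjustment to one loopless fenone's (Pr(0), Pr(1), Pr(2)).

    If placing (2, 0) outputs on two such fenones beats placing (1, 1), i.e.
    p1**2 < p2 * p0, raise p1 to sqrt(p2 * p0) + k and renormalise
    (assumed here: divide all three by their sum).
    """
    if p1 * p1 < p2 * p0:
        p1 = math.sqrt(p2 * p0) + k
        total = p0 + p1 + p2
        p0, p1, p2 = p0 / total, p1 / total, p2 / total
    return p0, p1, p2
```

Dividing all three probabilities by the same total leaves the ratio Pr(1)²/(Pr(2)·Pr(0)) unchanged, so under this assumed renormalisation the adjusted fenone still favours one output each over a two-and-zero split. A fenone that already satisfies the test, such as `(0.1, 0.8, 0.1)`, is returned unchanged.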

Step 5.

Discard any portion of the Viterbi alignment...