Generation of Phonetic Initial Statistics From Fenemic Training

IP.com Disclosure Number: IPCOM000100116D
Original Publication Date: 1990-Mar-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 4 page(s) / 166K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+5]

Abstract

A technique is described whereby speech recognition devices can generate initial phonetic statistics from a speaker's fenemic training. The concept improves recognition accuracy and reduces processing time and storage requirements to train recognition models.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 37% of the total text.

      Typically, speech recognition systems based on Markov models
match a sequence of acoustic labels, which represent spoken
utterances, against models of word pronunciations (1).  These models
can be described in terms of states, probabilistic transitions
between states, and probability distributions over the acoustic
labels on the state transitions.  The transition and label output
probabilities are derived for each new speaker as part of the
training process.  Initial estimates of these probabilities are
generated and then iteratively refined using the Forward-Backward
algorithm (2).
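
      As an illustration of the model structure just described, the
following minimal sketch represents a model as a list of arcs, each
carrying a transition probability and a label output distribution,
and computes the forward probability of a label sequence.  This is
only the forward half of Forward-Backward, not the re-estimation
itself.  The names Arc and forward_likelihood, the toy model, and the
label names "A" and "B" are hypothetical and do not come from the
disclosure; null transitions are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """One state transition with its label output distribution (hypothetical layout)."""
    src: int
    dst: int
    trans_prob: float   # P(take this arc | currently in state src)
    label_probs: dict   # P(label | this arc)


def forward_likelihood(arcs, n_states, labels, start=0, final=None):
    """Probability of emitting `labels` while moving from `start` to `final`."""
    if final is None:
        final = n_states - 1
    alpha = [0.0] * n_states
    alpha[start] = 1.0
    for label in labels:
        nxt = [0.0] * n_states
        for arc in arcs:
            # Each arc emits exactly one label when it is traversed.
            emit = arc.label_probs.get(label, 0.0)
            nxt[arc.dst] += alpha[arc.src] * arc.trans_prob * emit
        alpha = nxt
    return alpha[final]


# Toy two-state model (assumed values, for illustration only).
toy_arcs = [
    Arc(0, 0, 0.5, {"A": 0.9, "B": 0.1}),  # self-loop on state 0
    Arc(0, 1, 0.5, {"A": 0.2, "B": 0.8}),  # advance to the final state
]
likelihood = forward_likelihood(toy_arcs, n_states=2, labels=["A", "B"])
```

      In this toy model the only complete path takes the self-loop
on "A" and then the advancing arc on "B".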

      It is important to generate "good" estimates of the initial
probabilities for two reasons.  First, the final statistics produced
by the Forward-Backward algorithm depend on the choice of initial
statistics, since this algorithm converges to a locally, rather than
globally, optimal solution.  Poor initial statistics can result in
increased training time and decreased recognition accuracy.  Second,
the initial statistics are used to smooth the final statistics
produced by the Forward-Backward algorithm, so as to make the system
more robust to label sequences that did not occur in the training
data.  Here again, poor initial statistics can decrease recognition
accuracy.
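
      The smoothing role of the initial statistics can be sketched as
follows.  The text does not say how the smoothing is performed; the
linear interpolation below, the function name smooth_statistics, and
the weight of 0.9 are all assumptions chosen only to show why a label
unseen in training still receives non-zero probability.

```python
def smooth_statistics(final_probs, initial_probs, weight=0.9):
    """Interpolate trained (final) and initial label output probabilities.

    Assumed form: weight * final + (1 - weight) * initial.
    """
    labels = set(final_probs) | set(initial_probs)
    return {
        label: weight * final_probs.get(label, 0.0)
        + (1.0 - weight) * initial_probs.get(label, 0.0)
        for label in labels
    }


# Label "B" never occurred in training, so its trained probability is zero.
trained = {"A": 1.0, "B": 0.0}
initial = {"A": 0.5, "B": 0.5}
smoothed = smooth_statistics(trained, initial)
```

      Under this assumed scheme, the quality of the initial
distribution directly determines how sensibly the probability mass is
spread over unseen labels.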

      The concept described herein presents a method of deriving the
initial statistics required for phonetic training from the final
results of fenemic training.  Specifically, the method produces the
initial speaker-dependent label output probabilities; the initial
transition probabilities are predetermined and independent of the
speaker.

      In the prior art, phonetic initial statistics were generated by
labeling a fixed reference set of vectors from a "standard" speaker
with the new speaker's prototype vectors (3).  A pre-computed
alignment of the reference vectors to the phonetic phones was used to
compute relative frequencies of phone-label pairs.  The relative
frequencies were then weighted and averaged with a uniform
probability distribution to produce the initial label output
statistics.  Because the pre-computed alignment resolves only to
phones, rather than to arcs within phones, all of the label output
distributions within a phone were assigned the same initial
probabilities.  As a result, the quality of the initial statistics
generated by this prior-art method was quite variable.  For example,
the method was sensitive to acoustic differences between the new
speaker and the reference speaker.  It was also sensitive to
differences in acoustic environments, due to microphones...