
Metric to Evaluate Training Scripts and Dynamically Assess the Quality of Training Data

IP.com Disclosure Number: IPCOM000106686D
Original Publication Date: 1993-Dec-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 2 page(s) / 80K

Publishing Venue

IBM

Related People

Epstein, M: AUTHOR [+2]

Abstract

In speaker-dependent automatic speech recognizers, the user must read an enrollment script so that the system can learn how the user pronounces the different sounds. This raises two problems: selecting a training script with suitable phonetic coverage, and deciding whether enough data has been provided for training to be likely to succeed. This invention presents an algorithm for solving both problems.


Metric to Evaluate Training Scripts and Dynamically Assess the Quality of Training Data

      In speaker-dependent automatic speech recognizers, the user
must read an enrollment script so that the system can learn how the
user pronounces the different sounds.  This raises two problems:
selecting a training script with suitable phonetic coverage, and
deciding whether enough data has been provided for training to be
likely to succeed.  This invention presents an algorithm for solving
both problems.

      The main challenge in solving the first problem is ensuring
that the training script includes all the phonemes in their proper
contexts.  Contexts are determined by the neighboring phonemes.  This
invention uses leafemic baseforms [1] to model the different contexts
produced by neighboring phonemes.  However, the importance of a
context must be conditioned on the probability of the words in
which it occurs.  This is done using the unigram probabilities of
the words.  In particular:

$$\mathrm{Count}_{\mathrm{leafeme}} \;=\; \mathrm{Count}_{\mathrm{phone}} \cdot \frac{\mathrm{prob}(\mathrm{leafeme})}{\mathrm{prob}(\mathrm{phone})}$$

where

$$\mathrm{prob}(\mathrm{leafeme}) = \sum_{w \,\in\, \mathrm{Vocab}} \mathrm{prob}(\mathrm{leafeme} \mid w)\,\mathrm{prob}(w)$$

$$\mathrm{prob}(\mathrm{phone}) = \sum_{w \,\in\, \mathrm{Vocab}} \mathrm{prob}(\mathrm{phone} \mid w)\,\mathrm{prob}(w)$$

      Naturally, $\mathrm{prob}(\mathrm{phone} \mid w)$ is simply the
number of times the phone occurs in $w$, divided by the number of
phones in $w$.  $\mathrm{Count}_{\mathrm{phone}}$ is an empirically
determined constant for each phone, obtained by analyzing many
training scripts.
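
      The computation can be sketched as follows.  This is a minimal
illustration, not the disclosure's implementation: the vocabulary, the
unigram probabilities, the phone and leafeme sequences, and the
per-phone constants are all invented stand-ins, and leafemes are
encoded here as a phone name plus a leaf index so each can be mapped
back to its parent phone.

```python
from collections import Counter

# Hypothetical stand-in data: each word maps to (unigram probability,
# phone sequence, leafeme sequence).  Leafemes are written "PHONE_leaf".
VOCAB = {
    "the":  (0.050, ["DH", "AH"],      ["DH_1", "AH_3"]),
    "this": (0.010, ["DH", "IH", "S"], ["DH_1", "IH_2", "S_1"]),
    "sun":  (0.002, ["S", "AH", "N"],  ["S_2", "AH_1", "N_1"]),
}

# Empirically determined minimum training count per phone (assumed
# values; the disclosure derives these by analyzing many scripts).
COUNT_PHONE = {"DH": 20, "AH": 30, "IH": 25, "S": 25, "N": 20}

# prob(unit) = sum over words w of prob(unit | w) * prob(w), where
# prob(unit | w) is the unit's relative frequency within w.
prob_phone, prob_leafeme = Counter(), Counter()
for p_w, phones, leafemes in VOCAB.values():
    for ph in phones:
        prob_phone[ph] += p_w / len(phones)
    for lf in leafemes:
        prob_leafeme[lf] += p_w / len(leafemes)

# Count_leafeme = Count_phone * prob(leafeme) / prob(phone): the parent
# phone's constant, scaled by the leafeme's share of that phone's mass.
count_leafeme = {
    lf: COUNT_PHONE[lf.split("_")[0]] * p / prob_phone[lf.split("_")[0]]
    for lf, p in prob_leafeme.items()
}

for lf, c in sorted(count_leafeme.items()):
    print(f"{lf}: required count {c:.1f}")
```

Since the required counts depend only on the vocabulary and the
unigram probabilities, they can be computed once and reused for every
candidate training script.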

      Thus, the required count for each leafeme depends on the
probabilities of the words that contain it.  Note that
$\mathrm{Count}_{\mathrm{leafeme}}$ is constant for a given vocabulary
and language model, so once these counts have been computed, different
training scripts can be analyzed against them.  A score for the
training script can be produced using:

$$\mathrm{Score} \;=\; \sum_{\mathrm{leafemes}} \mathrm{prob}(\mathrm{leafeme}) \cdot \min\!\left(1.0,\; \frac{\mathrm{Training}_{\mathrm{leafeme}}}{\mathrm{Count}_{\mathrm{leafeme}}}\right)$$

where $\mathrm{Training}_{\mathrm{leafeme}}$ is the number of
occurrences of the leafeme in the training script.  If all the
leafemes attain their minimum counts, a score of 1.0 is produced.
Note that since each leafeme's contribution is weighted by its
probability, the more probable leafemes must have better coverage in
order for the training script to receive a good score.
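
      As a sketch of the scoring step, under the same assumed leafeme
encoding as above, the probability-weighted coverage score might be
computed as follows.  The weights are renormalized so they sum to one,
and every value shown is illustrative rather than taken from the
disclosure.

```python
def script_score(training_counts, count_leafeme, prob_leafeme):
    """Probability-weighted coverage score: 1.0 exactly when every
    leafeme reaches its required minimum count, and shortfalls in
    probable leafemes cost more than shortfalls in rare ones."""
    total = sum(prob_leafeme.values())  # renormalize the weights
    score = 0.0
    for lf, required in count_leafeme.items():
        ratio = training_counts.get(lf, 0) / required
        score += (prob_leafeme[lf] / total) * min(1.0, ratio)
    return score

# Illustrative values; in practice these come from the required-count
# computation above and from tallying leafemes in the candidate script.
prob_leafeme  = {"DH_1": 0.30, "AH_3": 0.25, "IH_2": 0.20,
                 "S_1": 0.15, "S_2": 0.10}
count_leafeme = {"DH_1": 20.0, "AH_3": 15.0, "IH_2": 25.0,
                 "S_1": 18.0, "S_2": 10.0}
training      = {"DH_1": 25, "AH_3": 9, "IH_2": 30, "S_1": 18}  # no S_2

print(f"script score: {script_score(training, count_leafeme, prob_leafeme):.3f}")
```

Here the script scores 0.800: DH_1, IH_2, and S_1 meet their minimums,
while AH_3 is undertrained and S_2 is absent, so their weighted
shortfalls are deducted from the ideal score of 1.0.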

  ...