Training Monitor Using Speaker-Independent Statistics

IP.com Disclosure Number: IPCOM000104195D
Original Publication Date: 1993-Mar-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 66K

Publishing Venue

IBM

Related People

Daggett, G: AUTHOR [+4]

Abstract

Disclosed is a method for monitoring the recording of training materials for a speech recognition system through the use of concurrent recognition to flag and reject inadequate tokens.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a method for monitoring the recording of training
materials for a speech recognition system through the use of
concurrent recognition to flag and reject inadequate tokens.

      Training is required to enroll a new user on a
speaker-dependent or speaker-adaptive speech recognition system.
While it is advantageous to have an automatic procedure for training,
there are many factors, not necessarily apparent to naive speakers,
that can result in poor training recordings and, consequently,
degraded recognition performance.  Training typically involves having
the user speak a predetermined set of training tokens and then
deriving, from these recordings, the parameters used during
recognition.  Examples of such parameters are the transition and
output probabilities used by a Markov model.
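
      As an illustration of what "deriving parameters" can mean, the
sketch below estimates transition and output probabilities by simple
relative-frequency counting.  It assumes, purely for illustration,
that each training recording has already been aligned into
(state, symbol) pairs; practical systems generally estimate these
values from unaligned audio with an iterative procedure.  The function
name and the data layout are hypothetical and are not taken from the
disclosure.

```python
from collections import Counter, defaultdict


def _normalize(counts):
    """Turn nested counts into nested relative frequencies."""
    probabilities = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        probabilities[key] = {item: n / total for item, n in counter.items()}
    return probabilities


def estimate_markov_parameters(aligned_sequences):
    """Estimate transition and output probabilities by counting.

    aligned_sequences: list of recordings, each a list of
    (state, symbol) pairs.  This alignment is an assumption of the
    sketch, not something the disclosure specifies.
    """
    transition_counts = defaultdict(Counter)
    output_counts = defaultdict(Counter)
    for sequence in aligned_sequences:
        # Count which symbol each state emitted.
        for state, symbol in sequence:
            output_counts[state][symbol] += 1
        # Count state-to-state moves between consecutive frames.
        for (prev_state, _), (next_state, _) in zip(sequence, sequence[1:]):
            transition_counts[prev_state][next_state] += 1
    return _normalize(transition_counts), _normalize(output_counts)
```

Because the estimates are plain ratios of counts, a handful of bad
recordings shifts them directly, which is why the quality of the
training tokens matters.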

      It is clearly advantageous for users to be able to perform this
training automatically and without assistance from others.  However,
such unsupervised training is exposed to problems that can ultimately
result in poor recognition performance, such as:

o   poor acoustic environment
o   badly placed microphone
o   mispronunciations
o   run-together words or split words

While training algorithms may be robust and successful in the face of
a small number of such errors, a significant number can result in
degraded recognition performance or complete failure of the training
process.  This places an unacceptable burden on the user, who must
diagnose the cause of the error, re-record the training tokens, and
repeat the training process.

      To prevent the use of inadequate materials for training, it is
useful to automatically monitor the recording of training tokens
while the user is speaking.  This allows problems to be flagged and
corrected immediately.  This monitoring can...
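
      The idea of monitoring each recording and flagging problems at
once can be sketched as a loop that runs a recognizer on every token
as it is recorded and asks for a new recording when the result is
inadequate.  This is a minimal sketch under stated assumptions, not
the disclosed procedure: the callables record and recognize, the
acceptance rule (recognized token matches the prompt and its score
reaches min_score), and the retry limit are all hypothetical.  The
recognize callable stands in for a recognizer that does not yet depend
on the new user's own training data.

```python
from dataclasses import dataclass


@dataclass
class TokenResult:
    prompt: str      # the training token the user was asked to speak
    accepted: bool   # True if some attempt passed the check
    attempts: int    # number of recordings made for this token


def monitor_training(prompts, record, recognize, min_score=0.5, max_attempts=3):
    """Record each training token, rejecting inadequate attempts.

    record(prompt) returns the recorded audio for one attempt.
    recognize(audio) returns (best_token, score).
    Both callables and the threshold are assumptions of this sketch.
    """
    results = []
    for prompt in prompts:
        accepted = False
        attempts = 0
        while attempts < max_attempts and not accepted:
            attempts += 1
            audio = record(prompt)
            best_token, score = recognize(audio)
            # Flag the attempt unless the recognizer agrees with the prompt.
            accepted = best_token == prompt and score >= min_score
        results.append(TokenResult(prompt, accepted, attempts))
    return results
```

Tokens that are still rejected after max_attempts are reported rather
than silently kept, so they can be excluded from training.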