Providing Help to Users during Recording of Training Materials for a Speech Recognition System

IP.com Disclosure Number: IPCOM000104147D
Original Publication Date: 1993-Mar-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 64K

Publishing Venue

IBM

Related People

Danis, CM: AUTHOR [+2]

Abstract

Disclosed is a method of attaching additional information to each token in the training materials for a speech recognition system, which can be displayed to aid the user during recording, improving the quality of the training materials.


Providing Help to Users during Recording of Training Materials for a Speech Recognition System

      Disclosed is a method of attaching additional information to
each token in the training materials for a speech recognition
system, which can be displayed to aid the user during recording,
improving the quality of the training materials.

      To train a speaker-dependent or speaker-adaptive speech
recognition system, a user typically has to read a predetermined set
of training tokens.  These tokens might be words, phrases, or
complete sentences.  The set of all tokens for training is called
the training script.  The user is generally prompted with the text
of the training script and is asked to read it, token by token.  For
some tokens, the pronunciation may be ambiguous.  For example,
"IEEE" could be pronounced as "I-triple-E" or as "I-E-E-E", and
"PL/I" could be pronounced as "P-L-one" or as "P-L-slash-one".  With
complete sentences, there is often a question of whether or not
punctuation marks such as "," or "." should be spoken.
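
      One way to picture the attached information is a training
script in which each token carries an optional help string that the
prompting program displays alongside the token text.  The sketch
below is illustrative only; the data layout, field names, and help
wording are assumptions, not part of the disclosure.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrainingToken:
    text: str                    # token text the user is asked to read
    help: Optional[str] = None   # optional guidance shown during recording

# Hypothetical training script with help attached to ambiguous tokens.
training_script: List[TrainingToken] = [
    TrainingToken("IEEE", help='Say "I-triple-E".'),
    TrainingToken("PL/I", help='Say "P-L-one"; do not pronounce the slash.'),
    TrainingToken("The parts arrived Monday, on time.",
                  help="Do not speak the comma or the period."),
    TrainingToken("terminal"),   # unambiguous token, no help needed
]

def prompt(token: TrainingToken) -> None:
    """Display a token and, when present, its recording guidance."""
    print("Please read:", token.text)
    if token.help:
        print("  Hint:", token.help)

for token in training_script:
    prompt(token)
    # ... record the user's utterance for this token here ...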

      The quality of the subsequent recognition is determined by the
quality of the training recordings; too many errors might cause
training to fail, or produce statistics that yield poor recognition
performance.  For example, in speech recognition systems that use
word models constructed from a finite inventory of sub-word units,
it is important that the user's pronunciation of a training token
reasonably matches the model expected by the training process.  Too
many discrepancies between the pronunciations and the underlying
models can degrade recognition performance after training, or cause
the training process to fail completely.

      This article describes a m...