Using Speaker-Independent Label Alphabet for Improved Accuracy in Speech Recognition Systems

IP.com Disclosure Number: IPCOM000100130D
Original Publication Date: 1990-Mar-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 2 page(s) / 85K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

A technique is described whereby a speaker-independent label alphabet procedure improves the estimates of the involved parameters, thereby reducing the error rate in speech recognition systems.

This is the abbreviated version, containing approximately 52% of the total text.

      In speaker-dependent speech recognition systems using Markov
word models, it is customary to perform speaker-dependent vector
quantization and to estimate speaker-dependent statistics from a
training sample of the speaker's spoken words.  For parameter
estimation, it is desirable that the training sample be as large as
possible, but for convenience and speed, it is desirable that the
training sample be as small as possible.

      The concept described herein enables good parameter estimates
to be obtained from a limited speech sample by performing speaker-
dependent labelling into a speaker-independent alphabet, sometimes
called a common alphabet.  Previously computed speaker-independent
statistics are then used to "smooth" the obtained speaker-dependent
statistics.
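
      The visible text does not say how the smoothing is carried out.
As a minimal sketch only, assuming a simple count-based interpolation
(a common choice, not necessarily the authors' method), the speaker-
dependent label counts for one model state could be blended with the
speaker-independent probabilities as follows.  The function name and
the weight tau are hypothetical and do not appear in the disclosure.

```python
def smooth_label_probs(dep_counts, indep_probs, tau=10.0):
    """Blend speaker-dependent label counts with speaker-independent
    label probabilities over the same common alphabet.

    dep_counts  -- how often each label was observed for one model
                   state in the speaker-dependent training data
    indep_probs -- previously computed speaker-independent
                   probabilities for the same labels
    tau         -- hypothetical weight (assumption): how many
                   "virtual" observations the speaker-independent
                   statistics are worth
    """
    if len(dep_counts) != len(indep_probs):
        raise ValueError("inputs must cover the same label alphabet")
    if tau <= 0:
        raise ValueError("tau must be positive")

    total = sum(dep_counts)
    # With little speaker-dependent data the result stays close to the
    # speaker-independent probabilities; with more data the speaker's
    # own relative frequencies dominate.
    return [(count + tau * prior) / (total + tau)
            for count, prior in zip(dep_counts, indep_probs)]


if __name__ == "__main__":
    # Three-label toy alphabet; the second label was never observed for
    # this speaker but still receives a nonzero smoothed probability.
    print(smooth_label_probs([3, 0, 1], [0.5, 0.3, 0.2], tau=4.0))
```

      In this sketch a small training sample leaves the estimate near
the speaker-independent value, which is the effect the smoothing is
meant to achieve; the value of tau would have to be tuned on held-out
data.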

      The procedure assumes that a large sample of speech has been
drawn from a variety of speakers, referred to as the speaker-
independent training data.  It is further assumed that a relatively
small sample of the subject's speech is available, referred to as
speaker-dependent training data.  The concept then follows six steps:

      Step 1. Using the speaker-independent training data, compute label
prototypes by using vector-quantization procedures.  This results
in a diagonal Gaussian prototype for each phone in the recognizer's
phone alphabet [1].

      Step 2. Using the speaker-independent label prototypes and the
speaker-independent training data, compute the speaker-independent
statistics, using the forward-backward algorithm [2].

      Step 3. Using the speaker-dependent training data, use the same
technique as described in Step 1 to obtain speake...