
Estimation of Discrete-Parameter Hidden Markov Model Statistics in Speech Recognition Systems When Training Data is Limited

IP.com Disclosure Number: IPCOM000111081D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 87K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR, and 4 others

This is the abbreviated version, containing approximately 52% of the total text.

Estimation of Discrete-Parameter Hidden Markov Model Statistics in Speech Recognition Systems When Training Data is Limited

In speaker-dependent speech recognition systems using discrete-parameter hidden Markov models, it is customary to obtain label prototypes from a sample of training data provided in advance by each speaker who wishes to use the system. Some such systems use supervised labels; that is, the correct labels of the training data are assumed to be determinable, and the label prototypes are designed so as to reproduce these "correct" labels as accurately as possible.

The problem with supervised labels is that, once label prototypes have been obtained from the training data, the labelled training data is no longer representative of labelled test data: the training-data labels are atypically "clean". Training on artificially clean labels leads to output probability distributions whose entropy is too low, which ultimately leads to recognition errors.
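
The effect is easy to demonstrate with a toy calculation (the alphabet size and both label streams below are invented; this is not data from the article): a discrete output distribution estimated from atypically clean labels has very low entropy, so the labels that inevitably differ at test time receive near-zero probability.

    # Toy demonstration: a low-entropy estimate from clean labels
    # penalizes a noisier test label stream heavily.
    from collections import Counter
    import math

    ALPHABET = ["L%d" % i for i in range(5)]

    def output_distribution(labels, alphabet, smooth=1e-3):
        """Lightly smoothed relative-frequency estimate of P(label | state)."""
        total = len(labels) + smooth * len(alphabet)
        counts = Counter(labels)
        return {l: (counts[l] + smooth) / total for l in alphabet}

    def entropy_bits(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # Supervised training labels for one state: atypically clean.
    clean_train = ["L0"] * 98 + ["L1"] * 2
    # Labels the same state actually produces on unseen speech: noisier.
    noisy_test = ["L0"] * 70 + ["L1"] * 15 + ["L2"] * 10 + ["L3"] * 5

    dist = output_distribution(clean_train, ALPHABET)
    print("entropy: %.2f bits" % entropy_bits(dist))
    print("test log2-prob: %.1f" % sum(math.log2(dist[l]) for l in noisy_test))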

This article advocates the use of speaker-independent output probability distributions to alleviate this problem. This requires that the label alphabet also be speaker-independent, but, to preserve the speaker-dependent nature (and accuracy) of the system, speaker-dependent supervised label prototypes are employed.
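
This division of labour can be pictured as follows (a rough sketch only; the nearest-prototype labelling rule, the array shapes, and all names are illustrative assumptions, not taken from the article): each speaker carries a private set of prototype vectors, but those prototypes quantize frames into a shared label alphabet, so a single pooled table of output probabilities P(label | state) serves every speaker.

    # Illustrative only: speaker-DEPENDENT prototypes feeding
    # speaker-INDEPENDENT output distributions over a shared alphabet.
    import numpy as np

    N_LABELS = 200    # size of the shared, speaker-independent alphabet
    N_STATES = 1000   # output-producing HMM states (or arcs)
    DIM = 20          # acoustic feature dimension

    prototypes = np.random.randn(N_LABELS, DIM)                   # per SPEAKER
    output_probs = np.full((N_STATES, N_LABELS), 1.0 / N_LABELS)  # shared

    def label_frames(frames, prototypes):
        """Vector-quantize frames with one speaker's prototypes."""
        d = ((frames[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        return d.argmin(axis=1)       # indices into the shared alphabet

    def output_log_probs(frames, prototypes, output_probs):
        """Per-frame output log-probabilities for every state."""
        labels = label_frames(frames, prototypes)
        return np.log(output_probs[:, labels])   # shape (N_STATES, T)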

Assume the existence of some training data obtained from several different speakers, who will be referred to as the prep speakers. There should be at least 10 prep speakers, preferably more. Each prep speaker must provide more than N sentences, where N denotes the number of training sentences usually provided by a speaker who wishes to use the system. After subtracting N sentences from each prep speaker's contribution, there should remain a total of at least 1000 sentences, preferably more, spread equally among the speakers.
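
As a rough sketch of these requirements (the container layout, names, and the value of N below are invented for illustration), the prep data might be partitioned like this:

    # Hypothetical partition of the prep speakers' data; only the split
    # logic reflects the article, everything else is illustrative.
    N = 100   # sentences a new user of the system normally provides

    def split_prep_data(prep_sentences, n=N, min_speakers=10, min_pool=1000):
        """prep_sentences: dict mapping speaker id -> list of sentences.
        Returns (per-speaker prototype sets, per-speaker remainders)."""
        assert len(prep_sentences) >= min_speakers, "need >= 10 prep speakers"
        proto_sets, remainder = {}, {}
        for spk, sents in prep_sentences.items():
            assert len(sents) > n, "each prep speaker must provide more than N"
            proto_sets[spk] = sents[:n]   # used only in Step 1
            remainder[spk] = sents[n:]    # kept for Steps 2-3
        pooled = sum(len(s) for s in remainder.values())
        assert pooled >= min_pool, "need >= 1000 remaining sentences in total"
        return proto_sets, remainder

The procedure is as follows.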

1.  Using only N sentences from each speaker, compute speaker-dependent supervised prototypes for each prep speaker independently. The prototypes must be associated with a speaker-independent label alphabet. A suitable alphabet, a suitable supervisory process, and suitable algorithms for prototype construction can be found in [1-4]. (A rough sketch of this step and of Step 3 follows the list.)

2.  Discard from each prep speaker's data the N sentences used in Step 1.

3.  Label the remaining training data for each prep speaker using speaker's...
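
The abbreviated text breaks off in Step 3. As a rough sketch of Steps 1 and 3 (the article defers the actual alphabet, supervisory process, and prototype algorithms to [1-4]; the per-label-centroid prototypes and nearest-prototype labelling below are simple stand-ins, not the article's method):

    # Stand-in implementations of Steps 1 and 3; both functions assume
    # frames are a (T, dim) array and supervised_labels is an integer
    # array over a shared, speaker-independent alphabet of n_labels.
    import numpy as np

    def train_prototypes(frames, supervised_labels, n_labels):
        """Step 1 stand-in: per-speaker prototypes as per-label centroids."""
        protos = np.zeros((n_labels, frames.shape[1]))
        for lab in range(n_labels):
            mask = supervised_labels == lab
            if mask.any():
                protos[lab] = frames[mask].mean(axis=0)
        return protos

    def relabel(frames, protos):
        """Step 3 stand-in: label a speaker's remaining frames with that
        speaker's own prototypes (nearest prototype per frame)."""
        d = ((frames[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        return d.argmin(axis=1)

Pooling the labels produced in Step 3 across all prep speakers is presumably what feeds the speaker-independent output-probability estimates advocated above.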