Automatic Construction of Acoustic Turboform Models for Use In a Speech Recognition System

IP.com Disclosure Number: IPCOM000101169D
Original Publication Date: 1990-Jul-01
Included in the Prior Art Database: 2005-Mar-16

Publishing Venue

IBM

Related People

Bahl, L: AUTHOR [+3]

Abstract

The acoustic models used in speech recognition systems are often created automatically from one or more utterances of each sound or each word in the vocabulary [1,2]. Commonly, each model is constructed so as to maximize the probability of producing all the label sequences of the utterances from which it is derived [1,2]. This document describes an alternative algorithm for constructing acoustic Markov models, which is not based on the notion of maximum-likelihood. The resulting models are called turboforms.

This is the abbreviated version, containing approximately 52% of the total text.

Automatic Construction of Acoustic Turboform Models for Use In a Speech Recognition System

       The acoustic models used in speech recognition systems
are often created automatically from one or more utterances of each
sound or each word in the vocabulary [1,2].  Commonly, each model is
constructed so as to maximize the probability of producing all the
label sequences of the utterances from which it is derived [1,2].
This document describes an alternative algorithm for constructing
acoustic Markov models, which is not based on the notion of
maximum-likelihood.  The resulting models are called turboforms.
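
       As a rough gloss, in notation that is ours rather than the
disclosure's: if a word has n training utterances with label sequences
Y_1, ..., Y_n, the conventional criterion chooses (or trains) the model
M so that

    \prod_{i=1}^{n} \Pr(Y_i \mid M)

is as large as possible; the turboform construction described below is
not organised around this quantity.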

      We will assume the existence of some labelled training data,
and some trained Markov model statistics.  The following steps are
performed for each model to be constructed; a rough code sketch of
these steps is given after the step descriptions.
Step 1.   Construct a feneme-based Markov model [2] from the
available label sequences.
Step 2.   Viterbi align each label sequence against the current model
[3].
Step 3.   Perform Steps 4-8 for each fenemic phone P [2] in the
current model.
Step 4.   Using the Viterbi alignments, determine which labels
aligned with P could be made more probable if P were to be replaced
by a different fenemic phone.  These labels are called sad labels.
Step 5.   For each sad label of Step 4, count the number of
utterances in which it appears aligned against P.  Let L denote the
sad label which occurs in the greatest number of utterances.  If two
or more labels are tied for this honor, select the one whose log
output probability differs the most from the maximum value obtainable
by replacing P.
Step 6.   If L occurs in fewer than N utterances, make no change to
the model: skip Steps 7-8, continue at Step 3 with the next phone.
Otherwise, expand the model: perform Steps 7-8.  A reasonable value
of N is MAX(2, number of utterances/20).
Step 7.   Let Q denote the fenemic phone which maximizes t...
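
       To make the flow of Steps 1-6 concrete, the following Python
sketch walks through them under several simplifying assumptions of our
own: trained fenemic phones are reduced to bare output distributions
(out_prob, mapping each phone to a label-probability table), the
fenemic baseform of Step 1 is replaced by a one-phone-per-label
stand-in, and the Viterbi alignment of Step 2 by a plain monotone
dynamic-programming alignment.  None of the identifiers below come
from the disclosure, and Steps 7-8, which are cut off in this
abbreviated text, are only marked by a comment.

# Illustrative sketch only: the data structures and helper names below
# (label_seqs, phone_inventory, out_prob, viterbi_align, turboform_pass)
# are ours, not the disclosure's.  Fenemic phones are reduced to bare
# output distributions; the trained transition statistics of the real
# models are ignored.
import math
from collections import defaultdict


def viterbi_align(labels, phones, out_prob):
    """Simplified stand-in for Step 2: monotone alignment of a label
    sequence against the model's phone sequence, maximising the summed
    log output probability; each phone may absorb zero or more labels."""
    NEG = float("-inf")
    n, m = len(labels), len(phones)
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    emitted = [[False] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(1, m + 1):
            skip = best[i][j - 1]                 # phone j-1 emits no (more) labels
            emit = NEG
            if i > 0 and best[i - 1][j] > NEG:    # phone j-1 emits label i-1
                emit = best[i - 1][j] + math.log(
                    out_prob[phones[j - 1]].get(labels[i - 1], 1e-9))
            best[i][j] = max(skip, emit)
            emitted[i][j] = emit >= skip
    # Backtrace: recover the phone position aligned with each label.
    align, i, j = [None] * n, n, m
    while i > 0 or j > 0:
        if j > 0 and emitted[i][j]:
            align[i - 1] = j - 1
            i -= 1
        else:
            j -= 1
    return align


def turboform_pass(label_seqs, phone_inventory, out_prob):
    """One pass over Steps 1-6; Steps 7-8 (model expansion) are cut off
    in the available text and only marked by a comment."""
    # Step 1 (crude stand-in for the fenemic baseform of [2]): one phone
    # per label of the first utterance, each chosen as the phone giving
    # that label the highest output probability.
    model = [max(phone_inventory, key=lambda q: out_prob[q].get(lab, 0.0))
             for lab in label_seqs[0]]
    N = max(2, len(label_seqs) // 20)             # threshold from Step 6

    # Step 2: Viterbi-align every label sequence against the current model.
    alignments = [viterbi_align(seq, model, out_prob) for seq in label_seqs]

    # Step 3: consider each fenemic phone (by position) in the model.
    for pos, p in enumerate(model):
        # Step 4: "sad" labels aligned with this phone, i.e. labels that
        # some other phone in the inventory would make more probable.
        utts_with = defaultdict(set)    # sad label -> utterances containing it
        shortfall = defaultdict(float)  # sad label -> log-probability gap
        for u, (seq, ali) in enumerate(zip(label_seqs, alignments)):
            for lab, phone_pos in zip(seq, ali):
                if phone_pos != pos:
                    continue
                here = math.log(out_prob[p].get(lab, 1e-9))
                there = max(math.log(out_prob[q].get(lab, 1e-9))
                            for q in phone_inventory)
                if there > here:
                    utts_with[lab].add(u)
                    shortfall[lab] = max(shortfall[lab], there - here)
        if not utts_with:
            continue
        # Step 5: the sad label L occurring in the most utterances; ties
        # broken by the largest log output-probability shortfall.
        L = max(utts_with,
                key=lambda lab: (len(utts_with[lab]), shortfall[lab]))
        # Step 6: expand the model only if L occurs in at least N utterances.
        if len(utts_with[L]) < N:
            continue
        # Steps 7-8 (choosing the replacement phone Q and expanding the
        # model at this position) are truncated in the excerpt.
    return model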