Context-Dependent Length-Constrained Phonetic Models for the Acoustic Detailed Match

IP.com Disclosure Number: IPCOM000115811D
Original Publication Date: 1995-Jun-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 2 page(s) / 89K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+2]

Abstract

A new algorithm is disclosed for improving the accuracy of speech recognition systems. It does so by incorporating length information into the acoustic models.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      In the IBM* Speech Recognition System, recognition proceeds in
two stages.  The first stage performs a fast acoustic match using
simple models to provide a shortlist of candidate words at a given
time [1].  To discriminate between these words and pick the best
one, the second stage performs a detailed acoustic match using more
sophisticated models for the phones.  Typically, the model for a
word is made up by concatenating the models of its constituent
phones.  To account for coarticulation and similar effects, the
model for each phone is usually made to depend on the phonetic
context in which it occurs (both within the word and across word
boundaries) [2,3].  These special instances of a phone in a given
context are called allophones.  Typically, however, the models
corresponding to different allophones of a phone differ only in
their output distributions, and not in their topology or minimum
lengths.  This is a potential drawback because the length of the
acoustic sequence corresponding to a phone also varies as a function
of the context in which it occurs.  In this disclosure, a method is
described for constructing allophonic models in which the machine
topology, minimum lengths and output distributions are all made to
depend on the context in which the phone occurs.  The use of these
new context-dependent models results in a reduction in the overall
error rate.
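
      As a rough illustration of what it means for the topology,
minimum length and output distribution to all depend on context, the
following Python sketch records these three properties per allophone.
The class name, field names, phone symbols and numbers are all
hypothetical and are not taken from the disclosure; the number of
states is used only as a stand-in for the machine topology.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AllophoneModel:
    """Hypothetical record for one allophone (one phone in one context)."""
    phone: str
    context: Tuple[str, str]  # assumed encoding: (left phone, right phone)
    num_states: int           # stand-in for the machine topology
    min_length: int           # fewest acoustic labels the model must consume
    output_dist_id: int       # index of the output distributions it uses


def word_min_length(models: Sequence[AllophoneModel]) -> int:
    """Minimum length of a word model built by concatenating phone models."""
    return sum(m.min_length for m in models)


# Toy word model: every value below is invented for illustration.
cat = [
    AllophoneModel("K", ("SIL", "AE"), num_states=3, min_length=2, output_dist_id=0),
    AllophoneModel("AE", ("K", "T"), num_states=5, min_length=4, output_dist_id=1),
    AllophoneModel("T", ("AE", "SIL"), num_states=3, min_length=1, output_dist_id=2),
]
print(word_min_length(cat))
```

      Under the assumption that each model must consume at least its
minimum length, a concatenated word model cannot match an acoustic
sequence shorter than the sum of its allophones' minimum lengths.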

The algorithm incorporates the following features:
  1.  The starting point is the phonological rule tree described in
       [2].  Briefly, a tree is grown for each phone that separates
       instances of the phone according to the context in which
       they occur.  Hence the leaves of the tree represent the
       allophones of that particular phone.
  2.  Next, the training data is poured down this tree, and the label
       sequences corresponding to each allophone are collected.
       Subsequently, the distribution of the lengths of the sequences
       is obtained at each allophone.  In order to model this length
       distribution, the minimum length of the model corr...
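
      The first two features can be sketched in Python as follows.
This is only a minimal illustration under stated assumptions: the
tree representation, the context encoding as a (left, right) pair,
the leaf names and the sample data are all hypothetical.  The sketch
stops at the per-allophone length histograms, because the rule for
deriving a model's minimum length from them is not given in the text
above.

```python
from collections import Counter, defaultdict

# Assumed tree encoding: a leaf is an allophone name (str); an internal
# node is a tuple (question, yes_subtree, no_subtree), where question is
# a function of the phonetic context that returns True or False.


def find_leaf(tree, context):
    """Follow the context questions down to a leaf (an allophone name)."""
    while not isinstance(tree, str):
        question, yes_subtree, no_subtree = tree
        tree = yes_subtree if question(context) else no_subtree
    return tree


def length_histograms(tree, samples):
    """Pour (context, label_sequence) samples down the tree and count
    how often each sequence length occurs at each allophone."""
    hists = defaultdict(Counter)
    for context, labels in samples:
        hists[find_leaf(tree, context)][len(labels)] += 1
    return hists


# Toy tree for one phone: a single question about the right neighbour.
toy_tree = (lambda ctx: ctx[1] == "SIL", "AA_1", "AA_2")

# Invented training samples: (context, label sequence).
toy_samples = [
    (("K", "SIL"), [3, 7, 7, 9]),
    (("K", "T"), [3, 7]),
    (("B", "SIL"), [1, 2, 3, 4]),
    (("B", "D"), [5, 5, 6]),
]

hists = length_histograms(toy_tree, toy_samples)
for allophone in sorted(hists):
    print(allophone, dict(hists[allophone]))
```

      In this toy data the allophone before silence collects only
longer label sequences than the other one, which is the kind of
context-dependent length difference the disclosure sets out to model.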