
New Varieties of Fenemic Markov Models for Continuous Speech Recognition

IP.com Disclosure Number: IPCOM000121283D
Original Publication Date: 1991-Aug-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 2 page(s) / 81K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+5]

Abstract

In continuous speech the co-articulation effect changes the pronunciation of words considerably. An effective way of capturing these phenomena automatically, based on constructing decision trees, is described in [*]. In this method the label sequences from several utterances of each phone in different contexts are extracted. A decision tree is built for each phone by interrogating the context in which the phone occurs. The goodness of each split is measured by fitting a model to the strings at each node and measuring how much improvement in the fit is obtained by the splits. The label strings that end up at one particular leaf in the tree are used to make a fenemic baseform for the phone in the context given by the answers to the questions leading to that leaf from the root of the tree.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      In continuous speech the co-articulation effect changes
the pronunciation of words considerably.  An effective way of
capturing these phenomena automatically, based on constructing
decision trees, is described in [*].  In this method the label
sequences from several utterances of each phone in different contexts
are extracted.  A decision tree is built for each phone by
interrogating the context in which the phone occurs.  The goodness of
each split is measured by fitting a model to the strings at each node
and measuring how much improvement in the fit is obtained by the
splits.  The label strings that end up at one particular leaf in the
tree are used to make a fenemic baseform for the phone in the context
given by the answers to the questions leading to that leaf from the
root of the tree.  These baseforms are made by using a variety of 200 or
so fenemic models and determining the fenemic machine sequence that
maximizes the joint likelihood of the strings at the leaf.
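
      The split criterion described above can be illustrated with a
short sketch.  The text does not say which model is fitted at each
node, so the sketch assumes a simple unigram model over labels as a
stand-in.  The function names and the representation of a context as a
(left phone, right phone) tuple are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(strings):
    """Log-likelihood of the label strings under a unigram model
    whose probabilities are estimated from those same strings.
    (Assumed stand-in; the source does not specify the node model.)"""
    counts = Counter(label for s in strings for label in s)
    total = sum(counts.values())
    return sum(c * math.log(c / total) for c in counts.values())


def split_gain(strings, contexts, question):
    """Improvement in fit obtained by splitting a node with a yes/no
    question about the context of each string."""
    yes = [s for s, c in zip(strings, contexts) if question(c)]
    no = [s for s, c in zip(strings, contexts) if not question(c)]
    if not yes or not no:
        return 0.0  # the question does not separate the data
    return log_likelihood(yes) + log_likelihood(no) - log_likelihood(strings)
```

      Under these assumptions, a tree builder would evaluate
split_gain for every candidate question at a node and keep the
question with the largest gain.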

      It is likely that 200 or so fenemic machines are not enough to
accurately model all the sounds occurring in continuous speech.  The
invention described here provides a method for increasing the number
of fenemic phone varieties so as to make the models more accurate.

      The method used for determining the necessary variety of
fenemic models is the following:
1.   Construct the decision trees and the fenemic baseforms using the
method described in [*] and the original fenemic models.  Let the
original fenemic models be numbered f_1, f_2, ..., f_F and the underlying
phones be denoted by P_1, P_2, ..., P_n.
2.   Align the training data from several speakers (say, 10) against
the baseforms constructed in step 1 using the Viterbi algorithm.
This gives the fenemic model and the leaf of the tree against which
each label in the training data is aligned.
3....
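
      The alignment in step 2 can be sketched as a small dynamic
program.  This is a simplified illustration, not the procedure of the
disclosure: it ignores transition probabilities, lets each model in
the baseform absorb zero or more labels, and takes per-model output
log-probabilities from a hypothetical table emit_logprob.  All names
are assumptions.

```python
import math


def viterbi_align(labels, baseform, emit_logprob):
    """For each label, return the index in `baseform` of the fenemic
    model it is aligned to on the best left-to-right path.

    Assumptions (not from the source): transitions are unweighted and
    emit_logprob[model][label] holds output log-probabilities.
    """
    T, J = len(labels), len(baseform)
    if T == 0 or J == 0:
        return []
    score = [[-math.inf] * J for _ in range(T)]
    back = [[0] * J for _ in range(T)]
    for t, label in enumerate(labels):
        # Best predecessor among positions <= j at time t - 1.
        best_prev, best_prev_j = (0.0, 0) if t == 0 else (-math.inf, 0)
        for j, model in enumerate(baseform):
            if t > 0 and score[t - 1][j] > best_prev:
                best_prev, best_prev_j = score[t - 1][j], j
            score[t][j] = best_prev + emit_logprob[model][label]
            back[t][j] = best_prev_j if t > 0 else j
    # Trace back from the best final position.
    j = max(range(J), key=lambda k: score[T - 1][k])
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path
```

      Mapping the returned indices through the baseform, for example
[baseform[j] for j in path], gives the fenemic model aligned to each
label.  The leaf follows from which baseform was used for the
alignment.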