Automatic Construction of Fenemic Markov Word Models for Speech Recognition

IP.com Disclosure Number: IPCOM000102453D
Original Publication Date: 1990-Nov-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 5 page(s) / 181K

Publishing Venue

IBM

Related People

Ferretti, M: AUTHOR [+2]

Abstract

This article describes a technique to automatically build fenemic Markov word models starting from the phonetic transcription of the word. The technique consists of a new method for predicting, for a phonetic context never seen in the training data, its corresponding similarity class.

This is the abbreviated version, containing approximately 35% of the total text.

Automatic Construction of Fenemic Markov Word Models for Speech Recognition

       This article describes a technique to automatically build
fenemic Markov word models starting from the phonetic transcription
of the word.  The technique consists of a new method for predicting,
for a phonetic context never seen in the training data, its
corresponding similarity class.

      Background

      Systems that use Markov word models for automatic speech
recognition usually employ two kinds of models: phonetic and fenemic.
Phonetic word models are built on the basis of phonetic and
linguistic knowledge.  Fenemic word models, called fenemic baseforms,
attempt to model the pronunciation of the word by taking into account
the way the word is actually uttered by speakers.  Fenemic baseforms
are usually built by analyzing multiple utterances of the word
(1,2,3).
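
      As a rough illustration of this distinction, and not part of
the original article, the following Python sketch represents a word
model as a left-to-right chain of elementary Markov models whose
units are either phones (phonetic baseform) or fenemes (fenemic
baseform); all labels and probabilities here are invented.

from dataclasses import dataclass
from typing import List

@dataclass
class UnitModel:
    """Elementary left-to-right Markov model for one phone or feneme."""
    label: str             # e.g. the phone "AE" or a feneme label "F17"
    self_loop_prob: float  # probability of remaining in the unit
    exit_prob: float       # probability of advancing to the next unit

@dataclass
class WordModel:
    """A word model: a concatenation of unit models."""
    word: str
    units: List[UnitModel]

# A phonetic baseform is written from phonetic/linguistic knowledge ...
phonetic_cat = WordModel("cat", [UnitModel("K", 0.4, 0.6),
                                 UnitModel("AE", 0.5, 0.5),
                                 UnitModel("T", 0.3, 0.7)])

# ... while a fenemic baseform strings together many short feneme
# models derived from how speakers actually utter the word.
fenemic_cat = WordModel("cat", [UnitModel(f, 0.3, 0.7)
                                for f in ["F02", "F02", "F41", "F41", "F17"]])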

      This procedure has a substantial drawback.  If the speech
recognition system must be tailored to a new application and the
vocabulary must be changed, multiple utterances for all the new words
must be supplied. Collecting these data may be very costly and, in
some cases, impossible.  To overcome this problem several methods
have been proposed to automatically build the fenemic baseforms
starting from the phonetic transcription of the word (4). The basic
idea in these methods is to build the fenemic baseforms by
concatenating fenemic Markov models of the phones that constitute the
phonetic transcription of the word.  For each phone in the phonetic
alphabet, a set of different fenemic Markov models is used.  Each
model represents the pronunciation of the phone in a class of
acoustically similar phonetic contexts.  The fenemic model associated
with a phone in a certain context is usually called a leafemic
baseform.  This name reflects the fact that decision trees are used
to predict, given the phonetic context of that phone, the
corresponding class and the associated fenemic model.  Building a
decision tree able to predict the corresponding class for a given
phonetic context is very time-consuming; it can require several days
of processing time on a powerful computer.
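
      The following is a minimal sketch, not taken from the article,
of the concatenation scheme just described: a hypothetical lookup
table stands in for the decision tree that maps a phonetic context to
a similarity class, and a second table holds the leafemic baseform (a
feneme sequence) for each (phone, class) pair.

from typing import Dict, List, Tuple

# (left phone, phone, right phone) -> similarity class; this table
# stands in for the decision tree of the referenced methods.
ContextClasses = Dict[Tuple[str, str, str], int]

# (phone, similarity class) -> leafemic baseform, here a feneme sequence.
LeafemicTable = Dict[Tuple[str, int], List[str]]

def context_class(left: str, phone: str, right: str,
                  classes: ContextClasses) -> int:
    """Predict the similarity class of `phone` in context (left, right);
    contexts never seen in training fall back to a default class 0."""
    return classes.get((left, phone, right), 0)

def build_fenemic_baseform(transcription: List[str],
                           classes: ContextClasses,
                           leafemic: LeafemicTable) -> List[str]:
    """Concatenate the leafemic baseforms of the phones in the word's
    phonetic transcription, each selected by its context class."""
    padded = ["<sil>"] + transcription + ["<sil>"]
    baseform: List[str] = []
    for i in range(1, len(padded) - 1):
        left, phone, right = padded[i - 1], padded[i], padded[i + 1]
        cls = context_class(left, phone, right, classes)
        baseform += leafemic.get((phone, cls), leafemic[(phone, 0)])
    return baseform

# Toy usage (invented tables): build a baseform for "cat".
classes = {("<sil>", "K", "AE"): 1}
leafemic = {("K", 0): ["F02"], ("K", 1): ["F02", "F03"],
            ("AE", 0): ["F41", "F41"], ("T", 0): ["F17"]}
print(build_fenemic_baseform(["K", "AE", "T"], classes, leafemic))
# -> ['F02', 'F03', 'F41', 'F41', 'F17']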

      This article describes a new method for determining, given the
phonetic context of a phone, which fenemic Markov model best
represents the pronunciation of that phone in that context.  This
method requires only a few minutes of processing time to build the
data necessary to predict the phone similarity class.

      Algorithm

      It is assumed that training data are available, consisting of
more than 6000 words, each uttered by ten different speakers.  It is
also assumed that the training data have been processed and
transformed into a series of acoustic vectors.  Finally, it is
assumed that phonetic Markov models exist for each word in the
training data and for all the words in the new vocabulary.  The
training data must be collected to provi...