Construction of Non-Linear Feneme-Based Markov Word Baseforms From Multiple Word Utterances

IP.com Disclosure Number: IPCOM000036445D
Original Publication Date: 1989-Sep-01
Included in the Prior Art Database: 2005-Jan-29

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

In a speech recognition system wherein words, when uttered, result in a string of generated labels or symbols (hereafter referred to as "fenemes"), each generated feneme being selected from an alphabet of fenemes, and wherein each word in a vocabulary is represented by a baseform constructed of Markov models, the present invention relates to methodology for converting a linear fenemic baseform in which each feneme in a string is replaced by a corresponding Markov phone model into a non-linear baseform in which segments of a word baseform may be characterized by Markov phone models in parallel.

In practicing the invention, it is assumed that a subject word is uttered a number of times, resulting in a number of corresponding feneme strings. It is also assumed that a linear fenemic baseform is created for each word in a vocabulary. In a linear fenemic baseform, a word is represented by a sequence of Markov phone models (or probabilistic finite state machines) each of which represents a given feneme that can be generated in response to uttered speech.
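As a concrete illustration, a linear fenemic baseform can be pictured as a simple chain of phone models, one per feneme in an utterance's label string. The following minimal Python sketch shows this; the names Phone and linear_baseform and the feneme labels are illustrative assumptions, not taken from the disclosure.

from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a fenemic Markov phone model, identified here
# only by the feneme from the alphabet that it models.
@dataclass(frozen=True)
class Phone:
    feneme: str

def linear_baseform(feneme_string: List[str]) -> List[Phone]:
    """Build a linear fenemic baseform: each feneme in the string is
    replaced by its corresponding Markov phone model, in sequence."""
    return [Phone(f) for f in feneme_string]

# Example: one utterance of a word yields a feneme string; the linear
# baseform is simply the chain of phone models for that string.
utterance = ["F12", "F07", "F07", "F31"]   # hypothetical feneme labels
print(linear_baseform(utterance))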

A phone model of a fenemic baseform is characterized by a plurality of states and transitions between states. Each transition has a probability of being taken. In addition, there are label output probabilities, which indicate the likelihood of a given feneme phone model producing a given label at a given transition; these probabilities are determined from utterances made during a training session.
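A minimal sketch of such a phone model is shown below, assuming a simple two-state topology; the field names, the topology, and the example probabilities are illustrative assumptions rather than details given in the disclosure.

from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class FenemicPhoneModel:
    # The feneme from the alphabet that this phone model represents.
    feneme: str
    # A small set of states; here an assumed two-state topology.
    states: Tuple[int, ...] = (0, 1)
    # Each transition (from_state, to_state) has a probability of being taken.
    transition_prob: Dict[Tuple[int, int], float] = field(default_factory=dict)
    # Label output probabilities: (transition, output feneme) -> probability
    # of producing that label, estimated from training utterances.
    output_prob: Dict[Tuple[Tuple[int, int], str], float] = field(default_factory=dict)

model = FenemicPhoneModel(
    feneme="F12",
    transition_prob={(0, 0): 0.3, (0, 1): 0.7},
    output_prob={((0, 1), "F12"): 0.8, ((0, 1), "F07"): 0.2},
)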

A technique for creating a baseform of fenemic phone models based on multiple utterances of words briefly includes the following steps:

(a) transforming multiple utterances of the word segment into respective strings of fenemes;
(b) defining a set of fenemic Markov phone models;
(c) determining the best single phone model P1 for producing the multiple feneme strings;
(d) determining the best two-phone baseform of the form P1P2 or P2P1 for producing the multiple feneme strings;
(e) aligning the best two-phone baseform against each feneme string;
(f) splitting each feneme string into a left portion and a right portion, with the left portion corresponding to the first phone model of the two-phone baseform and the right portion corresponding to the second phone model of the two-phone baseform;
(g) identifying each left portion as a left substring and each right portion as a right substring;
(h) processing the set of left substrings and the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting further splitting of a substring when the single-phone baseform thereof has a higher probability of producing the substring than does the best two-phone baseform;...
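The divide-and-conquer recursion in steps (c) through (h) can be sketched in Python as follows. The score and align_split callables stand in for the Markov-model probability computation and the alignment of a two-phone baseform against a feneme string that the steps rely on; these names, and the overall function, are illustrative assumptions rather than the disclosure's own implementation.

from typing import Callable, List, Sequence, Tuple

FenemeString = List[str]

def build_baseform(
    strings: List[FenemeString],
    phones: Sequence[str],
    score: Callable[[List[str], List[FenemeString]], float],
    align_split: Callable[[List[str], FenemeString], Tuple[FenemeString, FenemeString]],
) -> List[str]:
    """Sketch of steps (c)-(h): refine a baseform for a set of feneme strings
    by repeatedly splitting them against the best two-phone baseform."""
    # (c) best single phone model for producing all the strings
    p1 = max(phones, key=lambda p: score([p], strings))
    # (d) best two-phone baseform of the form P1P2 or P2P1
    candidates = [[p1, p2] for p2 in phones] + [[p2, p1] for p2 in phones]
    best_pair = max(candidates, key=lambda bf: score(bf, strings))
    # (h) inhibit further splitting when the single-phone baseform explains
    # the strings at least as well as the best two-phone baseform
    if score([p1], strings) >= score(best_pair, strings):
        return [p1]
    # (e)-(g) align the two-phone baseform against each string and split it
    # into a left substring and a right substring
    lefts, rights = [], []
    for s in strings:
        left, right = align_split(best_pair, s)
        lefts.append(left)
        rights.append(right)
    # (h) process the left and right substring sets in the same manner
    return build_baseform(lefts, phones, score, align_split) + \
           build_baseform(rights, phones, score, align_split)

Each recursive call returns the phone sequence for its segment, so concatenating the left and right results yields the refined baseform for the whole set of utterances; handling of degenerate splits (for example, an empty left or right portion) is omitted in this sketch.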