Growing Phonetic Baseforms From Multiple Utterances in Speech Recognition
Original Publication Date: 1987-Sep-01
Included in the Prior Art Database: 2005-Feb-01
The most likely sequence of hidden Markov model phones constituting a vocabulary word is determined by (a) generating a string Si (where 1 ≤ i ≤ n) of labels (speech prototype vectors) for each of n utterances of the word; (b) determining the probability of each string Si given a prescribed sequence Pj of phones; (c) computing an aggregate probability Pragg over the n strings; (d) multiplying Pragg by the prior probability of Pj to provide a joint probability; (e) repeating steps (a) through (d) for each of a plurality of phone sequences Pj; and (f) by iterative stack decoding, determining which phone sequence has the best joint probability (for the Si strings) above a prescribed threshold. The stack decoding involves determining a first probability measure based on acoustics and a second probability measure based on a language model.
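
The scoring in steps (b) through (f) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text above: Pragg is taken to be the product of the per-utterance probabilities Pr(Si | Pj), which treats the utterances as independent; all probabilities are handled as logarithms to avoid underflow; and the stack decoding search is replaced by plain enumeration of a small, fixed candidate set. The function names, phone symbols, and numbers are hypothetical.

```python
import math

def joint_log_score(string_log_likelihoods, log_prior):
    """Return log[Pr(Pj) * product over i of Pr(Si | Pj)].

    Assumption: the aggregate probability is the product of the
    per-utterance likelihoods, so in the log domain it is a sum.
    """
    return sum(string_log_likelihoods) + log_prior

def best_baseform(candidates, log_threshold):
    """Pick the candidate phone sequence with the best joint log score.

    candidates maps a phone-sequence tuple to a pair
    (list of log Pr(Si | Pj), log Pr(Pj)).
    Returns (phones, score), or None if no candidate exceeds the threshold.
    Exhaustive enumeration stands in for the stack decoding search.
    """
    best = None
    for phones, (log_liks, log_prior) in candidates.items():
        score = joint_log_score(log_liks, log_prior)
        if score > log_threshold and (best is None or score > best[1]):
            best = (phones, score)
    return best

# Hypothetical likelihoods for n = 3 utterances of one word.
candidates = {
    ("K", "AE", "T"): ([math.log(0.20), math.log(0.10), math.log(0.25)], math.log(0.5)),
    ("K", "AH", "T"): ([math.log(0.05), math.log(0.30), math.log(0.02)], math.log(0.3)),
    ("G", "AE", "T"): ([math.log(0.01), math.log(0.02), math.log(0.01)], math.log(0.2)),
}

result = best_baseform(candidates, log_threshold=math.log(1e-4))
print(result)
```

In a real recognizer the per-utterance likelihoods would come from evaluating each label string against the concatenated phone models, and the candidate sequences would be extended incrementally by the stack decoder rather than listed in advance.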