Method for probabilistic Viterbi decoding using stochastic pronunciation modeling

IP.com Disclosure Number: IPCOM000032565D
Publication Date: 2004-Nov-08
Document File: 3 page(s) / 18K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for probabilistic Viterbi decoding using stochastic pronunciation modeling. Benefits include improved functionality and improved performance.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Background

      Conventional speaker-independent speech recognition systems require two kinds of modeling: acoustic modeling and pronunciation modeling. Acoustic modeling is the process of predicting the probability of an acoustic observation given a phonetic unit.

      Pronunciation modeling is the process of predicting the pronunciation of a word given its textual representation. This process is usually called letter-to-phoneme (L2P) or grapheme-to-phoneme (G2P) conversion. There are three approaches to this problem: obtaining large hand-written pronunciation lexicons, using pronunciation rules, and training machine-learning-based predictors.

      The disadvantage of hand-written lexicons is that they take a lot of memory and are, therefore, not suitable for mobile phones. In addition, not every word can be prepared in advance. The disadvantage of the other two approaches is accuracy: they are substantially less accurate than hand-written lexicons.

      The approach taken by most mobile phone vendors is to combine pronunciation rules with trained machine-learning predictors, augmented with a small lexicon whose size is a function of the available memory.

      Using a machine-learning predictor to find the pronunciation of a given name yields substantially lower accuracy than using hand-written pronunciations. In one experiment, accuracy with hidden Markov models (HMMs) deteriorated from 76% with hand-written pronunciations to 41% with G2P.

      To cope with the errors of the G2P predictor, the strategy for name dialing is to build a pronunciation network that allows not only the single most probable pronunciation but the N most probable pronunciations (N ~ 4). This approach performs better, reaching 68% accuracy on the above evaluation, which is still less than the 76% obtained with hand-written pronunciations. A careful analysis reveals two problems. First, the right pronunciation sometimes does not appear in the network. Second, even if the right pronunciation does appear, the flexibility of the network causes errors. Empirical evidence indicates that the second problem is more severe.
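      The N-best strategy described above can be illustrated with a minimal sketch. It assumes a hypothetical G2P predictor that returns, for each letter, candidate phonemes with probabilities; the `candidates` data, the phoneme labels, and the function name `n_best_pronunciations` are invented for illustration and do not come from the disclosure.

```python
import heapq
import itertools
import math

# Hypothetical G2P output for the name "ana": per-letter candidate phonemes
# with probabilities. All values are invented for illustration.
candidates = [
    [("AE", 0.6), ("AA", 0.4)],  # letter "a"
    [("N", 1.0)],                # letter "n"
    [("AH", 0.7), ("AA", 0.3)],  # letter "a"
]

def n_best_pronunciations(candidates, n=4):
    """Return the n most probable phoneme sequences as (probability, phones)."""
    scored = []
    for combo in itertools.product(*candidates):
        phones = tuple(phone for phone, _ in combo)
        prob = math.prod(p for _, p in combo)
        scored.append((prob, phones))
    # Highest probability first.
    return heapq.nlargest(n, scored)

# A conventional network keeps only the phoneme sequences and drops the
# probabilities, so every listed pronunciation is treated as equally likely.
network = {phones for _, phones in n_best_pronunciations(candidates, n=4)}
```

      Exhaustive enumeration is used here only because the example is tiny; a real predictor would need a more economical search.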

 


General description

      The disclosed method is probabilistic Viterbi decoding using stochastic pronunciation modeling. The method provides improved accuracy by incorporating information about the probability of the different pronunciations along the network.
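      The general idea of weighting pronunciations by their probability during decoding can be sketched as follows. This is not the disclosed algorithm, whose details are not available in this abbreviated text; it is a minimal illustration that assumes each pronunciation variant is a simple left-to-right phoneme sequence and that the pronunciation probability is added to the acoustic score in the log domain. The names `viterbi_align`, `decode`, and `prior_weight`, and all numbers, are hypothetical.

```python
import math

NEG_INF = float("-inf")

def viterbi_align(frames, phones):
    """Best log-likelihood of aligning frames to a left-to-right phone sequence.

    frames: list of dicts mapping phone -> log P(observation | phone).
    phones: non-empty sequence; each phone must cover at least one frame.
    """
    if len(frames) < len(phones):
        return NEG_INF
    # best[j] = best score so far with the current frame assigned to phones[j]
    best = [NEG_INF] * len(phones)
    best[0] = frames[0].get(phones[0], NEG_INF)
    for frame in frames[1:]:
        new = [NEG_INF] * len(phones)
        for j, phone in enumerate(phones):
            stay = best[j]
            advance = best[j - 1] if j > 0 else NEG_INF
            new[j] = max(stay, advance) + frame.get(phone, NEG_INF)
        best = new
    return best[-1]

def decode(frames, variants, prior_weight=1.0):
    """Pick the variant with the best acoustic score plus weighted log prior.

    variants: list of (phones, probability) with probability > 0.
    prior_weight=0.0 reproduces a network that ignores pronunciation probability.
    """
    scored = [
        (viterbi_align(frames, phones) + prior_weight * math.log(prob), phones)
        for phones, prob in variants
    ]
    return max(scored)

# Invented acoustic log-likelihoods for three frames.
frames = [
    {"AE": -1.0, "AA": -0.8, "N": -5.0},
    {"AE": -1.2, "AA": -1.0, "N": -3.0},
    {"AE": -4.0, "AA": -4.0, "N": -0.5},
]
# Invented pronunciation variants with their probabilities.
variants = [(("AE", "N"), 0.9), (("AA", "N"), 0.1)]

unweighted_choice = decode(frames, variants, prior_weight=0.0)[1]
weighted_choice = decode(frames, variants)[1]
```

      With these invented numbers, the unweighted decoder selects the improbable variant ("AA", "N") because it scores slightly better acoustically, while the weighted decoder selects ("AE", "N"). This mirrors the second problem noted in the Background, where the flexibility of the network causes errors.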

Advantages

The disclosed method provides advantages, including:

• Improved functionality due to enabling probabilistic Viterbi decoding using stochastic pronunciation modeling

• Improved performance due to more accurate name dialing and speech recognition in all languages

• Improved performance due to improving the noise levels

Detailed description

      The pronunciation...