Procedure for Using Contextual Information to Obtain Improved Estimates of Word Probabilities in a Speech Recognition System

IP.com Disclosure Number: IPCOM000038491D
Original Publication Date: 1987-Jan-01
Included in the Prior Art Database: 2005-Jan-31

Publishing Venue

IBM

Related People

Authors:
Bahl, LR; Brown, PF; de Souza, PV; Jelinek, F; Mercer, RL

Abstract

The present invention discloses a method for incorporating additional context into m-gram language models used in speech recognition while avoiding the exponential growth in the number of m-grams that results from increasing m. In a tri-gram language model, function words -- such as "the", "a", "of" -- are well predicted from the previous two words. Content words -- such as "computer", "wealthy", "hire" -- often require more context to be predicted accurately. Because the number of m-grams increases exponentially with m, it is not practical to simply increase m to obtain more context. To achieve the increased context without an attendant combinatorial explosion, the present invention produces an estimator for predicting the last word in an m-gram (or, as described herein, a tri-gram).
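
The growth that makes a larger m impractical can be illustrated with a short calculation. The sketch below is not part of the disclosure: the function name possible_mgrams and the vocabulary size of 20,000 words are assumptions chosen only for illustration. It counts the distinct m-word sequences that a model over a vocabulary of V words could in principle have to cover, which is V raised to the power m.

```python
def possible_mgrams(vocab_size: int, m: int) -> int:
    """Return the number of distinct m-word sequences over a vocabulary.

    Each of the m positions can hold any of the vocab_size words,
    so the count is vocab_size ** m.
    """
    return vocab_size ** m


# Hypothetical vocabulary size, chosen for illustration only
# (the disclosure does not state one).
VOCAB_SIZE = 20_000

if __name__ == "__main__":
    for m in range(1, 6):
        count = possible_mgrams(VOCAB_SIZE, m)
        print(f"m={m}: {count:.3e} possible m-grams")
```

Under this assumed vocabulary, each increment of m multiplies the number of possible m-grams by 20,000, so moving from a tri-gram to a 4-gram model raises the count from 8 x 10^12 to 1.6 x 10^17. Only a small fraction of these sequences would ever be observed in training text, which is why simply increasing m does not yield reliable estimates.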