A method of using multi-word phrases to enhance allophonic contextual information in speech recognition
Original Publication Date: 2004-Feb-18
Included in the Prior Art Database: 2004-Feb-18
Most speech recognition engines do not model the "right-context" beyond the current word boundary. This trades accuracy for speed: such modeling ignores the coarticulation effects by which the next word can alter the pronunciation of the current one. Short, common words that frequently occur together are likely to be spoken as a single unit, with strong coarticulation effects. Where it is both needed and practical (for example, when decoding digit strings), we change the vocabulary units to include "phrases", that is, multi-word sequences treated as single tokens. The coarticulations then occur "in-token", which allows them to be modeled with the appropriate context-dependent allophonic models.
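
The effect of moving a word boundary "in-token" can be illustrated with a minimal sketch. It is not the disclosed implementation: the toy lexicon, the ARPAbet-like phone labels, the triphone-style "left-center+right" labels, the "#" symbol standing for an unmodeled context at a token edge, and all function names are assumptions made for illustration only.

```python
# Hypothetical toy lexicon: word -> phone sequence (labels chosen for illustration).
LEXICON = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
}


def make_phrase_token(words, lexicon):
    """Join several words into one vocabulary unit and concatenate their phones."""
    name = "_".join(words)
    phones = [phone for word in words for phone in lexicon[word]]
    return name, phones


def add_phrases(lexicon, phrases):
    """Return a copy of the lexicon extended with the given multi-word phrases."""
    extended = dict(lexicon)
    for words in phrases:
        name, phones = make_phrase_token(words, lexicon)
        extended[name] = phones
    return extended


def in_token_contexts(phones, boundary="#"):
    """Label each phone with its left and right neighbors inside the token.

    `boundary` marks a token edge, where (by assumption) the neighboring
    word is unknown and no cross-word context is modeled.
    """
    padded = [boundary] + phones + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    extended = add_phrases(LEXICON, [("one", "two")])
    print(in_token_contexts(extended["one"]))
    print(in_token_contexts(extended["one_two"]))
```

With the single-word token "one", the final phone N has only the boundary symbol on its right. In the phrase token "one_two", the same N is labeled with T as its right context, so a context-dependent model can be selected for it. The cost is a larger vocabulary, which is why the text restricts phrases to cases such as digit strings where they are both needed and practical.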