
Creation of Accurate Markov Models for Function Words in Continuous Speech Recognition

IP.com Disclosure Number: IPCOM000108375D
Original Publication Date: 1992-May-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 2 page(s) / 88K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR (and 5 others)

Abstract

In one prominent approach to continuous speech recognition (1,2), words are modelled phone by phone in a context-dependent manner. For each phone there is a set of context-dependent Markov models, and a set of phonological rules which operate on the phone context to determine the appropriate model. A Markov model for a word is obtained by concatenating together the individual models for the component phones. In this invention, we take advantage of the high frequency of function words, and treat them as special cases.

Creation of Accurate Markov Models for Function Words in Continuous Speech Recognition

In one prominent approach to continuous speech recognition (1,2), words are modelled phone by phone in a context-dependent manner. For each phone there is a set of context-dependent Markov models, and a set of phonological rules which operate on the phone context to determine the appropriate model. A Markov model for a word is obtained by concatenating together the individual models for the component phones. In this invention, we take advantage of the high frequency of function words, and treat them as special cases. We introduce the idea of function lexemes, where each lexeme represents a particular idealized pronunciation of a function word; we define function lexemes according to their frequencies rather than their linguistic role; and we model function lexemes, not phone by phone, but by a single unspliced context-dependent Markov model.
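To make the contrast concrete, here is a minimal Python sketch. The data structures, the three-state phone models, the eight-state whole-lexeme model, and all names are illustrative assumptions for this sketch, not a description of the original system.

from dataclasses import dataclass, field
from typing import List


@dataclass
class MarkovModel:
    """Bare-bones stand-in for a Markov model: a name and state labels only;
    transition and output probabilities are omitted in this sketch."""
    name: str
    states: List[str] = field(default_factory=list)


def select_phone_model(phone: str, left: str, right: str) -> MarkovModel:
    """Stand-in for the phonological rules that choose a context-dependent
    model for `phone` given its left and right phone context."""
    name = f"{phone}({left},{right})"
    return MarkovModel(name, [f"{name}.s{i}" for i in range(3)])


def spliced_word_model(phones: List[str]) -> MarkovModel:
    """Ordinary word: concatenate the context-dependent phone models."""
    padded = ["#"] + phones + ["#"]          # '#' marks the word boundary
    states: List[str] = []
    for i, phone in enumerate(phones, start=1):
        states += select_phone_model(phone, padded[i - 1], padded[i + 1]).states
    return MarkovModel("-".join(phones), states)


def unspliced_lexeme_model(lexeme: str, n_states: int = 8) -> MarkovModel:
    """Function lexeme: one whole-lexeme model, trained as a single unit."""
    return MarkovModel(lexeme, [f"{lexeme}.s{i}" for i in range(n_states)])


# An ordinary word is spliced phone by phone; a frequent function-word
# pronunciation such as "the(1)" is modelled as one unspliced unit.
word = spliced_word_model(["DH", "IH", "S"])      # "this"
func = unspliced_lexeme_model("the(1)")
print(len(word.states), len(func.states))         # 9 states (3 phones x 3) vs. 8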

The following steps describe the process. We will assume the existence of some training data and a corresponding word-based training script.
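For concreteness, the sketches that accompany the steps below assume training material of the following minimal shape; this is an illustrative assumption, not a description of the original corpus.

# Assumed shape of the training material used in the later sketches: each
# utterance pairs an acoustic observation sequence with its word script.
training_data = [
    {
        "acoustics": [...],          # frame-level feature vectors (elided)
        "word_script": ["this", "is", "the", "speech", "of", "the", "chairman"],
    },
    # ... more utterances ...
]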
Step 1. For each word W in the vocabulary, determine how many different ways it can be pronounced. Each such pronunciation will be referred to as a lexeme of W. The various pronunciations can be determined manually by a phonetician, or can be found in a suitable dictionary.
Step 2. Create a phonetic Markov model (3) for each lexeme in the vocabulary.
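In outline, Steps 1 and 2 could be realized as below. The pronunciation dictionary, the W(k) lexeme naming scheme, and the placeholder model records are assumptions made for illustration; a real system would use a phonetician-supplied dictionary and genuine phonetic Markov models.

# Hypothetical pronunciation dictionary: each alternative pronunciation of a
# word W becomes a lexeme of W, written here as "W(k)".
PRONUNCIATIONS = {
    "the":    [["DH", "AH"], ["DH", "IY"]],   # two common variants
    "of":     [["AH", "V"], ["AH"]],
    "speech": [["S", "P", "IY", "CH"]],
}

# Step 1: enumerate the lexemes of every word in the vocabulary.
lexemes = {
    f"{word}({k})": phones
    for word, variants in PRONUNCIATIONS.items()
    for k, phones in enumerate(variants, start=1)
}

# Step 2: create a phonetic Markov model for each lexeme.  The model itself
# is elided; this sketch only records which phone string each model covers.
lexeme_models = {lex: ("phonetic model over", phones) for lex, phones in lexemes.items()}

print(sorted(lexemes))   # ['of(1)', 'of(2)', 'speech(1)', 'the(1)', 'the(2)']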
Step 3. Create a lexeme-based training script from the word-based training script via Viterbi alignment, as described in (4).
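A much-simplified stand-in for Step 3 is sketched below: every word occurrence in the script is replaced by whichever of its lexemes best accounts for the acoustics aligned to it. The injected score function represents the Viterbi-alignment machinery of (4) and, like the other names here, is an assumption of the sketch.

def to_lexeme_script(word_script, segments, lexemes_of, score):
    """Replace each word occurrence with its best-scoring lexeme.

    word_script : words of the training script, in order
    segments    : acoustic segment aligned to each word occurrence
    lexemes_of  : word -> list of that word's lexeme ids
    score       : (lexeme id, segment) -> alignment log-likelihood
    """
    return [
        max(lexemes_of(word), key=lambda lex: score(lex, segment))
        for word, segment in zip(word_script, segments)
    ]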
Step 4. Determine the frequency of each lexeme present in the lexeme-based training script.
Step 5. Define as a function lexeme any lexeme which occurs sufficiently often in the script to enable phonological rules (3) to be constructed for that lexeme as a distinct unit. About 400 occurrences is sufficient.
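Steps 4 and 5 then reduce to counting and thresholding, roughly as follows; the function and variable names are illustrative, and the 400-occurrence threshold is the figure quoted in Step 5.

from collections import Counter

MIN_OCCURRENCES = 400            # "About 400 occurrences is sufficient."

def find_function_lexemes(lexeme_script, min_count=MIN_OCCURRENCES):
    """Step 4: count each lexeme in the lexeme-based script.
    Step 5: keep those frequent enough to get their own phonological rules."""
    counts = Counter(lexeme_script)
    return {lex for lex, n in counts.items() if n >= min_count}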
Step 6. Partition the t...