
Method for Spotting Command Words in Isolated Speech

IP.com Disclosure Number: IPCOM000106974D
Original Publication Date: 1992-Jan-01
Included in the Prior Art Database: 2005-Mar-21
Document File: 3 page(s) / 109K

Publishing Venue

IBM

Related People

Gopalakrishnan, PS: AUTHOR [+2]

Abstract

In the IBM speech recognizer (1) a small set of words is used as commands that cause the recognizer to take some action other than printing out the transcribed word. For example, if the speaker says the word "Uppercase," the following word that is uttered is printed in capitals. The command words used in the system are Capital, Erase, Uppercase, Newline, Newparagraph, Spellmode, and Endspellmode. Very often, the recognizer is not fast enough to keep up with the speech input so that there is some delay between the utterance of the word and the appearance of the decoded word on the screen. Sometimes, this causes confusion for the user.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


       In the IBM speech recognizer (1) a small set of words is
used as commands that cause the recognizer to take some action other
than printing out the transcribed word.  For example, if the speaker
says the word "Uppercase," the following word that is uttered is
printed in capitals.  The command words used in the system are
Capital, Erase, Uppercase, Newline, Newparagraph, Spellmode, and
Endspellmode.  Very often, the recognizer is not fast enough to keep
up with the speech input so that there is some delay between the
utterance of the word and the appearance of the decoded word on the
screen.  Sometimes, this causes confusion for the user.  Also,
because of the nature of the stack decoding algorithm, the recognizer
sometimes goes back and changes a word several words before the
current one as new evidence about the candidate sentence becomes
available.  This is especially confusing for the user when the word
that is changed is a command word.  To overcome these problems, we
need an algorithm that spots any of the command words with very high
accuracy and does so instantaneously.  An important requirement of
this algorithm is that it should not use any information about the
context in which the command words occur, since doing so causes
delays in decoding.
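To make the command mechanism concrete, here is a sketch that renders a stream of already-decoded words and acts on command words instead of printing them. Only the Uppercase behavior is taken from the text. The Newline and Newparagraph behaviors are assumptions suggested by their names. The remaining commands are consumed but not modeled, and the function name `render` is hypothetical.

```python
# Hypothetical sketch: act on command words in a stream of decoded words.
# Only "Uppercase" follows the disclosure; Newline/Newparagraph are assumed.
COMMAND_WORDS = {"Capital", "Erase", "Uppercase", "Newline",
                 "Newparagraph", "Spellmode", "Endspellmode"}


def render(decoded_words):
    """Build display text, acting on command words instead of printing them."""
    lines = [[]]          # each entry holds the words of one output line
    upper_next = False    # set by "Uppercase": print the next word in capitals
    for word in decoded_words:
        if word in COMMAND_WORDS:
            if word == "Uppercase":
                upper_next = True
            elif word == "Newline":          # assumed: start a new line
                lines.append([])
            elif word == "Newparagraph":     # assumed: blank line, then new line
                lines.extend([[], []])
            # Other command words are swallowed but not modeled here.
            continue
        lines[-1].append(word.upper() if upper_next else word)
        upper_next = False
    return "\n".join(" ".join(line) for line in lines)


print(repr(render(["send", "Uppercase", "ibm", "a", "note", "Newline", "thanks"])))
# 'send IBM a note\nthanks'
```

Because each command takes effect as soon as it is rendered, a decoder that later revises a word several positions back can invalidate output the user has already seen. This is the confusion described above.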

      Below, we describe a method for spotting occurrences of command
words in isolated speech.  This does not make use of the language
model and is based only on the acoustic evidence available from each
utterance.  As a result, it is much faster than the stack decoding
algorithm.
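The disclosure's own scoring procedure is cut off in this abbreviated version, so the following is not that procedure. It is a generic likelihood-ratio spotting sketch that shares the property emphasized above: it decides from the acoustic evidence of one isolated utterance alone, with no language model. It assumes that each command word is a plain discrete HMM over feneme labels, and that a hypothetical background model stands in for all non-command words. Real fenemic models also have null transitions, which this sketch omits. The names `forward_loglik`, `spot`, and `margin` are illustrative.

```python
import math

# Generic likelihood-ratio spotting sketch (NOT the disclosure's procedure).
# Assumption: each model is a plain discrete HMM (init, trans, emit) over
# feneme labels; fenemic models with null transitions are not handled.


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def forward_loglik(labels, init, trans, emit):
    """Log P(labels | HMM) computed with the forward algorithm.

    init[s]     -- probability of starting in state s
    trans[s][t] -- probability of moving from state s to state t
    emit[s]     -- dict mapping a feneme label to its output probability
    """
    n = len(init)
    alpha = [_log(init[s]) + _log(emit[s].get(labels[0], 0.0)) for s in range(n)]
    for lab in labels[1:]:
        alpha = [
            logsumexp([alpha[s] + _log(trans[s][t]) for s in range(n)])
            + _log(emit[t].get(lab, 0.0))
            for t in range(n)
        ]
    return logsumexp(alpha)


def spot(labels, command_models, background_model, margin):
    """Return the best-scoring command word, or None if its log-likelihood
    does not beat the (hypothetical) background model by at least `margin`."""
    bg = forward_loglik(labels, *background_model)
    best_word, best_score = None, -math.inf
    for word, model in command_models.items():
        score = forward_loglik(labels, *model)
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score - bg >= margin else None


# Toy one-state models over a three-label alphabet (illustrative numbers only).
TOY_COMMANDS = {
    "Erase": ([1.0], [[1.0]], [{"a": 0.8, "b": 0.1, "c": 0.1}]),
    "Newline": ([1.0], [[1.0]], [{"a": 0.1, "b": 0.8, "c": 0.1}]),
}
TOY_BACKGROUND = ([1.0], [[1.0]], [{"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}])

print(spot(["a", "a", "a"], TOY_COMMANDS, TOY_BACKGROUND, margin=1.0))  # Erase
print(spot(["c", "c", "c"], TOY_COMMANDS, TOY_BACKGROUND, margin=1.0))  # None
```

The margin trades missed commands against false alarms. Since the text asks for very high accuracy on command words, a conservative (larger) margin would be the natural starting point, but any particular value here is an assumption.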
      1.  Record a set of utterances of the command words from each
speaker, in addition to the standard training script used to train
the system for that speaker.  Typically, ten utterances of each
command word per speaker are sufficient.
      2.  Train the fenemic baseforms (2) using the standard training
script (which does not contain any occurrences of the command words).
      3.  Each command word now has a baseform built using the
standard procedure described in (2).  These baseforms are made up
from a set of roughly 200 fenemic hidden Markov models.  We construct
more detailed baseforms for the command words by modifying these
standard fenemic models as follows.  Number the command words
$1, \ldots, m$.  Let $F_i$ be the set of distinct fenemic models,
excluding models for silence, that appear in the original baseform
of command word $i$.  Number these $1, \ldots, |F_i|$. ...
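Step 3 is truncated at this point, but the set construction it begins can be illustrated on its own. This is a hypothetical sketch. It assumes that a baseform is a list of integer feneme-model IDs, that `SILENCE_IDS` marks the silence models, and that the models in each $F_i$ are numbered in order of first appearance. The disclosure does not state the numbering order, and how the models are subsequently modified is not shown here.

```python
# Hypothetical sketch of building and numbering F_i for command words 1..m.
# Assumptions: baseforms are lists of integer feneme-model IDs, SILENCE_IDS
# marks silence models, and numbering follows order of first appearance.
SILENCE_IDS = {0}


def number_fenemes(command_baseforms, silence_ids=SILENCE_IDS):
    """command_baseforms: list of (word, baseform) for command words 1..m.

    Returns a list whose (i-1)-th entry maps 1..|F_i| to the distinct
    non-silence fenemic models in the baseform of command word i."""
    numbered = []
    for _word, baseform in command_baseforms:
        f_i = []                      # distinct non-silence models, in order
        for model_id in baseform:
            if model_id not in silence_ids and model_id not in f_i:
                f_i.append(model_id)
        numbered.append({k: model_id for k, model_id in enumerate(f_i, start=1)})
    return numbered


# Toy baseforms with made-up model IDs, for illustration only.
toy = [("Erase", [0, 17, 17, 42, 5, 42, 0]), ("Newline", [0, 9, 17, 9, 0])]
for i, mapping in enumerate(number_fenemes(toy), start=1):
    print(i, mapping)
# 1 {1: 17, 2: 42, 3: 5}
# 2 {1: 9, 2: 17}
```

Note that the same fenemic model (17 in the toy data) can belong to several $F_i$. Keeping a separate numbering per command word is what allows each word to receive its own, more detailed copy of a shared model.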