Constructing Models for Command Word Spotting in Isolated Speech

IP.com Disclosure Number: IPCOM000103714D
Original Publication Date: 1993-Jan-01
Included in the Prior Art Database: 2005-Mar-18
Document File: 2 page(s) / 105K

Publishing Venue

IBM

Related People

Gopalakrishnan, PS: AUTHOR [+2]

Abstract

In the IBM speech recognizer [1], a small set of words is used as commands that cause the recognizer to take some action other than printing out the transcribed word. For example, if the speaker says the word Uppercase, the next word uttered is printed in capitals. The command words used in the system are Capital, Erase, Uppercase, Newline, Newparagraph, Spellmode, and Endspellmode. Often the recognizer is not fast enough to keep up with the speech input, so there is a delay between the utterance of a word and the appearance of the decoded word on the screen. To overcome this problem, an algorithm is needed that can spot any of the command words with very high accuracy and do so instantaneously.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      An algorithm for spotting command words was described in [2].
This invention provides a method for constructing accurate models for
the command words and an alternate model to be used in that algorithm.
It also presents an algorithm for accurately locating the end points
of utterances in isolated speech using these models.  Combined with
the technique described in [2], this yields a method for spotting
command words in isolated speech that can run in parallel with a
regular decoder and identify occurrences of commands quickly.  The
steps of the method are as follows.
      1.  Record a set of utterances of each of the command words
from each speaker in addition to the standard training script used to
train the system to a speaker.  Typically, 10 utterances of each
command word by a speaker are sufficient.
      2.  Train the fenemic baseforms using the standard training
script (which does not contain any occurrences of the command words).
      3.  Using the parameters obtained in step 2 and the training
utterances recorded in step 1 for a speaker, construct fenemic
baseforms for each command word for that speaker using the method
described in [3].  Use the original set of 203 fenemic Markov models
to construct these baseforms.
      4.  Use the method described in [2] to build models for the
command words and the mumble machine.  This yields speaker-specific
models for each of the command words and for the mumble machine.
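
      The four steps above can be sketched in code as a per-speaker
pipeline.  This is only an illustrative outline, not the disclosed
implementation: the function names, the argument layout, and the three
callables (train_parameters, build_baseform, build_spotting_models) are
hypothetical placeholders.  The actual procedures are those of the
standard training and of references [3] and [2], which are not
reproduced here.

```python
from typing import Any, Callable, Dict, List, Tuple

# Command words listed in the disclosure.
COMMAND_WORDS = ["Capital", "Erase", "Uppercase", "Newline",
                 "Newparagraph", "Spellmode", "Endspellmode"]
# The text says about 10 utterances per command word are typically sufficient.
UTTERANCES_PER_COMMAND = 10


def recording_plan(speaker: str) -> List[Tuple[str, str, int]]:
    """Step 1: enumerate the (speaker, word, take) recordings to collect."""
    return [(speaker, word, take)
            for word in COMMAND_WORDS
            for take in range(1, UTTERANCES_PER_COMMAND + 1)]


def script_is_command_free(script_words: List[str]) -> bool:
    """Step 2 precondition: the standard script has no command words."""
    commands = {w.lower() for w in COMMAND_WORDS}
    return not any(w.lower() in commands for w in script_words)


def build_speaker_models(
    script_words: List[str],
    utterances: Dict[str, List[Any]],
    train_parameters: Callable[[List[str]], Any],         # hypothetical: step 2
    build_baseform: Callable[[Any, List[Any]], Any],      # hypothetical: method of [3]
    build_spotting_models: Callable[[Dict[str, Any]], Any],  # hypothetical: method of [2]
) -> Any:
    """Chain steps 2 to 4 for one speaker using caller-supplied procedures."""
    if not script_is_command_free(script_words):
        raise ValueError("training script must not contain command words")
    missing = [w for w in COMMAND_WORDS if not utterances.get(w)]
    if missing:
        raise ValueError(f"no utterances recorded for: {missing}")

    # Step 2: train on the standard script only.
    params = train_parameters(script_words)
    # Step 3: one baseform per command word, from that speaker's utterances.
    baseforms = {w: build_baseform(params, utterances[w]) for w in COMMAND_WORDS}
    # Step 4: command-word models plus the mumble machine.
    return build_spotting_models(baseforms)
```

      With seven command words and ten takes each, recording_plan
lists 70 recordings per speaker.  Passing the three procedures in as
arguments keeps the sketch independent of how [2] and [3] are actually
implemented.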

      The algorithm described in [2] is then used to spot occurrences
of command words.  To make the algorithm self-contained, we need a
method of det...