Using Non-Verbal Attention Signals in a Speech Recognition System

IP.com Disclosure Number: IPCOM000123610D
Original Publication Date: 1999-Feb-01
Included in the Prior Art Database: 2005-Apr-05
Document File: 2 page(s) / 104K

Publishing Venue

IBM

Related People

Hanson, G: AUTHOR

Abstract

Disclosed is the use of a nonverbal signal, or noise, such as a tongue click, whistle, or microphone switch noise, to alert a speech recognition system that one or more following words should be interpreted as a command, instead of as dictated text.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

   FIG. 1 is a flow diagram of a conventional speech
recognition system in the process of recognizing a spoken attention
word, which is pre-determined to indicate that one or more subsequent
words should be interpreted as commands.  In step 1, raw audio data
is received from a sound card and/or another audio subsystem.  In
step 2, the data is processed in context with preceding data to
determine the current phoneme or phonemes.  This step requires a
lookup table of phoneme signatures against which the current
signature is compared; the comparison may or may not use best-fit
matching.  In step 3, the context is used to determine the word
constructed from the preceding and current phonemes.  In step 4, the
context is used to determine whether the word is an attention word.
Recognition errors can occur anywhere within steps 2, 3, and 4.  If
the current word is not an attention word, it is processed as a
dictated word in step 5, and the system returns to step 1 to repeat
this process until the user has stopped talking or the speech
recognition session has ended.  If the word is an
attention word, one or more subsequent words are processed as
commands in step 6.  When this processing is completed, the system
returns to step 1 to repeat this process.
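
   The routing performed in steps 4 through 6 of FIG. 1 can be
illustrated with a minimal sketch.  It is not the disclosed
implementation: it assumes that steps 1 through 3 have already
produced a stream of recognized words, and the attention word
("computer"), the fixed command length, and the names Transcript and
route_words are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical values: the disclosure names neither an attention word
# nor how many following words make up a command.
ATTENTION_WORD = "computer"
COMMAND_LENGTH = 1


@dataclass
class Transcript:
    """Words routed to dictation, and word groups routed to commands."""
    dictated: List[str] = field(default_factory=list)
    commands: List[List[str]] = field(default_factory=list)


def route_words(words: Iterable[str]) -> Transcript:
    """Route already-recognized words as in steps 4-6 of FIG. 1."""
    result = Transcript()
    stream = iter(words)  # stand-in for the output of steps 1-3
    for word in stream:
        if word.lower() == ATTENTION_WORD:
            # Step 4 matched: collect the following word(s) as a command.
            command = []
            for _ in range(COMMAND_LENGTH):
                following = next(stream, None)
                if following is None:
                    break  # input ended before the command was complete
                command.append(following)
            if command:
                result.commands.append(command)  # step 6
        else:
            result.dictated.append(word)  # step 5
    return result


if __name__ == "__main__":
    outcome = route_words(["hello", "computer", "delete", "world"])
    print(outcome.dictated)
    print(outcome.commands)
```

   Because the attention word is itself a recognized word in this
sketch, any recognition error in steps 2 through 4 either drops a
command or turns dictated text into one.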

   FIG. 2 is a flow diagram of a speech recognition system
operating in accordance with the presently disclosed method, with a
pre-determined noise providing an indication that subsequent words
should be processed as commands.  Again, in step 7, raw audio data is
received from a sound card and/or another audio subsystem.  Next, in
step 8, the received audio data is compare...