Techniques for Speech Recognition

IP.com Disclosure Number: IPCOM000117063D
Original Publication Date: 1995-Dec-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 119K

Publishing Venue

IBM

Related People

Castellucci, F: AUTHOR [+4]

Abstract

Disclosed are several techniques for providing an interface between a user and a microprocessor, so that the user can talk naturally back and forth with the system. These techniques can be used individually or in various combinations.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 49% of the total text.

Techniques for Speech Recognition

      In accordance with a first of these techniques, words,
baseforms, and phonological data are concatenated into phrases and
sentences to improve recognition and to mimic natural language
processing.  For example, in a Spanish translator, sentences with
four or fewer words are concatenated into a single isolated
utterance.  The first half of a longer sentence is concatenated into
a first phrase, while the second half is concatenated into a second
phrase.  A logic check then verifies that the first half of the
sentence is consistent with the second half; if the phrase-level
bi-grams do not match, a speech recognition error is indicated.  This
method increases the rate of accurate recognition, simplifies
programming by eliminating a need for word spotting and Natural
Language Processing (NLP) algorithms, and provides an easy way to
eliminate random misrecognition.  Performing a best-fit acoustical
analysis at a phrase level causes the system to mimic natural
language processing, determining the correct question or phrase even
when background noise is present, when the user garbles his sentence,
or when he phrases the question differently from the finite state
grammar.
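The phrase-splitting and consistency check described above can be
sketched as follows.  This is a hypothetical illustration only: the
split rule, the Spanish sample sentences, and the table of legal
phrase pairs are assumptions, not the original implementation.

```python
def split_sentence(words):
    """Sentences of four or fewer words form one isolated utterance;
    longer sentences are split into a first and a second phrase."""
    if len(words) <= 4:
        return [words]
    mid = len(words) // 2
    return [words[:mid], words[mid:]]

# Phrase-level "bi-grams": pairs of first/second halves that may
# legally occur together in the finite state grammar (illustrative).
VALID_PHRASE_PAIRS = {
    ("donde esta", "el bano"),
    ("cuanto cuesta", "este libro"),
}

def check_recognition(first_phrase, second_phrase):
    """Logic check: True if the two recognized halves form a legal
    sentence; a mismatch signals a speech recognition error."""
    return (first_phrase, second_phrase) in VALID_PHRASE_PAIRS

print(split_sentence("donde esta el bano".split()))
print(check_recognition("donde esta", "el bano"))     # legal pair
print(check_recognition("donde esta", "este libro"))  # mismatch
```

Because the check operates on whole phrases rather than individual
words, a random misrecognition in either half tends to produce an
illegal pair and is rejected.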

      In accordance with a second of these techniques,
speaker-independent continuous speech is used to initiate the
playback of audio (WAV) files.  For example, this method can be used
to provide common language translations, or, in a police application,
to respond to a license plate number with "No wants or warrants."
Multiple WAV files can be combined in the system response.  A single
Backus-Naur Form (BNF) with one or more embedded variables can cause
the system to do a database retrieval or a table lookup, which in
turn determines which of several WAV files will be played back, as in
a stockbroker system programmed to respond to the command "BUY <1NUM>
<2NUM> SHARES OF <COMPANY NAME>."  Furthermore, multiple BNF forms
with embedded variables can be processed.
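A minimal sketch of the stockbroker example above, using a regular
expression in place of a BNF parser: the recognized command's embedded
variables drive a table lookup that selects the WAV files to combine
in the response.  The pattern, lookup table, and file names are
illustrative assumptions.

```python
import re

# Stand-in for the BNF "BUY <1NUM> <2NUM> SHARES OF <COMPANY NAME>",
# with the embedded variables captured as named groups.
BUY_COMMAND = re.compile(
    r"BUY (?P<num1>\d+) (?P<num2>\d+) SHARES OF (?P<company>[A-Z ]+)"
)

# Table lookup: each company maps to its confirmation prompt file.
COMPANY_WAVS = {
    "ACME": "acme_confirm.wav",
    "GLOBEX": "globex_confirm.wav",
}

def select_wav_files(utterance):
    """Parse the recognized command and return the WAV files to
    combine in the system response, or None if it does not match."""
    m = BUY_COMMAND.match(utterance)
    if m is None:
        return None
    return ["you_bought.wav",
            f"num_{m.group('num1')}.wav",
            f"num_{m.group('num2')}.wav",
            COMPANY_WAVS.get(m.group("company"), "unknown_company.wav")]

print(select_wav_files("BUY 100 50 SHARES OF ACME"))
```

In a full system the lookup could equally be a database retrieval, and
several such patterns (multiple BNF forms) would be tried in turn.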

      In accordance with a third of these techniques, dialog prompts
are used to launch (select) WAV files, with sentence-long files being
used to improve accuracy.  Questions, phrases, and sentences that the
user is most likely to say at a given point in an application are
displayed on the screen of the computing system.  The user scans the
list, finding the phrase which is closest to what he wants to say.
This technique greatly increases recognition accuracy while
encouraging the user to move quickly through his tasks by focusing
on the question or problem at hand.  Each prompt is associated with
one or more grammars, while each grammar (and optionally the key
variables embedded in the grammar) is associated w...
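The prompt-to-grammar association described above can be sketched as a
pair of lookup tables.  The prompt texts, grammar names, and WAV file
names below are illustrative assumptions; the original associations
are not given in the abbreviated text.

```python
# Each on-screen dialog prompt is associated with one or more grammars.
PROMPTS = {
    "Where is the train station?": ["ask_location_grammar"],
    "How much does this cost?":    ["ask_price_grammar"],
}

# Each grammar is associated with the WAV file it launches in response.
GRAMMAR_RESPONSES = {
    "ask_location_grammar": "station_directions.wav",
    "ask_price_grammar":    "price_reply.wav",
}

def respond(recognized_grammar):
    """Launch (select) the WAV file tied to the grammar that fired."""
    return GRAMMAR_RESPONSES.get(recognized_grammar)

print(respond("ask_price_grammar"))
```

Because the user reads a displayed prompt aloud, the recognizer only
has to match against the small set of grammars tied to that screen,
which is what drives the accuracy gain the text describes.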