Improved Correction Of Speech Recognition Errors Through Audio Playback

IP.com Disclosure Number: IPCOM000104840D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 2 page(s) / 117K

Publishing Venue

IBM

Related People

Andreshak, JC: AUTHOR [+7]

Abstract

Disclosed is a method to aid the user of a speech recognition system in identifying and correcting recognition errors by making the original audio signal for each recognized word available through playback. This allows the speaker, or another person responsible for correcting a document, to identify errors by noting differences between the audio signal and the recognized text, and to correct them by reviewing what was actually spoken but misrecognized.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 48% of the total text.

      Speech recognition systems will generally introduce errors in
the transcription of spoken text.  This might be due to acoustic
noise, differences in pronunciation due to dialect, poor acoustic or
linguistic models, or the simple lack of a particular word in the
system's vocabulary.  These errors must be corrected by the user if
an error-free transcription is required, for example when dictating
business correspondence.

      For many users, the preferred method of document creation is to
dictate a large section of text (e.g., a paragraph or a page) without
monitoring for errors, and then to review and correct any recognition
errors in a subsequent pass.  This allows the user to focus
completely on document composition while dictating.  An alternative
method involves verifying and correcting recognition performance on a
word-by-word basis.  This is often more difficult because it requires
the user to focus on two different tasks at the same time:  the
abstract composition of the document and the correction of
recognition errors.  This difficulty may be compounded by the delay
between when a word is spoken and when it is recognized.

      One drawback to correcting on a separate pass is that the user
needs to identify and correct recognition errors in the text long
after the words were spoken.  As unlikely as it seems, users are
often able to identify the position of errors, but unable to
reconstruct the words that were originally spoken.  Alternatively,
this second pass for correction might be done by a different person,
for example a secretary correcting a draft that was dictated, but not
corrected, by someone else.

      In the disclosed method, the speech recognition system
provides:

1.  The recognized text, which is displayed to the user.  This text
    might contain the same number of words as spoken (substitution
    errors), fewer words (deletion errors) or more words (insertion
    errors).

2.  The audio recording of the original speech.  This audio might be
    stored at the same bandwidth and resolution as originally used
    for recognition, or it might be compressed to reduce storage
    requirements.

3.  An index that maps each recognized word to its corresponding
    segment of the audio recording.

4.  A means for the user to request audio playback of an individual
    word or group of words.  This playback could be at the same
    band...
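
As an illustration only, and not part of the original disclosure, the items listed above (recognized text, stored audio, a word-to-audio index, and a playback request for one word or a group of words) might be sketched in Python as follows. The names WordSegment, DictationRecord, and audio_for are hypothetical. The sketch assumes the audio is held in memory as a list of integer samples, and it returns the selected samples rather than sending them to an audio device.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordSegment:
    """One recognized word plus the audio span it was decoded from (hypothetical layout)."""
    text: str
    start_sample: int  # inclusive offset into the recording
    end_sample: int    # exclusive offset into the recording


class DictationRecord:
    """Bundles the stored audio with the per-word index into it."""

    def __init__(self, samples: List[int], words: List[WordSegment]) -> None:
        self.samples = samples  # item 2: the audio recording
        self.words = words      # items 1 and 3: recognized words and their index

    def text(self) -> str:
        """Return the recognized text as it would be displayed to the user."""
        return " ".join(word.text for word in self.words)

    def audio_for(self, first: int, last: int) -> List[int]:
        """Item 4: return the samples covering words first..last (inclusive)."""
        if not 0 <= first <= last < len(self.words):
            raise IndexError("word range out of bounds")
        start = self.words[first].start_sample
        end = self.words[last].end_sample
        return self.samples[start:end]


# Toy data: a 100-sample "recording" and three recognized words.
record = DictationRecord(
    samples=list(range(100)),
    words=[
        WordSegment("recognize", 0, 30),
        WordSegment("speech", 30, 70),
        WordSegment("today", 70, 100),
    ],
)
```

Here record.audio_for(1, 1) selects only the samples for the second word, and record.audio_for(0, 1) selects the contiguous span for the first two words. Storing sample offsets rather than copies of the audio keeps the index small and leaves room for the recording itself to be compressed, as item 2 allows.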