
Humour Language Model module in ASR to prevent humorous decoding errors.

IP.com Disclosure Number: IPCOM000018617D
Original Publication Date: 2003-Jul-28
Included in the Prior Art Database: 2003-Jul-28
Document File: 2 page(s) / 48K

Publishing Venue

IBM

Abstract

The purpose of this invention is to reduce the psychological impact of speech recognition, speech synthesis, and machine translation errors that inevitably do occur, so that they have a less negative effect on the viewer/listener. This invention proposes "smearing" the differences between candidate target words, and leaving the decision of what is perceived up to the viewer. A novel method to "smear" differences is proposed in this invention, using handwriting. Humans can read each other's handwriting despite large disparities in the way individuals form their letters, because they bring their human intelligence and context to bear on the task. Handwritten letters, when presented in isolation, can easily be "misread" as letters other than the intended one. In context, though, the viewer "sees" what he/she is expected to see, as a function of context. This invention proposes exploiting this ambiguity, so that speech recognition errors are not so egregiously apparent. The ambiguity of handwriting can be similarly exploited for errors in machine translation. It can also be used to soften reactions to potentially risque or embarrassing words, by presenting those words as ambiguous with the closest near-match. Finally, this hybrid approach can be applied to synthesized output, by presenting acoustically ambiguous synthesized output when the pronunciation is not clear.




   Speech recognition and speech synthesis technologies have improved dramatically over the last decade, but errors still occur with no immediately obvious solution. Despite these errors, the technologies are heavily used in a number of arenas. The purpose of this invention is to reduce the psychological impact of errors that inevitably do occur, so that they have a less negative effect on the viewer/listener.

Large vocabulary speech recognition output can have error rates as high as 10%. In most cases, a set of possible candidate words is identified, and the word with the highest confidence score is displayed. In some cases, the confidence difference between the first and second candidate word is small.
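As a rough illustration of this situation, the sketch below checks whether the confidence gap between the first and second candidate in an n-best list is small enough to warrant "smearing". The list format, scores, and the 0.05 threshold are illustrative assumptions, not details from the disclosure.

# Hypothetical sketch: deciding when the top two ASR candidates are too close to call.
def is_ambiguous(nbest, gap_threshold=0.05):
    """Return True when the top two candidates' confidence scores are close."""
    if len(nbest) < 2:
        return False
    return (nbest[0][1] - nbest[1][1]) < gap_threshold

# Example: the decoder is fairly sure about "meeting" but torn between "na" and "ma".
nbest_clear = [("meeting", 0.92), ("mating", 0.41)]
nbest_close = [("na", 0.55), ("ma", 0.52)]

print(is_ambiguous(nbest_clear))  # False -> display the top word normally
print(is_ambiguous(nbest_close))  # True  -> candidate for "smeared" output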

This invention proposes "smearing" the differences between these two candidate target words, and leaving the decision of what is perceived up to the viewer. One method that has been explored to date is presenting low-confidence words in phonetic characters, so that the viewer tries to "map" this to the nearest word match that makes sense in a particular context. (Disclosure YOR8-2000-0972, Automatic Speech Recognition means/interface for transcribing and displaying speech data to improve communication and lip-reading skills of the hearing impaired.) Transforming phonetics into words, however, may be difficult for an untrained or casual user.
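The earlier phonetic-display approach cited above might be sketched as follows. The tiny pronunciation dictionary, the 0.6 threshold, and the function names are assumptions for demonstration only, not the cited disclosure's implementation.

# Illustrative sketch: show a word's phone sequence when the recognizer is unsure.
PRONUNCIATIONS = {
    "na": "/na/",
    "ma": "/ma/",
    "night": "/n ay t/",
}

def render_token(word, confidence, threshold=0.6):
    """Display the word itself when confident, else its phonetic form."""
    if confidence >= threshold:
        return word
    return PRONUNCIATIONS.get(word, word)  # fall back to the word if no entry exists

print(render_token("night", 0.95))  # -> "night"
print(render_token("na", 0.55))     # -> "/na/"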

A novel method to "smear" differences is proposed in this invention, using handwriting. Humans can read each other's handwriting despite large disparities in the way individuals form their letters, because they bring their human intelligence and context to bear on the task. Handwritten letters, when presented in isolation, can easily be "misread" as letters other than the intended one. In context, though, the viewer "sees" what he/she is expected to see, as a function of context. This invention proposes exploiting this ambiguity, so that speech recognition errors are not so egregiously apparent.

The system would be trained on large amounts of handwriting data. Using neural networks, it would learn to synthesize hybrid letters that humans can interpret in multiple ways. Ideally, the user's own handwriting would be used as training data, since we can decipher our own handwriting better than anyone else's. When the system is not confident whether the spoken syllable was /na/ or /ma/, the output is presented in a way that can be easily interpreted...
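As a minimal stand-in for the neural synthesis described above, one could imagine blending rendered glyphs of the two candidate letters, weighted by their recognition confidences. The bitmap representation, the blending scheme, and the confidence values below are assumptions for illustration, not the disclosed method.

# Sketch: pixel-wise blend of two handwritten glyph images, weighted by confidence.
import numpy as np

def blend_glyphs(glyph_a, glyph_b, conf_a, conf_b):
    """Return a hybrid glyph lying between two same-sized grayscale glyph images."""
    total = conf_a + conf_b
    return (conf_a / total) * glyph_a + (conf_b / total) * glyph_b

# Toy 4x4 arrays standing in for rendered handwritten 'n' and 'm'.
glyph_n = np.random.rand(4, 4)
glyph_m = np.random.rand(4, 4)

# The decoder is nearly evenly split between /na/ and /ma/, so the rendered
# letter sits roughly halfway between the two handwritten forms.
hybrid = blend_glyphs(glyph_n, glyph_m, conf_a=0.55, conf_b=0.52)
print(hybrid.shape)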