Syllabic Speech Recognition for Real-time Phonetic Subtitling for the Deaf

IP.com Disclosure Number: IPCOM000013790D
Original Publication Date: 2000-May-01
Included in the Prior Art Database: 2003-Jun-18
Document File: 5 page(s) / 73K

Publishing Venue

IBM

Abstract

Disclosed is a method of using an off-the-shelf speech recognition system for real-time phonetic subtitling, as an assist to other methods of speech perception (lip reading, residual hearing) for deaf people. The novelty of this approach is to reuse as much of a commercial speech recognition system as possible, instead of developing an ad-hoc system specific to this problem. The modifications pertain to recognizing syllables instead of words, which enables real-time decoding of unlimited vocabulary speech.

1 Background

  Many deaf (or severely hearing-impaired) people rely heavily on lip-reading. However, many sounds look alike on the lips (e.g. /p/, /b/ and /m/). It is estimated that lip-reading conveys only about 30% of the speech information. Deaf people have to draw on other knowledge (such as contextual or semantic knowledge) to distinguish words that are visual look-alikes, like "party", "Marty", "bar tea", etc. This makes lip-reading a very tiring and somewhat inefficient exercise.

  Some manual methods practiced by the speaker (like Cued Speech [Cornett, 1967]) can complement the information conveyed on the lips, and have proven very successful. Yet they require training on the part of anyone wanting to address deaf people, thereby limiting the circle of people able to communicate with them.

  Therefore, automatic means of providing supplemental information to lip-reading have been researched [Cornett & al., 1977]. The idea is to use speech analysis or recognition techniques to send visual information in real-time to the deaf person, who combines it with all the other information available (e.g. lip-reading, residual aided hearing). Various technology implementations have been used to perform this analysis or recognition. Most of these approaches still present some problems:
1. Speech analysis systems that provide cues about intensity or pitch [Upton, 1968] do not relate well to the written word, and generally do not seem to provide sufficient information to the deaf person.
2. Large vocabulary speech recognition systems, such as those used for dictation into word processors (e.g. IBM ViaVoice), cannot be used directly for this purpose, basically for two reasons:
- They have a limited vocabulary (some tens or hundreds of thousands of words), which is adequate for producing somewhat formal written text, but does not cover well the language used in informal day-to-day situations or teaching situations.
- The language modeling techniques they use to produce the best decoded text rely on a search algorithm that introduces a varying lag of typically a few seconds between the speech production and the text decoding (it is sometimes necessary to wait for the end of a sentence before firming up hypotheses about proper word spellings at the beginning of that sentence); the sketch below illustrates this latency contrast.

These two design points of dictation systems make them unsuitable...
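As an illustration only (not part of the original disclosure), the following Python sketch contrasts the two output strategies under simplified assumptions: a dictation-style decoder that holds text back until a sentence boundary so the language-model search can settle on the whole sentence, versus a syllabic decoder that emits each syllable as soon as it is decoded. The class and function names (Hypothesis, sentence_level_subtitles, syllable_level_subtitles) and the toy utterance are hypothetical, not the API of any real recognizer.

```python
# Hypothetical sketch: latency of sentence-level vs. syllable-level subtitling.
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Hypothesis:
    time_s: float       # time at which this unit was spoken
    unit: str           # decoded unit (word or syllable)
    sentence_end: bool  # True if the recognizer detected a sentence boundary


def sentence_level_subtitles(stream: Iterable[Hypothesis]) -> Iterator[Tuple[float, str]]:
    """Dictation-style output: buffer units until the sentence boundary, so the
    whole sentence can be rescored before anything is displayed."""
    buffered = []
    for hyp in stream:
        buffered.append(hyp)
        if hyp.sentence_end:
            # The entire sentence appears at the time of its last unit,
            # so its first words incur a lag of several seconds.
            yield hyp.time_s, " ".join(h.unit for h in buffered)
            buffered = []


def syllable_level_subtitles(stream: Iterable[Hypothesis]) -> Iterator[Tuple[float, str]]:
    """Syllabic output: emit each syllable as soon as it is decoded, keeping
    the subtitle lag small and roughly constant."""
    for hyp in stream:
        yield hyp.time_s, hyp.unit


if __name__ == "__main__":
    # A made-up utterance, one unit every 0.3 s, sentence boundary on the last unit.
    units = ["par", "ty", "or", "bar", "tea"]
    speech = [Hypothesis(round(0.3 * i, 1), u, i == len(units))
              for i, u in enumerate(units, start=1)]
    print(list(sentence_level_subtitles(speech)))   # all text appears at 1.5 s
    print(list(syllable_level_subtitles(speech)))   # text appears every 0.3 s
```

On this toy input the sentence-level version displays nothing until 1.5 s and then everything at once, whereas the syllable-level version updates every 0.3 s; this bounded, per-syllable latency is the behaviour the syllabic approach described above is intended to achieve.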