Improving Speech Recognition Accuracy with Multiple Phonetic Models

IP.com Disclosure Number: IPCOM000116974D
Original Publication Date: 1995-Dec-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 55K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+3]

Abstract

Disclosed is the use of multiple phonetic models in a speech recognition system, accounting for the typical speech differences between males and females; among adults, adolescents, and pre-teens; among speakers from various parts of the United States; and among speakers from various parts of the world with slight or strong accents.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 56% of the total text.

      The speech recognition system can select a phonetic model by
asking the user to pick the model he wishes to use, or by allowing
multiple models to compete as the user supplies corrections to the
output of the system, so that the model providing the most accurate
results can be chosen.  Alternatively, the model coming closest to
the user's accent, as determined by his answers to one or more
questions, may be used.  Preferably, these approaches are combined.
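      The competition among models can be illustrated with a minimal
sketch.  It is not taken from the disclosure: the class name, the
model names, and the error count (a position-by-position word
comparison rather than a true alignment) are all assumptions made for
illustration.  Each model's output is compared with the text as
corrected by the user, and the model with the fewest accumulated
mismatches is preferred.

```python
class ModelCompetition:
    """Hypothetical tally of how often each phonetic model disagreed
    with the user's corrected text."""

    def __init__(self, model_names):
        # One running error count per competing model.
        self.errors = {name: 0 for name in model_names}

    def record(self, hypotheses, corrected_words):
        """hypotheses: dict mapping model name -> list of recognized words.
        corrected_words: list of words after the user's corrections."""
        for name, words in hypotheses.items():
            # Simplified error count: position-wise mismatches plus any
            # difference in length (an assumption, not a real alignment).
            mismatches = sum(1 for h, c in zip(words, corrected_words) if h != c)
            mismatches += abs(len(words) - len(corrected_words))
            self.errors[name] += mismatches

    def best(self):
        # Model with the fewest accumulated errors; ties go to the
        # model listed first.
        return min(self.errors, key=self.errors.get)


# Illustrative use with made-up model names and outputs.
competition = ModelCompetition(["model_a", "model_b"])
competition.record(
    {"model_a": ["park", "the", "car"], "model_b": ["pock", "the", "car"]},
    ["park", "the", "car"],
)
chosen = competition.best()
```

      Because the tally is cumulative, the preferred model can change
as more corrections arrive during dictation.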

      In a preferred version of a speech recognition system having a
large vocabulary, the user is asked to select among a number of
phonetic models, based on his accent or dialect, age, and sex.  He is
not asked to judge the degree of his accent (heavy or light).  After
this selection is made, the system chooses a single phonetic model
from several models close to the user's selection, choosing the model
yielding the fewest errors as the user reads a short set of sample
phrases, or as the user proceeds with actual dictation.
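      The two-stage selection just described might be sketched as
follows.  This is an illustration under stated assumptions, not the
disclosed implementation: the PhoneticModel record, the recognize
callable, and the reading of "close to the user's selection" as
"same accent, age group, and sex, differing only in degree of accent"
are all hypothetical, and errors are counted by a simple
position-by-position word comparison.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PhoneticModel:
    accent: str
    age_group: str
    sex: str
    degree: str                      # "light" or "heavy"; never asked of the user
    recognize: Callable[[str], str]  # hypothetical: utterance id -> recognized text


def word_errors(hypothesis: str, reference: str) -> int:
    # Simplified error count (assumption): position-wise mismatches
    # plus any difference in the number of words.
    h, r = hypothesis.split(), reference.split()
    return sum(a != b for a, b in zip(h, r)) + abs(len(h) - len(r))


def choose_model(models: List[PhoneticModel], accent: str, age_group: str,
                 sex: str, samples: List[Tuple[str, str]]) -> PhoneticModel:
    """samples: (utterance id, reference text) pairs for the sample phrases."""
    # Stage 1: keep the models matching the user's stated selection.
    pool = [m for m in models
            if (m.accent, m.age_group, m.sex) == (accent, age_group, sex)]
    if not pool:
        raise ValueError("no phonetic model matches the user's selection")
    # Stage 2: pick the candidate with the fewest errors on the samples.
    return min(pool, key=lambda m: sum(word_errors(m.recognize(utt), ref)
                                       for utt, ref in samples))


# Illustrative use with stand-in recognizers that return fixed text.
models = [
    PhoneticModel("southern", "adult", "female", "light", lambda utt: "open the file"),
    PhoneticModel("southern", "adult", "female", "heavy", lambda utt: "open a file"),
    PhoneticModel("midwest", "adult", "female", "light", lambda utt: "open the file"),
]
selected = choose_model(models, "southern", "adult", "female",
                        [("utt1", "open the file")])
```

      The same scoring could be applied to text corrected during
actual dictation instead of to prepared sample phrases.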

      In a preferred version of a speech recognition system having a
constrained vocabu...