Method for Efficient Customization of a Speech Recognition System

IP.com Disclosure Number: IPCOM000110275D
Original Publication Date: 1992-Nov-01
Included in the Prior Art Database: 2005-Mar-25
Document File: 2 page(s) / 95K

Publishing Venue

IBM

Related People

Merialdo, B: AUTHOR

Abstract

Herein described is a method for efficient customization of a speech recognition system. Current speech recognition prototypes have a vocabulary limitation of 20,000 words. This is sufficient to cover properly a specific application, for example radiology reports, or press agency news, etc., but this is not enough to build a single general system.

This is the abbreviated version, containing approximately 52% of the total text.

       Herein described is a method for efficient customization
of a speech recognition system.  Current speech recognition
prototypes have a vocabulary limitation of 20,000 words.  This is
sufficient to cover properly a specific application, for example
radiology reports, or press agency news, etc., but this is not enough
to build a single general system.

      Therefore, it will be necessary to build different systems for
different domains: health, insurance, legal, banking, etc.

      With the current technology, building a system for a particular
domain requires three phases:
1.  Defining the words of the vocabulary (spelling).
2.  Defining how they are pronounced (phonetic baseforms).
3.  Defining how they can be used to build sentences (language
model).

      Phase 1 is realized by selecting the most frequent words that
appear in some (large) amount of text which is typical of the
application.
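
      As an illustration of Phase 1, the following minimal sketch
selects the most frequent words from a sample text. The function name
`select_vocabulary`, the tokenization rule, and the sample corpus are
assumptions made for this example; the source does not specify them.

```python
import re
from collections import Counter


def select_vocabulary(text, size=20000):
    """Return the `size` most frequent words in `text`, most frequent first."""
    # Assumption: a word is a run of letters (inner apostrophes allowed),
    # compared case-insensitively.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)
    return [word for word, _ in counts.most_common(size)]


# Hypothetical sample text standing in for a large domain corpus.
corpus = "the patient shows no fracture. the chest film shows the heart is normal."
vocab = select_vocabulary(corpus, size=3)
```

In practice the corpus would be far larger, and the default size of
20,000 mirrors the vocabulary limit stated above.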

      Phase 2 can be realized by automatic phonetization programs.
However, because proper nouns, foreign words, and technical terms
often have irregular pronunciations, the phonetic baseforms have to
be checked manually by a specialist, at least for certain types of
words.
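
      One way to limit the manual check to "certain types of words"
is to flag likely candidates automatically. The sketch below is only an
assumption about how such a filter might look: the heuristics
(capitalization, a list of technical terms, absence from a general word
list) and all names are hypothetical and are not taken from the source.

```python
def needs_manual_check(word, known_words, technical_terms):
    """Heuristically decide whether a word's automatic baseform needs review."""
    if word in technical_terms:          # listed technical term
        return True
    if word[:1].isupper():               # likely a proper noun
        return True
    if word.lower() not in known_words:  # possibly foreign or unusual
        return True
    return False


def split_for_review(vocabulary, known_words, technical_terms):
    """Split a vocabulary into (auto_accepted, to_review) lists."""
    auto, review = [], []
    for word in vocabulary:
        if needs_manual_check(word, known_words, technical_terms):
            review.append(word)
        else:
            auto.append(word)
    return auto, review


# Hypothetical inputs for illustration only.
vocabulary = ["report", "Merialdo", "pneumothorax", "chest", "gestalt"]
known_words = {"report", "chest", "pneumothorax"}
technical_terms = {"pneumothorax"}
auto, review = split_for_review(vocabulary, known_words, technical_terms)
```

Only the words in `review` would then be passed to the specialist.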

      Phase 3 is realized by collecting statistics on word-sequence
frequencies in some (large) amount of text.
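
      A minimal sketch of such word-sequence statistics follows. It
assumes sequences of two words (bigrams) and plain relative frequencies
without smoothing; the source does not state the sequence length or the
estimation method, and the names and sample sentences are hypothetical.

```python
from collections import Counter


def bigram_model(sentences):
    """Estimate P(next | previous) from relative bigram frequencies."""
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers are an assumption of this sketch.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    return {pair: n / history_counts[pair[0]] for pair, n in pair_counts.items()}


model = bigram_model([
    "the heart is normal",
    "the lungs are clear",
    "the heart is enlarged",
])
```

Here `model[("the", "heart")]` is the fraction of occurrences of "the"
that are followed by "heart" in the sample sentences.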

      In these three phases, there are two points at which human
intervention is required:
1.  Large amounts of text for a domain come in various formats,
depending on the database or text processing software.  This format
has to be analyzed, for example, to remove index information, or
formatting tags, so that the text itself is put into a standardized
format usable by the other procedures.
2.  The phonetization has to be checked by a domain specialist who
knows how the words are pronounced.
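
      The format analysis in the first point can be sketched as
follows. The input format is entirely hypothetical: SGML-style tags and
index lines beginning with `ID:`, `DATE:`, or `INDEX:` are assumptions
for this example, since each real source needs its own rules.

```python
import re

# Assumed formatting tags, e.g. <p> or </b>.
TAG = re.compile(r"<[^>]+>")
# Assumed index fields, one per line.
INDEX_LINE = re.compile(r"^\s*(?:ID|DATE|INDEX)\s*:.*$", re.IGNORECASE)


def normalize(raw):
    """Drop index lines and tags; return one cleaned text line per input line."""
    lines = []
    for line in raw.splitlines():
        if INDEX_LINE.match(line):
            continue                   # remove index information
        line = TAG.sub("", line)       # remove formatting tags
        line = " ".join(line.split())  # collapse runs of whitespace
        if line:
            lines.append(line)
    return "\n".join(lines)


raw = "ID: 00123\n<p>The chest film is <b>normal</b>.</p>\n\nINDEX: radiology, chest\n"
clean = normalize(raw)
```

The output of such a step is the standardized text on which the
vocabulary selection and language model statistics operate.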

      The problems associated with these interventions are:
- they are expensive,
- special skills have to be developed for format processing,
- it is difficult to estimate how much work is need...