Specialized Language Models for Speech Recognition

IP.com Disclosure Number: IPCOM000114846D
Original Publication Date: 1995-Feb-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 140K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+7]

Specialized Language Models for Speech Recognition

      An indication of the degree of difficulty of a speech
recognition task is its perplexity, which is a measure of the
effective average branching factor of the generic model used by
standard speech-recognition or speech-understanding systems (1,2).
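
      For reference, the branching-factor interpretation follows from
the conventional information-theoretic definition of perplexity: for
a test word sequence w1...wn assigned probability P(w1,...,wn) by the
model,

    H  = -\frac{1}{n} \log_2 P(w_1, \dots, w_n), \qquad
    PP = 2^{H} = P(w_1, \dots, w_n)^{-1/n}

so a model with perplexity PP faces, on average, an effective choice
among PP equally likely words at each position.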

      Composite or competing language models using data gathered on
the actual, time-varying use of natural language are developed to
reduce the perplexity of a specific speech recognition task.  These
models are based on statistical data collected on the language of an
individual or group of individuals.  This data is applied to
traditional linguistic rules or to finite-state models.  The
likelihood of use of specific morphemes, words, and sentences is
highly dependent on the recent use thereof and on the topic and
content of a particular interaction.  Thus, the generic language
model used by a particular system, based on a combination of
linguistics and statistics, is supplemented by a series of models
tailored for each user of the system.  While the models may be
static, being based on the body of system usage of an individual or
group, they are preferably dynamic, being based particularly on the
most recent system usage of an individual.  This data may be
collected and dynamically updated by monitoring a network, using, for
example, Lempel-Ziv techniques (3,4,5).  If such a model is not
available for a particular user, the system backs off to use data
developed for the department, the site, the company, the type of
task, and finally to the generic language model, with other specific
models being interposed as needed.  Models from various sources, such
as prior sessions, are preferably saved to compete with current
language models.  If a user returns to work on a previous project, a
prior language model may be selected.  If a user corrects his
diction, the relative weighting of models can be changed.
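
      A minimal sketch of the back-off chain and model competition
described above follows; the function names, scope labels, and the
model interface (a prob method) are illustrative assumptions, not
identifiers from the disclosure.

    # Hypothetical back-off chain, as described in the text:
    # user -> department -> site -> company -> task type -> generic.
    SCOPES = ["user", "department", "site", "company", "task", "generic"]

    def select_model(available, scopes=SCOPES):
        """Return the most specific language model that exists.

        available: dict mapping scope name -> language model object
        """
        for scope in scopes:
            if scope in available:          # back off to the next scope
                return available[scope]
        raise LookupError("no language model available, not even generic")

    def mixture_prob(weighted_models, word, history):
        # Competing models (e.g., saved from prior sessions) vote through
        # a weighted mixture; the weights can be changed when the user
        # corrects recognition output.
        total = sum(weight for _, weight in weighted_models)
        return sum(weight * model.prob(word, history)
                   for model, weight in weighted_models) / total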

      Lempel-Ziv (LZ)-based algorithms have been used to track
redundancy at a character level in an adaptive, constantly changing
manner, and to track redundancy at a word or string level.  In other
implementations, a frozen LZ table is used.  An LZ-based algorithm
can produce long n-grams with a high probability of localized use.
Such algorithms are used for simultaneously tracking words used
frequently over long periods of time, words used infrequently but
recently, n-grams longer than three, and high-probability n-grams.
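
      One plausible reading of this technique is an LZ78-style parse
applied at the word level, as in the Python sketch below; the class
is hypothetical, and freezing its table corresponds to the frozen
LZ table variant mentioned above.

    class LZWordTable:
        """Adaptive word-level phrase dictionary in the LZ78 style."""

        def __init__(self):
            self.table = {}    # phrase (tuple of words) -> phrase id
            self.counts = {}   # phrase -> number of times parsed

        def update(self, words):
            phrase = ()
            for word in words:
                candidate = phrase + (word,)
                self.counts[candidate] = self.counts.get(candidate, 0) + 1
                if candidate in self.table:
                    phrase = candidate               # extend the match
                else:
                    self.table[candidate] = len(self.table)  # learn phrase
                    phrase = ()                      # restart parsing

      Word strings that repeat locally are extended on successive
passes, so recently reused phrases quickly become long,
high-probability n-grams, while the counts distinguish long-term
frequent words from recent but infrequent ones.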

      In a workstation having a terminal emulator, software is used
to monitor and to record data transmitted to or from a workstation,
predicting which words or language structures are most likely to be
used.  One cache is built for data entered by the user, trapping, for
example, the keyboard buffer.  A second cache receives data
transmitted to the user, and a third cache receives data sent from
the user's system to the network server.  In this application,
terminal emulators hav...
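
      A schematic sketch of the three-cache arrangement described
above follows; the class name, cache sizes, and replacement policy
are illustrative assumptions.

    from collections import deque

    class EmulatorMonitor:
        """Three caches fed by monitoring a terminal emulator session."""

        def __init__(self, size=10000):
            self.user_input = deque(maxlen=size)   # trapped keyboard buffer
            self.to_user = deque(maxlen=size)      # data transmitted to user
            self.to_network = deque(maxlen=size)   # data sent to the server

        def record(self, channel, tokens):
            cache = {"input": self.user_input,
                     "output": self.to_user,
                     "network": self.to_network}[channel]
            cache.extend(tokens)                   # newest displaces oldest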