Speech Recognition Method using Templates Spoken by Multiple Speakers

IP.com Disclosure Number: IPCOM000114429D
Original Publication Date: 1994-Dec-01
Included in the Prior Art Database: 2005-Mar-28
Document File: 2 page(s) / 75K

Publishing Venue

IBM

Related People

Hashimoto, Y: AUTHOR [+3]

Abstract

Disclosed is a technique for making a recognizer that has the following features:
  o  By using 10 - 20 words uttered by a new user, it can quickly select the optimum choice among multiple reference speakers whose utterances are used to construct a reference database.
  o  It provides a function for retraining word models by eliciting the user's utterances of unrecognized words.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      Disclosed is a technique for making a recognizer that has the
following features:
  o  By using 10 - 20 words uttered by a new user, it can quickly
      select the optimum choice among multiple reference speakers
      whose utterances are used to construct a reference database.
  o  It provides a function for retraining word models by eliciting
      the user's utterances of unrecognized words.

      Constructing a Reference Database for Recognition - Each
reference speaker utters all the words in the selected vocabulary
several times.  By using these utterances as training data, the
system constructs a reference database for each reference speaker.
The reference database consists of codebooks, baseforms, word models
for preliminary word selection, and fenonic model parameters
(transition probabilities and output probabilities) for detailed
matching.  Since baseforms have a strong influence on the recognition
rate, all word boundaries are manually specified.
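
      As a rough illustration of the per-speaker reference database
described above, the following Python sketch groups the four listed
components into one record per reference speaker.  All class names,
field names, and field types are assumptions made for illustration
only; the disclosure does not specify a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FenonicModelParameters:
    # Hypothetical layout: one probability list per fenone label.
    transition_probs: Dict[int, List[float]] = field(default_factory=dict)
    output_probs: Dict[int, List[float]] = field(default_factory=dict)


@dataclass
class ReferenceDatabase:
    """One record per reference speaker (illustrative structure only)."""
    speaker_id: str
    # Each codebook is a list of centroid vectors.
    codebooks: List[List[List[float]]] = field(default_factory=list)
    # Assumed here: a baseform is a label sequence keyed by vocabulary word.
    baseforms: Dict[str, List[int]] = field(default_factory=dict)
    # Word models used for preliminary word selection.
    preselection_models: Dict[str, List[float]] = field(default_factory=dict)
    # Parameters used for detailed matching.
    fenonic_params: FenonicModelParameters = field(
        default_factory=FenonicModelParameters
    )


def build_databases(speaker_ids: List[str]) -> Dict[str, ReferenceDatabase]:
    """Create one empty reference database per reference speaker."""
    return {sid: ReferenceDatabase(speaker_id=sid) for sid in speaker_ids}
```

      Keeping one record per speaker mirrors the statement that a
reference database is constructed for each reference speaker; training
would then fill in each field from that speaker's utterances.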

      Constructing a Database for Reference Speaker Selection - From
all the data uttered by all reference speakers, a universal codebook
is created to realize speaker-independent word recognition.  This
codebook is used to encode each reference speaker's utterances as
label sequences.  Between 10 and 20 vocabulary words are coded and
used to select a reference speaker.  From these coded-word data, a
codebook label histogram is calculated for each wo...
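
      The encoding and histogram steps described above can be sketched
in Python as follows.  This is a minimal sketch under stated
assumptions: each utterance is taken to be a list of feature vectors
(frames), labels are assigned by nearest centroid under squared
Euclidean distance, and the histogram is normalized to relative
frequencies.  The function names are hypothetical, and the sketch does
not attempt to show how the histograms are subsequently used to select
a reference speaker.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def nearest_label(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook centroid closest to the frame."""
    best_label = 0
    best_dist = float("inf")
    for label, centroid in enumerate(codebook):
        # Squared Euclidean distance (an assumed distance measure).
        dist = sum((x - c) ** 2 for x, c in zip(frame, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def encode_utterance(frames: Sequence[Vector],
                     codebook: Sequence[Vector]) -> List[int]:
    """Encode an utterance as a label sequence using the shared codebook."""
    return [nearest_label(frame, codebook) for frame in frames]


def label_histogram(labels: Sequence[int], codebook_size: int) -> List[float]:
    """Relative frequency of each codebook label in a label sequence."""
    total = len(labels)
    if total == 0:
        return [0.0] * codebook_size
    counts = Counter(labels)
    return [counts.get(k, 0) / total for k in range(codebook_size)]


# Toy usage with a three-entry, two-dimensional codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
frames = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.5, 4.2)]
labels = encode_utterance(frames, codebook)
histogram = label_histogram(labels, len(codebook))
```

      Because every reference speaker's utterances are encoded with the
same universal codebook, the resulting histograms share one label set
and can be compared directly across speakers.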