
Vector Quantization Procedure for Speech Recognition Systems Using Discrete Parameter Phoneme Based Markov Word Models

IP.com Disclosure Number: IPCOM000037340D
Original Publication Date: 1989-Dec-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 3 page(s) / 20K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+4]

Abstract

Some speech recognition algorithms use Markov models based on phonetics. Others use Markov models based on standard labels (called fenemes) that an acoustic processor can generate. Depending on the model used, the model outputs are either fenemes, which correspond to standard prototype vectors generated by the acoustic processor, or labels obtained from spliced vectors derived from the standard vectors.

This is the abbreviated version, containing approximately 55% of the total text.


The steps involved in generating spliced vectors, labelling the spliced vectors, and applying the new labels to data are set forth below. In performing the steps, it is assumed that training data for a number of speakers has been collected and aligned (by Viterbi alignment) against existing word baseforms. Preferably, each existing baseform is a sequence of Markov models based on fenemes.

Step 1. For each standard vector in the training data, create a new "spliced" parameter vector by concatenating the original standard vector with the K preceding vectors and the K succeeding vectors.
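
As an illustration of Step 1, the following is a minimal sketch of the splicing operation. The function name splice_vectors, the time-ordered concatenation (preceding frames, current frame, succeeding frames), and the edge handling (repeating the first or last frame when fewer than K neighbours exist) are assumptions made for the example; the disclosure does not specify them.

```python
from typing import List, Sequence

Vector = List[float]


def splice_vectors(frames: Sequence[Sequence[float]], k: int) -> List[Vector]:
    """Return one spliced vector per frame.

    Each spliced vector concatenates the K preceding frames, the frame
    itself, and the K succeeding frames, in time order.

    Assumption (not from the source): near the start or end of the data,
    missing neighbours are replaced by the first or last frame.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(frames)
    spliced: List[Vector] = []
    for t in range(n):
        vec: Vector = []
        for offset in range(-k, k + 1):
            # Clamp the index so edge frames reuse the nearest real frame.
            idx = min(max(t + offset, 0), n - 1)
            vec.extend(frames[idx])
        spliced.append(vec)
    return spliced


# Three hypothetical 2-dimensional standard vectors, spliced with K = 1.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spliced = splice_vectors(frames, k=1)
```

With D-dimensional standard vectors, each spliced vector has (2K + 1) * D elements, which is why the later steps reduce the dimensionality again.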

Step 2. Using the Viterbi alignment, tag the spliced vector for each time frame with the name of the phone aligned with that frame. The spliced vectors are thus allocated to classes, one for each phone in the phone alphabet.
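
The tagging in Step 2 can be sketched as a simple grouping operation. The example assumes the Viterbi alignment is already available as a list holding one phone name per time frame; that representation, the function name group_by_phone, and the phone names "AA" and "T" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_phone(
    spliced: Sequence[Sequence[float]],
    alignment: Sequence[str],
) -> Dict[str, List[List[float]]]:
    """Allocate each spliced vector to the class of its aligned phone.

    `alignment[t]` is assumed to be the name of the phone aligned with
    time frame t (a hypothetical representation of the alignment output).
    """
    if len(spliced) != len(alignment):
        raise ValueError("need exactly one phone tag per spliced vector")
    classes: Dict[str, List[List[float]]] = defaultdict(list)
    for vec, phone in zip(spliced, alignment):
        classes[phone].append(list(vec))
    return dict(classes)


# Hypothetical data: three spliced vectors and their per-frame phone tags.
spliced_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
phone_alignment = ["AA", "AA", "T"]
classes = group_by_phone(spliced_vectors, phone_alignment)
```

The resulting dictionary has one entry per phone that occurs in the alignment, and the length of each entry is the number of vectors in that class.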

Step 3. Compute the P best mutually uncorrelated eigenvectors for discriminating between the classes formed in Step 2 using the spliced parameter vectors. That is, consider N spliced vectors drawn from M classes. Let $x_{ik}$ denote the ith element of the kth vector. Then the sample covariance matrix S of the data is defined as

$$S_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)$$

where $\bar{x}_i$ denotes the sample mean of the ith element. Let $n_i$ denote the number of vectors in the ith class. Then the s...