Fast Labeling Algorithm for Speech Recognition Systems

IP.com Disclosure Number: IPCOM000037055D
Original Publication Date: 1989-Nov-01
Included in the Prior Art Database: 2005-Jan-29
Document File: 3 page(s) / 22K

Publishing Venue

IBM

Related People

Nahamoo, D: AUTHOR

Abstract

A technique is described whereby an algorithm reduces computational labeling time of discrete parameter Markov model speech recognition systems by a factor of four. The reduction is achieved by collecting relevant information during the labeling of training data.

This is the abbreviated version, containing approximately 54% of the total text.

In speech recognition systems, prototypes are typically obtained from training data at the time the speaker is enrolled as a user of the recognition system. A prototype generally consists of a fully specified set of parameters that allows the distance between a spectral vector and the prototype to be evaluated for a desired distance function. Distance measures generally involve evaluating full quadratic forms, such as full-covariance Gaussian distances, which are computationally very expensive.

The labeling algorithm calculates the distance from each prototype to a given frame in order to find the closest prototype. Because of the dynamics of the human articulators, the acoustic space, specifically the part represented by the prototypes, cannot be spanned in a short time interval. Furthermore, linguistic and phonetic constraints mean that not all sounds can follow one another.

A consequence of these natural constraints is that, for each frame, the search of the prototype space can be limited. In general, the longer the past history considered, the better the limiting criterion can be set. However, because of the complexity of the decision-making process, the concept described here considers only the label of the preceding frame.
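The idea of restricting the search using only the previous frame's label can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's algorithm, whose details are not reproduced in this abbreviated text: it assumes the information collected during training is, for each label, the set of labels observed immediately after it, and that a full search is used whenever no such set is available. The names `collect_successors`, `label_frames`, and `sq_euclidean` are hypothetical.

```python
from collections import defaultdict


def collect_successors(training_labels):
    """Map each label to the set of labels seen right after it in training.

    Assumption: this successor table stands in for the "relevant
    information" collected while labeling the training data.
    """
    successors = defaultdict(set)
    for prev, cur in zip(training_labels, training_labels[1:]):
        successors[prev].add(cur)
    return successors


def sq_euclidean(x, p):
    """Sum of squared coordinate differences between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def label_frames(frames, prototypes, successors):
    """Label each frame, searching only prototypes allowed after the last label."""
    labels = []
    prev = None
    for x in frames:
        candidates = successors.get(prev) if prev is not None else None
        if not candidates:
            # First frame, or a previous label with no recorded successor:
            # fall back to searching every prototype.
            candidates = range(len(prototypes))
        best = min(candidates, key=lambda j: sq_euclidean(x, prototypes[j]))
        labels.append(best)
        prev = best
    return labels


prototypes = [[0.0], [1.0], [2.0], [3.0]]
successors = collect_successors([0, 0, 1, 1, 2, 2, 3])
print(label_frames([[0.1], [0.9], [2.9]], prototypes, successors))  # [0, 1, 2]
```

In this toy run the last frame receives label 2 even though prototype 3 is nearer, because label 3 never followed label 1 in the training sequence. The restricted search trades a possible labeling difference for fewer distance evaluations per frame.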

As background, let $x_t$, $t = 1, 2, \ldots$, be the sequence of m-dimensional spectral vectors to be labeled by the acoustic processor [1]. The acoustic processor assigns an integer label $i_t = j$ to the vector $x_t$ if the $j$th prototype among the $k$ available prototypes $p_1, \ldots, p_k$ is "closest" to the spectral vector $x_t$. Corresponding to $x_1, \ldots, x_t, \ldots$, it produces a sequence of labels $i_1, \ldots, i_t, \ldots$. Since a prototype is a fully specified set of parameters that allows the evaluation of a distance between the spectral vector and the prototype for a desired distance function, "closest" refers to the prototype with the smallest distance to the spectral vector.
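The exhaustive labeling rule just described amounts to a nearest-prototype search over all k prototypes for every frame. A minimal sketch, assuming vectors are plain Python lists and the distance function is passed in by the caller (the name `label_sequence` is hypothetical):

```python
def label_sequence(vectors, prototypes, distance):
    """Assign to each vector the index of its closest prototype.

    Indices are 0-based here; the text numbers prototypes from 1.
    Every prototype is evaluated for every vector (full search).
    """
    labels = []
    for x in vectors:
        dists = [distance(x, p) for p in prototypes]
        labels.append(dists.index(min(dists)))
    return labels


def sq_diff(x, p):
    """Sum of squared coordinate differences, used as an example distance."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


print(label_sequence([[0.2], [1.7], [0.9]], [[0.0], [1.0], [2.0]], sq_diff))  # [0, 2, 1]
```

The cost per frame is k distance evaluations, which is the quantity a restricted search aims to reduce.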

For a simple Euclidean distance, the prototypes are represented by m-dimensional vectors, called centroids, $\mu_j$, and the distance measure is:

$$d(x_t, \mu_j) = \sum_{i=1}^{m} \left(x_t(i) - \mu_j(i)\right)^2 .$$

For full quadratic distances [2,3], in addition to the m-dimensional centroid vector, the prototype has an m-by-m matrix $S$ and a normalizing shift value, $a$. The distance in this case is given by

$$d(x_t, \mu_j) = a + (x_t - \mu_j)^T S (x_t - \mu_j).$$
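Both distance measures can be written out directly, which also makes the difference in cost visible: the Euclidean form needs on the order of m multiplications per prototype, while the quadratic form needs on the order of m squared. The sketch below assumes vectors are Python lists and the matrix is a list of rows; the function names and the parameter name `mu` for the centroid are illustrative.

```python
def euclidean_distance(x, mu):
    """Sum of squared differences between vector x and centroid mu.

    As in the formula above, no square root is taken.
    """
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def quadratic_distance(x, mu, S, a):
    """Full quadratic distance: a + (x - mu)^T S (x - mu).

    S is an m-by-m matrix given as a list of rows; a is the shift value.
    """
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Matrix-vector product S * diff, one row at a time.
    s_diff = [sum(s * d for s, d in zip(row, diff)) for row in S]
    return a + sum(d * v for d, v in zip(diff, s_diff))


x, mu = [1.0, 2.0], [0.0, 0.0]
print(euclidean_distance(x, mu))                                  # 5.0
print(quadratic_distance(x, mu, [[2.0, 1.0], [1.0, 3.0]], 0.5))  # 18.5
```

With the identity matrix for `S` and `a = 0`, the quadratic form reduces to the Euclidean one.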

Since the prototypes, described above, are obtained from training data at the time that the speak...