Method of Determining Reference Spectra Suitable for Labeling Speech in Automatic Speech Recognition Systems

IP.com Disclosure Number: IPCOM000044411D
Original Publication Date: 1984-Dec-01
Included in the Prior Art Database: 2005-Feb-05
Document File: 2 page(s) / 14K

Publishing Venue

IBM

Related People

deSouza, PV: AUTHOR [+2]

Abstract

Starting from a computed average of the spectra aligned against each phone in the training data, rather than from randomly selected reference spectra, and adjusting these reference spectra by iterative use of the clustering algorithm, increases overall system effectiveness in a continuous speech recognition system. For each spectrum in the training data the closest reference spectrum is found, and the most common correct/current classification error (expressed as a proportion of the correct phone's occurrence) is calculated. In the adjustment step, each training spectrum is assigned to its nearest reference spectrum; each reference spectrum is then replaced by the average of all the training spectra for which it was the closest.

This is the abbreviated version, containing approximately 51% of the total text.

In speech recognition systems using Markov word models, it is customary to compute the speech spectrum at regular intervals, typically every 10 ms, and to replace each spectral vector by a scalar label identifying which one of a set of reference spectra is most similar to the observed spectrum. The resulting sequence of labels, rather than the sequence of spectra, is the input to the speech recognition system.

Because the speech recognizer sees only the label sequence, and not the original speech, it is important that the reference spectra be chosen wisely; poorly chosen spectra can cause a serious loss of information (a loss of clarity), which can lead to poor recognition accuracy. This article describes a method for selecting the reference spectra so as to retain sufficient information to permit reliable recognition. It is assumed that the number of reference spectra to be determined is specified in advance, and that it is greater than or equal to the total number of Markov phone models used in the recognition system.

One method for determining reference spectra selects them at random and uses an iterative K-means clustering algorithm to adjust them.
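
The labeling step and a single K-means adjustment pass described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the squared Euclidean distance, the function names (`squared_distance`, `label`, `adjust_references`), and the choice to leave an unused reference spectrum unchanged are all assumptions, since the text does not specify a similarity measure or how empty clusters are handled.

```python
from typing import List, Sequence

Spectrum = Sequence[float]


def squared_distance(a: Spectrum, b: Spectrum) -> float:
    # Assumption: similarity is measured by squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label(spectrum: Spectrum, references: Sequence[Spectrum]) -> int:
    """Return the index (the scalar label) of the closest reference spectrum."""
    return min(
        range(len(references)),
        key=lambda k: squared_distance(spectrum, references[k]),
    )


def adjust_references(
    spectra: Sequence[Spectrum], references: Sequence[Spectrum]
) -> List[List[float]]:
    """One adjustment pass: replace each reference spectrum by the average
    of the training spectra for which it was the closest reference."""
    groups: List[List[Spectrum]] = [[] for _ in references]
    for s in spectra:
        groups[label(s, references)].append(s)

    adjusted = []
    for ref, members in zip(references, groups):
        if members:
            n = len(members)
            adjusted.append(
                [sum(m[d] for m in members) / n for d in range(len(ref))]
            )
        else:
            # Assumption: a reference that attracted no spectra is kept as-is.
            adjusted.append(list(ref))
    return adjusted


if __name__ == "__main__":
    # Toy two-dimensional "spectra", purely illustrative data.
    training = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    refs = [[1.0, 1.0], [9.0, 9.0]]
    print([label(s, refs) for s in training])
    print(adjust_references(training, refs))
```

Repeating `adjust_references` until the reference spectra stop changing gives the iterative adjustment; the outcome depends on the starting references, which is the sensitivity the text goes on to discuss.
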
The clustering method has the disadvantage that it can give poor results if the initial random selection is poor. The method of this article does not use clustering, nor does it require an initial random selection of reference spectra, and it is consequently more reliable. Given training speech on which a Viterbi alignment has been performed, each of the training spectra can be classified as belonging to a particular phone....
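
Once each training spectrum has been assigned to a phone, the per-phone average spectra mentioned in the abstract can be computed directly. The sketch below assumes the alignment is available as a list of phone names parallel to the list of spectra; that input format, the phone names, and the function name `phone_averages` are hypothetical, and the remaining steps of the method are not shown because the text is truncated here.

```python
from typing import Dict, List, Sequence


def phone_averages(
    spectra: Sequence[Sequence[float]], phones: Sequence[str]
) -> Dict[str, List[float]]:
    """Average the training spectra aligned against each phone.

    `phones[i]` is the phone that the alignment assigned to `spectra[i]`
    (a hypothetical input format chosen for this illustration).
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for spectrum, phone in zip(spectra, phones):
        if phone not in sums:
            sums[phone] = [0.0] * len(spectrum)
            counts[phone] = 0
        for d, value in enumerate(spectrum):
            sums[phone][d] += value
        counts[phone] += 1
    # One average spectrum per phone, usable as an initial reference spectrum.
    return {p: [t / counts[p] for t in sums[p]] for p in sums}


if __name__ == "__main__":
    # Toy data: three frames, two made-up phone names.
    frames = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
    aligned = ["AA", "AA", "S"]
    print(phone_averages(frames, aligned))
```

Starting from these averages yields exactly one reference spectrum per phone, which is consistent with the stated assumption that the number of reference spectra is at least the number of phone models.
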