Continuous Formant Tracker

IP.com Disclosure Number: IPCOM000085583D
Original Publication Date: 1976-Apr-01
Included in the Prior Art Database: 2005-Mar-02
Document File: 3 page(s) / 57K

Publishing Venue

IBM

Related People

Baker, JK: AUTHOR [+2]

Abstract

A procedure is provided that relates to spectral decomposition of a complex function by linear predictive coding. Analysis of complex waveforms often involves a form of deconvolution, to separate the transfer function of the system producing the waveform from the excitation function. This operation is especially important in the analysis of speech waveforms, where the transfer function is determined by the shape of the vocal tract, while the excitation function is determined by such things as the fundamental frequency of the glottis.

This is the abbreviated version, containing approximately 52% of the total text.

The phonetic identity of a given segment of speech is determined almost entirely by the vocal tract; the excitation function matters mainly for speaker characteristics and certain prosodic features. For speech recognition, therefore, it is very important to know the vocal tract transfer function.

Linear predictive coding is a method for estimating the vocal tract transfer function from the speech waveform that has been gaining popularity in recent years. There are two basic versions of linear predictive coding, sometimes called the "stationary" method and the "nonstationary" method, respectively. The stationary method calculates a linear predictive filter from the autocorrelation function of the speech waveform. For the sums in the autocorrelation function to have a finite number of terms, the waveform must be multiplied by a "window" function that is zero outside a finite interval, an operation that can be justified only if the speech waveform is modeled as a stationary stochastic process.
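The stationary (autocorrelation) method described above can be sketched as follows. This is a minimal illustration, not the disclosure's own procedure: the function name `lpc_autocorrelation`, the choice of a Hamming window, and the direct solution of the normal equations with NumPy are all assumptions made for the example.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate predictor coefficients a[0..order-1] such that
    s[n] ~ sum_k a[k-1] * s[n-k], k = 1..order, from the windowed frame."""
    # Window choice (Hamming) is an assumption; any finite-support window works.
    windowed = frame * np.hamming(len(frame))
    n = len(windowed)
    # Autocorrelation lags 0..order; the sums are finite because the
    # windowed signal is zero outside the frame.
    r = np.array([np.dot(windowed[:n - k], windowed[k:])
                  for k in range(order + 1)])
    # Normal equations R a = r[1:], with R the Toeplitz matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[lags], r[1:])
```

A production implementation would more likely exploit the Toeplitz structure (for example with the Levinson-Durbin recursion) instead of a general linear solve.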

Multiplying the waveform by a window function is equivalent to convolving the spectrum of the original waveform with the spectrum of the window function, which limits the frequency resolution of the stationary method of linear prediction. This frequency limitation, in turn, sets the minimum usable width of the time window at about two pitch periods. That minimum width limits the time resolution, makes it difficult to track very fast changes in the vocal tract shape, and makes it impossible to analyze events that are tied to specific subsets of the pitch cycle.
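The equivalence between windowing in time and convolution in frequency can be checked numerically for the discrete Fourier transform. The sketch below is illustrative only; the helper names are hypothetical, and the main-lobe figure is the usual rule of thumb for a Hamming window (about 4 * fs / N Hz between nulls), which is an assumption about the window and not something stated in the text.

```python
import numpy as np

def circular_convolve(X, W):
    """(1/N) * circular convolution of two equal-length DFT spectra."""
    N = len(X)
    k = np.arange(N)
    # out[m] = (1/N) * sum_k X[k] * W[(m - k) mod N]
    return np.array([np.sum(X * W[(m - k) % N]) for m in range(N)]) / N

def windowed_spectrum_two_ways(x, w):
    """Spectrum of x*w computed directly and via convolution of the spectra."""
    direct = np.fft.fft(x * w)
    via_convolution = circular_convolve(np.fft.fft(x), np.fft.fft(w))
    return direct, via_convolution

def hamming_mainlobe_hz(n_samples, fs):
    """Rule-of-thumb null-to-null main-lobe width of a Hamming window, in Hz."""
    return 4.0 * fs / n_samples
```

Under these assumptions, a window of two pitch periods at a 100 Hz pitch and an 8 kHz sampling rate (160 samples) gives `hamming_mainlobe_hz(160, 8000)` of 200 Hz, which illustrates why shortening the window further smears the spectrum.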

The nonstationary method of linear prediction estimates the transfer function from the so-called covariance function of the waveform, for which it is not necessary to multiply the waveform by a window function. However, each estimate of the linear predictive filter requires computing and inverting the covariance matrix. In addition, to find the frequencies of the vocal tract resonances (or formants), it is necessary (with either the stationary or the nonstationary method) either to compute the discrete Fourier transform of the impulse response of the linear predictive filter or to find the roots of the polynomial which is its z-transform. The amount of computation involved makes it impr...
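The two computational steps named in this paragraph, solving the covariance equations and finding the roots of the predictor polynomial, can be sketched as follows. This is a hedged illustration of the conventional covariance method, not the procedure this disclosure goes on to propose; the function names, the sign convention A(z) = 1 - sum_k a_k z^-k, and the use of `numpy.roots` are assumptions for the example.

```python
import numpy as np

def lpc_covariance(s, order):
    """Covariance-method LPC: least-squares predictor over s[order:], no window."""
    N = len(s)
    # Row i holds s[n - i] for n = order..N-1, so
    # phi[i, k] = sum_n s[n - i] * s[n - k] is the covariance matrix.
    lagged = np.stack([s[order - i:N - i] for i in range(order + 1)])
    phi = lagged @ lagged.T
    # Solve phi[1:, 1:] a = phi[1:, 0] for the predictor coefficients.
    return np.linalg.solve(phi[1:, 1:], phi[1:, 0])

def formants_from_lpc(a, fs):
    """Resonance frequencies in Hz from the roots of A(z) = 1 - sum_k a[k-1] z^-k."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

Both the linear solve and the root finding are repeated for every analysis frame, which is the computational cost the paragraph refers to.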