On Improving The Reliability Of Cepstral Pitch Estimation

IP.com Disclosure Number: IPCOM000148056D
Original Publication Date: 1979-Feb-28
Included in the Prior Art Database: 2007-Mar-28
Document File: 28 page(s) / 1M

Publishing Venue

Software Patent Institute

Related People

Yegnanarayana, B.: AUTHOR [+3]

Abstract

B. Yegnanarayana and T. V. Ananthapadmanabha

This is the abbreviated version, containing approximately 11% of the total text.

On Improving The Reliability Of Cepstral Pitch Estimation

B. Yegnanarayana and T. V. Ananthapadmanabha

Department of Computer Science
Carnegie-Mellon University

Pittsburgh, PA 15213

February 1979

This research was sponsored by the Defence Advanced Research Projects Agency (DOD), ARPA Order No. 3597, and monitored by Air Force Laboratory under Contract F33615-78-C-1551.

The views and the conclusions contained in this document are those of the authors and should not be interpreted as the official policies, either expressed or implied, of the Defence Advanced Research Projects Agency or the U.S. Government.

ABSTRACT

Identification of relatively high SNR regions in the short-time spectrum of a speech segment is very useful in speech processing applications. Such regions usually occur around the peaks in the spectral envelope. In this paper we propose a method for determining such regions automatically for a given speech segment. The method is based on a recently developed technique for pole-zero decomposition of speech spectra. It is shown that by selectively processing the high SNR regions of the spectrum, an unambiguous pitch peak in the high quefrency portion of the cepstrum can be obtained. The processing involves computation of the Hilbert envelope of the selectively filtered cepstrum. Several examples of speech segments are considered to illustrate the improvement provided by the proposed method.
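For orientation, the conventional cepstral pitch estimate that the proposed method aims to make more reliable can be sketched as follows. This is a minimal illustration of the textbook technique only, not the authors' selective processing or Hilbert-envelope step. The function names, the 60-400 Hz search range, the Hamming window, and the toy single-resonance test signal are all assumptions introduced here.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) of one voiced frame from the peak of its real cepstrum.

    fmin and fmax bound the search and are illustrative assumptions.
    """
    windowed = frame * np.hamming(len(frame))
    # Log magnitude of the short-time spectrum (small offset avoids log(0)).
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag)
    # Search only the high-quefrency range that corresponds to plausible pitch periods.
    qmin = int(fs / fmax)
    qmax = int(fs / fmin)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax + 1]))
    return fs / peak

def synthetic_voiced_frame(f0, fs, duration=0.04):
    """Toy voiced frame (hypothetical test signal): impulse train through one damped resonance."""
    n = int(fs * duration)
    excitation = np.zeros(n)
    excitation[::int(round(fs / f0))] = 1.0
    t = np.arange(64) / fs
    impulse_response = np.exp(-600.0 * t) * np.cos(2.0 * np.pi * 700.0 * t)
    return np.convolve(excitation, impulse_response)[:n]

if __name__ == "__main__":
    frame = synthetic_voiced_frame(100.0, 8000)
    print(cepstral_pitch(frame, 8000))
```

On clean synthetic input the peak is easy to pick. The ambiguity the abstract refers to arises with real, noisy speech, where the high-quefrency peak can be weak or can compete with spurious peaks.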

Speech is the output of a time-varying vocal tract system excited by a time-varying excitation. Because the speech signal is nonstationary, speech analysis is usually performed on short segments (10-40 msec) of speech. The signal-to-noise ratio (SNR) of the speech signal differs from segment to segment. Further, for a given segment, the SNR is a function of frequency in the short-time spectrum. For additive white noise, it is reasonable to assume that the SNR is relatively higher over the regions corresponding to peaks in the envelope of the short-time spectrum. Identification of such high SNR regions in the signal spectrum would be very useful in accurate analysis of speech, especially in obtaining a reliable estimate of voice pitch. So far there has been no convenient method available for automatically identifying such regions in the short-time spectrum. Recently it was shown that the derivative of phase spectrum (DPS) of the minimum phase correspondent of a given signal directly provides the information corresponding to peaks and valleys of the spectral envelope [1]. The objective of this paper is to show the use of such information in i...
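As a rough illustration of how a derivative of the phase spectrum of a minimum phase correspondent can expose spectral envelope peaks, the sketch below uses the standard homomorphic route: for a minimum phase signal the phase follows from the real cepstrum c(n) of the log magnitude spectrum, and the negative phase derivative equals the sum over n >= 1 of 2 n c(n) cos(n w). Whether the authors compute the DPS this way is not stated in this excerpt. The function name, the FFT size, the sign convention (negative derivative, so that envelope peaks appear as positive peaks), and the two-pole test resonator are assumptions made here.

```python
import numpy as np

def dps_min_phase(frame, nfft=1024):
    """Negative phase derivative of the minimum phase correspondent of `frame`.

    Returns values at the nfft//2 + 1 frequencies from 0 to pi.
    A sketch under stated assumptions, not the authors' implementation.
    """
    # Only the magnitude spectrum of the frame is used; its phase is discarded.
    log_mag = np.log(np.abs(np.fft.fft(frame, nfft)) + 1e-12)
    # Real cepstrum of the log magnitude (even sequence in n).
    c = np.fft.ifft(log_mag).real
    # Minimum phase complex cepstrum is 2*c(n) for n >= 1; weight it by n.
    n = np.arange(nfft // 2)
    weighted = np.zeros(nfft)
    weighted[1:nfft // 2] = 2.0 * n[1:] * c[1:nfft // 2]
    # The real part of the DFT gives sum_n n * c_hat(n) * cos(n * w_k).
    return np.fft.fft(weighted).real[:nfft // 2 + 1]

if __name__ == "__main__":
    # Hypothetical test signal: impulse response of a two-pole resonator at 1 kHz, fs = 8 kHz.
    fs, r, f_res = 8000.0, 0.95, 1000.0
    w0 = 2.0 * np.pi * f_res / fs
    k = np.arange(256)
    h = r ** k * np.sin((k + 1) * w0) / np.sin(w0)
    dps = dps_min_phase(h)
    print(np.argmax(dps) * fs / 1024)  # frequency (Hz) of the largest DPS value
```

For this single resonance the largest value falls near the resonance frequency, in line with the statement that the DPS carries the information about envelope peaks.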