
Fundamental Frequency Detector for Speech Using a Multirate Preprocessor

IP.com Disclosure Number: IPCOM000085369D
Original Publication Date: 1976-Mar-01
Included in the Prior Art Database: 2005-Mar-02
Document File: 3 page(s) / 45K

Publishing Venue

IBM

Related People

Dixon, NR: AUTHOR [+2]

Abstract

A fundamental frequency (pitch) detector is provided which uses sequential processing to estimate the fundamental frequency of a speech waveform.

This is the abbreviated version, containing approximately 53% of the total text.



A major problem in achieving automatic fundamental frequency (F(o)) detection is determining the onset of each fundamental event. Most F(o) detectors determine both the occurrence and the onset of events simultaneously, making it impractical, if not impossible, to diagnose errors by subsequent processing of the speech signals.

The system shown and described herein uses a preprocessor (see Fig. 1) that produces signals corresponding to the occurrence of an event; a following stage or stages (see Fig. 2) then determine the onset of the event. As seen in Fig. 1, the input speech data is digital, in pulse-code-modulated (PCM) form, and is chosen to have a sample rate of 20 kHz.

Three finite impulse response (FIR) filters having zero phase shift are connected in parallel to the input speech signal. The high-pass filter cuts off at about 3500 Hz and is used to indicate the frication energy of the input signal. The "MID" filter covers the range from 1 kHz to 3 kHz and serves to isolate the energy in the second formant of the speech signal. The low-pass filter cuts off at about 800 Hz and isolates voicing and the first formant energy. As many as 90 to 175 filter elements would be used in the preprocessor of Fig. 1.
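The parallel filter group described above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not say how the filters were designed, so the Hamming-windowed-sinc design, the 101-tap length (chosen from within the 90 to 175 range mentioned), and all function names here are hypothetical. Zero phase shift is approximated by using symmetric taps and centering the convolution on each input sample.

```python
import math

FS = 20_000  # input sample rate in Hz, as stated in the text


def lowpass_taps(cutoff_hz, num_taps=101, fs=FS):
    """Hamming-windowed sinc low-pass (assumed design); symmetric taps."""
    assert num_taps % 2 == 1, "odd length keeps the filter centered"
    mid = num_taps // 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def highpass_taps(cutoff_hz, num_taps=101, fs=FS):
    """Spectral inversion: unit impulse minus the matching low-pass."""
    hp = [-t for t in lowpass_taps(cutoff_hz, num_taps, fs)]
    hp[num_taps // 2] += 1.0
    return hp


def bandpass_taps(lo_hz, hi_hz, num_taps=101, fs=FS):
    """Band-pass as the difference of two low-pass filters."""
    upper = lowpass_taps(hi_hz, num_taps, fs)
    lower = lowpass_taps(lo_hz, num_taps, fs)
    return [u - l for u, l in zip(upper, lower)]


def filter_zero_phase(x, taps):
    """Centered convolution: output i is aligned with input i (edges zero-padded)."""
    mid = len(taps) // 2
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(x):
                acc += t * x[idx]
        y.append(acc)
    return y


def filter_bank(x):
    """Apply the three parallel filters to the same input signal."""
    return {
        "low": filter_zero_phase(x, lowpass_taps(800)),          # voicing, first formant
        "mid": filter_zero_phase(x, bandpass_taps(1000, 3000)),  # second formant
        "high": filter_zero_phase(x, highpass_taps(3500)),       # frication
    }
```

Because the taps are symmetric and the convolution is centered, an impulse at sample n produces a response that peaks at sample n in every band, which keeps event timing comparable across the three outputs.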

The individual outputs of the first parallel group of filters are sent through their respective absolute value determinators, and an identical low-pass filter is applied to each of the absolute value outputs. The cutoff of this low-pass filter is set so that minimal error occurs in the next stage, which is the desampling stage.
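The rectify-and-smooth step can be sketched as below. The disclosure does not give the cutoff or design of the second low-pass filter, so the Hann-shaped smoothing kernel and its 81-tap length are stand-in assumptions, and the function names are hypothetical. The one property taken from the text is that the same filter is applied to every band.

```python
import math


def hann_lowpass(num_taps):
    """Symmetric Hann-shaped smoothing kernel with unity DC gain (assumed stand-in)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (num_taps + 1))
         for n in range(num_taps)]
    total = sum(w)
    return [v / total for v in w]


def envelope(band, taps):
    """Take the absolute value of each sample, then smooth with a centered FIR."""
    rect = [abs(v) for v in band]
    mid = len(taps) // 2
    out = []
    for i in range(len(rect)):
        acc = 0.0
        for j, t in enumerate(taps):
            k = i + j - mid
            if 0 <= k < len(rect):
                acc += t * rect[k]
        out.append(acc)
    return out


def envelopes(bands, num_taps=81):
    """Apply the same smoothing filter to every band, as the text describes."""
    taps = hann_lowpass(num_taps)
    return {name: envelope(sig, taps) for name, sig in bands.items()}
```

Here `bands` is assumed to be a mapping from a band name to that band's filtered samples. The smoothed output tracks the energy envelope of each band rather than its individual oscillations.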

Since most events in speech are 5 to 20 milliseconds long, a suitable description of the energy envelope of these events can be, and has been, obtained by desampling to a 2 millisecond/...
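The arithmetic behind the desampling stage can be worked through in a few lines. This sketch assumes the truncated passage refers to keeping one envelope sample every 2 milliseconds; the function names are hypothetical.

```python
def samples_per_step(fs_hz, step_ms):
    """Number of input samples spanned by one output sample."""
    return fs_hz * step_ms // 1000


def desample(envelope, factor):
    """Keep every factor-th sample of an already low-pass-filtered envelope."""
    return envelope[::factor]


factor = samples_per_step(20_000, 2)  # 40 input samples per output sample
out_rate_hz = 20_000 / factor         # 500.0 output samples per second
nyquist_hz = out_rate_hz / 2          # 250.0 Hz
```

Under this assumption, the low-pass filter applied before desampling would need to pass little energy above 250 Hz to avoid aliasing, which is one plausible reading of the "minimal error" requirement. An event lasting 5 to 20 milliseconds would then be described by roughly 2 to 10 envelope samples.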