Formant Power Ratio Speech Analyzer and Encoder

IP.com Disclosure Number: IPCOM000091634D
Original Publication Date: 1968-Apr-01
Included in the Prior Art Database: 2005-Mar-05
Document File: 3 page(s) / 89K

Publishing Venue

IBM

Related People

Clapper, GL: AUTHOR

Abstract

This system analyzes and encodes speech signals using formant power ratios. The analyzer, drawing 1, accepts a speech signal from microphone M and amplifies it in preamplifier SPA. The amplified signal is applied to an All Pass network and to four or more broadband frequency selectors B1...BN spaced logarithmically across the frequency spectrum. Each channel thus created includes rectifier R and squaring circuit A. The bandpass characteristics of channels B1...B4 are shown in graph A, a logarithmic plot of frequency and power ratio.

This is the abbreviated version, containing approximately 46% of the total text.

This system analyzes and encodes speech signals using formant power ratios. The analyzer, drawing 1, accepts a speech signal from microphone M and amplifies it in preamplifier SPA.

The amplified signal is applied to an All Pass network and to four or more broadband frequency selectors B1...BN spaced logarithmically across the frequency spectrum. Each channel thus created includes rectifier R and squaring circuit A. The bandpass characteristics of channels B1...B4 are shown in graph A, a logarithmic plot of frequency and power ratio.
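As a rough software analogy to the channel front end described above, the sketch below spaces band centers in geometric progression and estimates channel power by rectifying, averaging and squaring. The 200 Hz and 3200 Hz limits, the averaging step, and the function names are assumptions made for illustration; the disclosure gives no frequencies and describes analog circuits, not code.

```python
import math


def log_spaced_centers(f_low, f_high, n_bands):
    """Return n_bands center frequencies in geometric (logarithmic) progression."""
    if n_bands < 2:
        raise ValueError("need at least two bands")
    # A constant ratio between neighbors gives equal spacing on a log axis.
    ratio = (f_high / f_low) ** (1.0 / (n_bands - 1))
    return [f_low * ratio ** k for k in range(n_bands)]


def channel_power(samples):
    """Rectify, average, then square: a stand-in for rectifier R and squarer A.

    The averaging step is an assumed smoothing stage, not taken from the text.
    """
    rectified = [abs(s) for s in samples]
    mean_level = sum(rectified) / len(rectified)
    return mean_level ** 2


# Hypothetical band limits, chosen only to make the example concrete.
centers = log_spaced_centers(200.0, 3200.0, 4)
```

With four bands over a 16:1 range, each center is about 2.52 times the previous one.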

The All Pass channel includes a voltage reference divider network which provides output voltages indicative of 3/4, 1/2 and 1/4 of the total power in the received signal. In addition, it provides N threshold signals. These are adjusted to cut off the tails of the selector characteristics, graph A, at the peak of the neighboring selector channel characteristic. The thresholds are relative to total power: regardless of the absolute signal level, a channel develops no output when the ratio of its power to total power is below the threshold.
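The relative thresholding can be illustrated with a short sketch. It assumes channel and total power are available as numbers; the function name and the choice to return the ratio itself are hypothetical, since the original circuit gates an analog output.

```python
def gated_power_ratio(channel_power, total_power, threshold_ratio):
    """Return channel_power / total_power, or 0.0 when below the threshold.

    The threshold is a fraction of total power, so the result does not
    depend on the absolute signal level.
    """
    if total_power <= 0.0:
        return 0.0  # no signal at all: nothing to report
    ratio = channel_power / total_power
    return ratio if ratio >= threshold_ratio else 0.0


# The same spectrum shape at two loudness levels gives the same result.
quiet = gated_power_ratio(5.0, 10.0, 0.25)
loud = gated_power_ratio(50.0, 100.0, 0.25)
```

Scaling both powers by the same factor leaves the outcome unchanged, which is the level independence the text describes.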

Each channel includes four comparators for comparing the channel output with 3/4, 1/2 and 1/4 of the total power and with the threshold value. The comparator outputs are applied to a plurality of And-Inverters AI to logically determine the location of formants in the energy spectrum. The outputs of the AIs can be utilized in any conventional manner. For example, they can be periodically sampled and inserted in a conventional storage matrix for processing.
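A minimal sketch of the four comparators in one channel follows. It models only the comparisons; the And-Inverter logic and the resulting output codes are not reproduced because the table that defines them is not in the available text. The function name, the output ordering, and the use of greater-or-equal are assumptions.

```python
# Fractions of total power supplied by the reference divider.
REFERENCE_FRACTIONS = (0.25, 0.5, 0.75)


def comparator_outputs(channel_power, total_power, threshold_ratio):
    """Return four booleans: (threshold, 1/4, 1/2, 3/4) comparator states.

    Each comparator tests the channel power against a fraction of total
    power, mirroring a comparison against a divider voltage.
    """
    references = (threshold_ratio,) + REFERENCE_FRACTIONS
    return tuple(channel_power >= frac * total_power for frac in references)


# A channel holding 60% of total power trips all but the 3/4 comparator.
states = comparator_outputs(6.0, 10.0, 0.1)
```

Here `states` is `(True, True, True, False)`.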

The table below graph A gives the output codes generated for bands 1...4 for a single formant passing through the spectrum. Concurrent formants produce different codes. These are interpreted by the processor to determine the nature of the input. Graph B shows the number of narrow-band filters required and their characteristics for deriving the same information. The filters would have to be high-Q, would be slow to respond to frequency changes, and would be subject to ringing.

A four-band system with four power ratio levels per band provides sixteen bits per unit time. If this output is examined over twenty time slots, 320 bits of storage are required to store the data. This storage requirement can be reduced to about 32 bits by the compressed encoding method shown in drawing B.
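The storage figures can be checked with a few lines of arithmetic. The variable names are illustrative, and the compressed size is taken as exactly 32 bits although the text says "about 32".

```python
bands = 4
levels_per_band = 4
time_slots = 20

bits_per_slot = bands * levels_per_band          # one bit per level per band
uncompressed_bits = bits_per_slot * time_slots   # storage for all time slots
compressed_bits = 32                             # "about 32" in the text
reduction_factor = uncompressed_bits / compressed_bits
```

This gives 16 bits per time slot, 320 bits uncompressed, and roughly a tenfold reduction.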

Levels L1...L4 from each channel are applied to a power ratio maxima detector PMRD which provides an output for each band whenever a maximum for the band is detected. That is, an indication in the form of a pulse is provided when the power ratio in the band reaches a maximum. Transient detectors TD are each connected to a sample gate and to change pulse generator CPG which controls ring drive generator RG and sample pulse generator SG. RG responds to the beginning of...