Word Recognition System

IP.com Disclosure Number: IPCOM000092263D
Original Publication Date: 1968-Nov-01
Included in the Prior Art Database: 2005-Mar-05
Document File: 3 page(s) / 70K

Publishing Venue

IBM

Related People

Clapper, GL: AUTHOR

Abstract

Some spoken word recognition systems are based on an analysis of the speech in terms of frequency, time, and intensity. In many systems, the amplitude or intensity is used directly or with some normalization over the frequency spectrum. In other instances, a squaring function or logarithmic function is introduced. Others use a threshold and sampling method. Some sample the intensity and encode it. Others compare relative amplitudes; this comparison is used to locate local maxima for formant tracking. One system uses formant transitions as a speech measure. Many systems produce good to excellent results for single speakers. However, recognition scores invariably drop off when attempts are made to recognize words spoken by several different speakers.

This is the abbreviated version, containing approximately 50% of the total text.

This word recognition system uses analog differentiators that respond to the rate of change of amplitude with frequency and time rather than to its absolute value. The function measured is the partial derivative of amplitude with respect to time, with frequency held constant. That condition is met by differentiating the rectified output of a relatively narrow-band, fixed-frequency filter.
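The differentiation step can be illustrated in discrete time. The following is a minimal sketch, not the analog circuit of the disclosure: it assumes the signal has already passed through one narrow-band filter, and the names rectified_envelope, time_derivative, and make_tone, the smoothing constant, and the sample rate are all hypothetical choices made for illustration.

```python
import math


def rectified_envelope(band_signal, smoothing=0.05):
    """Full-wave rectify a band-filtered signal, then smooth it with a
    one-pole low-pass (assumed stand-in for the rectifier's DC output)."""
    envelope = []
    level = 0.0
    for sample in band_signal:
        level += smoothing * (abs(sample) - level)
        envelope.append(level)
    return envelope


def time_derivative(envelope, sample_rate):
    """First difference of the envelope, scaled to units per second.
    The first value is 0.0 because no earlier sample exists."""
    derivative = [0.0]
    for previous, current in zip(envelope, envelope[1:]):
        derivative.append((current - previous) * sample_rate)
    return derivative


def make_tone(frequency_hz, amplitudes, sample_rate):
    """Sine tone with a per-sample amplitude, standing in for the output
    of a single fixed-frequency filter (test signal only)."""
    return [
        a * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n, a in enumerate(amplitudes)
    ]
```

Averaged over whole periods of the tone, the derivative of a steady band output stays near zero, while a band whose amplitude is rising yields a clearly positive value. This is the sense in which the measure tracks change rather than absolute level.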

A high-gain preamplifier receives the speech signals and feeds the complex signal to nine selectors. Narrow-band selectors F1...F7 cover the formant range of the speech spectrum from 280 Hz to 4000 Hz. Broad-band selector F8 covers the voice energy range from approximately 80 Hz to 280 Hz. A broad-band fricative selector covering the high-frequency noise range from 4000 Hz to 10,000 Hz detects the start of fricative sounds in such words as "four" and "five". The selector outputs are rectified in rectifier units R1...R9, each producing a DC voltage proportional to its selector output. Attenuators at the selector inputs are adjusted to normalize the amplitude of the rectified outputs. These outputs go to analog differentiators AD1...AD8. The rectifier outputs are also applied to an OR circuit, which provides a start signal for the matrix ring driving circuit.
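The selector layout and the start signal can be summarized in a short sketch. The overall ranges (80 to 280 Hz, 280 to 4000 Hz, 4000 to 10,000 Hz) come from the text. The individual passbands of F1...F7 are not stated, so the geometric spacing below is purely an assumption, as are the label FRICATIVE, the function names, and the start threshold of 0.1.

```python
def formant_band_edges(count=7, low_hz=280.0, high_hz=4000.0):
    """Split the formant range into `count` bands.
    ASSUMPTION: geometric spacing; the source gives only the overall range."""
    ratio = high_hz / low_hz
    return [low_hz * ratio ** (i / count) for i in range(count + 1)]


def build_selectors():
    """Map each of the nine selectors to a (low_hz, high_hz) passband."""
    edges = formant_band_edges()
    selectors = {"F%d" % (i + 1): (edges[i], edges[i + 1]) for i in range(7)}
    selectors["F8"] = (80.0, 280.0)  # voice energy range (from the text)
    selectors["FRICATIVE"] = (4000.0, 10000.0)  # hypothetical label
    return selectors


def start_signal(rectified_levels, threshold=0.1):
    """OR of the rectifier outputs: true as soon as any level exceeds
    the threshold (the threshold value is a hypothetical choice)."""
    return any(level > threshold for level in rectified_levels)
```

A call such as start_signal([0.0] * 8 + [0.5]) returns True, which mirrors the idea that activity in any one band is enough to start the matrix ring driving circuit.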

Analog differentiators AD1...AD8 analyze the slowly changing waveforms from their associated selector rectifiers and pass a signal to the word pattern storage matrix as a function of thresholds appearing at various points on a resistor network. The network is connected to a constant-current sink, CCS. Together, the resistor network and CCS set the proper threshold for each analog differentiator so that a certain fixed number of the eight lines to the matrix are energized at any one time. A maximum of three out of eight provides excellent results. During a glide, when one or two frequency bands change within a time interval, a total of four or five lines can register in a particular time slot because of the change. At times, only one or two selector outputs are operative.
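The nominal three-of-eight limit can be modeled functionally. The sketch below is not a simulation of the resistor network and CCS. It assumes their combined effect is equivalent to energizing only the strongest differentiator outputs, and it does not reproduce the glide case in which four or five lines register. The function name energized_lines and the floor parameter are hypothetical.

```python
def energized_lines(differentiator_outputs, max_active=3, floor=0.0):
    """Return one 0/1 flag per matrix line.

    ASSUMPTION: the shared threshold behaves like a top-k selection, so at
    most `max_active` lines are energized, and only those whose output
    exceeds `floor`. Fewer lines are energized when few bands are active.
    """
    ranked = sorted(
        range(len(differentiator_outputs)),
        key=lambda index: differentiator_outputs[index],
        reverse=True,
    )
    flags = [0] * len(differentiator_outputs)
    for index in ranked[:max_active]:
        if differentiator_outputs[index] > floor:
            flags[index] = 1
    return flags
```

For example, the outputs [0.1, 0.9, 0.3, 0.8, 0.0, 0.7, 0.2, 0.05] energize the second, fourth, and sixth lines only. With a floor of 0.5 and a single strong band, only one line is energized, matching the remark that at times only one or two selector outputs are operative.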

In this system, the fricative can start the time base generator. However, the fricative is suppressed and does not enter the storage...