SPEECH UNDERSTANDING SYSTEMS (Report No. 3)

IP.com Disclosure Number: IPCOM000128803D
Original Publication Date: 1975-Dec-31
Included in the Prior Art Database: 2005-Sep-19
Document File: 24 page(s) / 70K

Publishing Venue

Software Patent Institute

Related People

W.A. Woods: AUTHOR [+3]

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 6% of the total text.

SPEECH UNDERSTANDING SYSTEMS

Quarterly Technical Progress Report No. 3

1 May 1975 to 1 August 1975

ARPA Order No. 2904
Contract No. N00014-75-C-0533
Program Code No. 5D30
Principal Investigator: William A. Woods, (617) 491-1850 x361
Name of Contractor: Bolt Beranek and Newman Inc.
Scientific Officer: T. H. Lautenschlager
Effective Date of Contract: 30 October 1974
Title: SPEECH UNDERSTANDING SYSTEMS
Contract Expiration Date: 29 October 1975
QTPR Editor: Bonnie Nash-Webber, (617) 491-1850 x227
Amount of Contract: $1,041,261

Sponsored by Advanced Research Projects Agency, ARPA Order No. 2904

This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by ONR under Contract No. N00014-75-C-0533.

BBN Report No. 3115, Bolt Beranek and Newman Inc.

I. PROGRESS OVERVIEWS

A. Acoustic Analysis

One of the main problems in the accurate estimation of formants and signal energy is the variability in the pitch of an individual speaker as well as its variability across speakers. The autocorrelation method of linear prediction, which we have been using so far, has the disadvantage that it is sensitive to wide variations in pitch, due to the interaction between the analysis window and the pitch period. The covariance method does not use a window and hence does not exhibit the same degree of sensitivity to pitch variations. However, it has the disadvantage that the stability of the computed model is not assured. We are now working on a class of methods (due primarily to Itakura and Burg) which do not require windowing and yet do preserve stability. We hope to settle on one method which will prove optimal for speech analysis.
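To make the stability property concrete, the following is a minimal sketch of a Burg-style lattice estimate in plain Python. It is an illustration under stated assumptions, not the implementation used in this project: the function name `burg`, the synthetic test frame, and the model order are all hypothetical. It shows that no analysis window is applied to the data and that each reflection coefficient has magnitude no greater than one, which is the property that keeps the resulting all-pole model stable.

```python
import math


def burg(x, order):
    """Burg-style lattice estimate (illustrative sketch, not the report's code).

    Returns (reflection_coeffs, predictor_coeffs), with predictor_coeffs[0] == 1.
    No analysis window is applied to x.
    """
    f = list(x[1:])    # forward prediction errors f(n), n = 1..N-1
    b = list(x[:-1])   # backward prediction errors b(n-1), aligned with f
    a = [1.0]          # predictor polynomial, grown one order at a time
    ks = []
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error energy.
        num = -2.0 * sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi for fi in f) + sum(bi * bi for bi in b)
        k = num / den if den > 0.0 else 0.0   # |k| <= 1 by Cauchy-Schwarz
        ks.append(k)
        # Step-up recursion: a_m(i) = a_{m-1}(i) + k * a_{m-1}(m - i).
        ext = a + [0.0]
        a = [ext[i] + k * ext[len(ext) - 1 - i] for i in range(len(ext))]
        # Lattice update of the error sequences, then realign by one sample.
        f, b = ([fi + k * bi for fi, bi in zip(f, b)][1:],
                [bi + k * fi for fi, bi in zip(f, b)][:-1])
    return ks, a


# Hypothetical test frame: two sinusoids plus a small deterministic perturbation.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n + 0.4)
         + 0.01 * ((n * 37) % 11 - 5) for n in range(200)]
reflection, predictor = burg(frame, 4)
print(all(abs(k) <= 1.0 for k in reflection))
```

Because the bound on each reflection coefficient follows from the way the coefficient is defined, it holds regardless of where the frame falls relative to the pitch period, which is the motivation given above for preferring this class of methods.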

B. Acoustic-Phonetic Segmentation and Labeling

This quarter we extended the first-pass segmentation process described in the last quarterly progress report [Woods et al., 1975b] to the point where it produces segment lattices suitable for input to the word matcher.

In this process, the APR component starts by applying dip detection routines to three different energy parameters to produce three sets of boundaries of different types. Dips in the parameter LEZ (smoothed low-frequency energy from 120 to 440 Hz) indicate likely obstruents or obstruent sequences. Dips in MEPZ (smoothed mid-frequency energy from the preemphasized signal between 640 and 2800 Hz) that occur within sonorant sequences indicate nasals, back semivowels [W,L], flaps, or intervocalic obstruents (e.g., [HH,V,DH,D]). Dips in HEPZ (smoothed high-frequency energy from the preemphasized signal between 3400 and 5000 Hz) that occur within sonorant sequences indicate [R] or flaps and sometimes nasals, [Vd] or intervocalic obstruents. Dips in HEPZ within obstruent sequences indicate silences or weak fric...
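
The dip detection step can be sketched as follows. This is a minimal illustration, assuming a smoothed energy contour in dB sampled once per frame; the function name `find_dips`, the depth threshold, and the example contour are hypothetical and are not taken from the report. A dip is taken here to be a local minimum lying at least a given depth below the nearest peak on each side.

```python
def find_dips(energy_db, min_depth_db=3.0):
    """Return (frame_index, depth_db) for each sufficiently deep local minimum.

    Illustrative sketch only: the 3 dB default threshold is an assumption.
    """
    dips = []
    n = len(energy_db)
    for i in range(1, n - 1):
        # Local minimum; a flat-bottomed valley is reported once, at its last frame.
        if energy_db[i] <= energy_db[i - 1] and energy_db[i] < energy_db[i + 1]:
            # Climb to the nearest peak on the left.
            left = i
            while left > 0 and energy_db[left - 1] >= energy_db[left]:
                left -= 1
            # Climb to the nearest peak on the right.
            right = i
            while right < n - 1 and energy_db[right + 1] >= energy_db[right]:
                right += 1
            # Depth is measured against the lower of the two flanking peaks.
            depth = min(energy_db[left], energy_db[right]) - energy_db[i]
            if depth >= min_depth_db:
                dips.append((i, depth))
    return dips


# Hypothetical smoothed energy contour (dB per frame) with one deep and one shallow dip.
contour = [60, 62, 61, 50, 49, 52, 61, 63, 62, 61.5, 62, 40]
print(find_dips(contour))
```

In this sketch the deep minimum at frame 4 is kept and the 0.5 dB ripple at frame 9 is rejected. Running the same routine over each of the three energy parameters would give three separate boundary sets, which could then be interpreted according to the parameter in which each dip occurs.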