Automatic Utterance Isolation Using Normalized Energy

IP.com Disclosure Number: IPCOM000089424D
Original Publication Date: 1977-Oct-01
Included in the Prior Art Database: 2005-Mar-05
Document File: 3 page(s) / 95K

Publishing Venue

IBM

Related People

Das, SK: AUTHOR [+4]

Abstract

A method is described for applying the automatic amplitude normalization of speech to the problem of determining the beginning and end of an isolated utterance for discrete utterance recognition. A flow chart of the steps involved in this method is shown in Fig. 1.

This is the abbreviated version, containing approximately 55% of the total text.

Given a time window of length T seconds, sampled and digitized via pulse code modulation at a rate of F samples per second into N points, a log energy sequence (step 1) can be obtained from the data x(i), where i = 0, 1, 2, ..., N - 1,

(Image Omitted)

where M points are included in the calculation of each element, and the initial points of successive intervals are separated by L points. A prior publication [*] presents a method for obtaining a normalized energy range. In that work, a histogram of the sequence E(n) was found to be bimodal, and this fact implied a means by which a normalized range could be obtained.
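The defining equation for E(n) is not reproduced in this extract, so the following is only a sketch of one plausible reading of the surrounding text. It assumes each element is the logarithm of the sum of squared samples over M points, with intervals starting every L points. The base-10 logarithm, the `floor` guard against taking the log of zero, and the function name are assumptions made for illustration and do not come from the disclosure.

```python
import math


def log_energy_sequence(x, M, L, floor=1e-10):
    """Return a log energy sequence E(n) for the samples x(0..N-1).

    Assumed form (the original equation is omitted in the source):
    E(n) = log10(sum of x(i)^2 over the M points starting at i = n * L).
    """
    energies = []
    start = 0
    # Only full intervals of M points are used; a trailing partial
    # interval is dropped.
    while start + M <= len(x):
        power = sum(v * v for v in x[start:start + M])
        # The floor keeps the logarithm finite for all-zero intervals.
        energies.append(math.log10(max(power, floor)))
        start += L
    return energies
```

With L smaller than M the intervals overlap, and with L equal to M they tile the window without overlap.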

The discrete utterance described herein is ready-made for a very similar treatment (step 2). The long, low-energy intervals cause a very high peak to occur at the bottom end of the histogram. The top end, however, is more spread out than in the continuous speech case. The method currently in use assigns 0 on the normalized scale to the large silence peak of the histogram and 32, the top of the scale, to the highest energy value. This is shown in the Energy Normalization Histogram in Fig. 2.
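The normalization just described can be sketched as follows. The mapping of the silence peak to 0 and of the highest energy to 32 follows the text. The number of histogram bins, the use of the center of the most populated bin as the silence peak, and the function name are assumptions made for illustration.

```python
def normalize_energies(energies, num_bins=64, top=32.0):
    """Linearly map energies so the silence peak is 0 and the maximum is `top`.

    No limiting is applied, so values below the silence peak come out
    negative.
    """
    lo, hi = min(energies), max(energies)
    if hi == lo:
        # Degenerate case: every interval has the same energy.
        return [0.0 for _ in energies]

    # Build a histogram of the energy values.
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for e in energies:
        idx = min(int((e - lo) / width), num_bins - 1)
        counts[idx] += 1

    # Assumption: the silence peak is the center of the most populated bin.
    # On ties, index() returns the lowest-energy bin.
    peak_bin = counts.index(max(counts))
    silence = lo + (peak_bin + 0.5) * width

    scale = top / (hi - silence)
    return [(e - silence) * scale for e in energies]
```

This sketch relies on the long silent intervals dominating the histogram, as the text says they do for discrete utterances. If the most populated bin were not the silence peak, the zero point would be misplaced.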

After a linear scaling, but no limiting, of the energies E(n) to the normalized range [0-32], utterance isolation may proceed. First, a "long-silence" table is derived by the use o...