
Automatic Amplitude Normalization of Speech

IP.com Disclosure Number: IPCOM000080615D
Original Publication Date: 1974-Jan-01
Included in the Prior Art Database: 2005-Feb-27
Document File: 2 page(s) / 52K

Publishing Venue

IBM

Related People

Cohen, PS: AUTHOR [+3]

Abstract

Speech recordings are obtained under differing conditions (e.g., different microphones, speakers, and gain settings), so an analysis system must normalize out the overall gain characteristic. Typically this is done by an Automatic Level Control (ALC) system. Because ALC is time-varying, however, its effects sometimes enter the analysis. The system described here determines the speech-related limits for "amplitude", which can be used to normalize the analysis system automatically without introducing any coloring of the data, except a DC term.

This is the abbreviated version, containing approximately 72% of the total text.



Experience shows that the log energy distribution of speech data is inherently bimodal in a high-quality recording system. Fig. 1 shows an actual, complex histogram taken from digital speech data. The lower peak is due to the lowest-energy events (silence and stops), and the upper peak is due to strong, voiced events. This shape is what permits determination of the normalizing parameters: the maximum allowed energy and the minimum allowed energy for the recording.

Fig. 1 shows a histogram of digital speech energy. The proposed system, shown in Fig. 2, builds such a histogram by calculating the log energy of successive segments of speech, each approximately 20 ms long. It then locates the highest peak, at energy M(1) (52 dB in the example).
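
The histogram step can be sketched roughly as follows. This is an illustration only, not the disclosed implementation: the function names, the non-overlapping framing, the sum-of-squares energy measure, the 1 dB bin width, and the energy floor are all assumptions, since the available text specifies only segments of about 20 ms and a log energy scale.

```python
import math

def frame_log_energies(samples, sample_rate, frame_ms=20.0, floor=1e-12):
    """Log energy (dB, arbitrary reference) of consecutive non-overlapping frames.

    Assumption: energy is the sum of squared samples in each frame.
    The floor avoids log10(0) on digitally silent frames.
    """
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000.0)))
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(10.0 * math.log10(max(energy, floor)))
    return energies

def histogram(values, lo, hi, bin_width=1.0):
    """Count values into equal-width bins covering [lo, hi)."""
    n_bins = int(math.ceil((hi - lo) / bin_width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) // bin_width)
        if 0 <= idx < n_bins:  # values outside the range are ignored
            counts[idx] += 1
    return counts

def highest_peak(counts, lo, bin_width=1.0):
    """Centre (in dB) of the most populated bin, playing the role of M(1)."""
    idx = max(range(len(counts)), key=counts.__getitem__)
    return lo + (idx + 0.5) * bin_width
```

The absolute dB values depend on the sample scaling, so figures such as the 52 dB in the example are meaningful only relative to the recording's own reference level.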

Then, masking a region about M(1), it finds a second peak, M(2) (here at 16 dB). It call...
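
A minimal sketch of the masking step described above is given here, assuming the histogram is already available as a list of bin counts. The function name and the `mask_half_width` parameter are hypothetical: the available text does not state how wide the masked region is. The toy counts are made up for illustration and merely place peaks at the 52 dB and 16 dB positions mentioned in the example.

```python
def two_peaks(counts, mask_half_width):
    """Return (first, second) bin indices.

    first  : index of the most populated bin, corresponding to M(1)
    second : index of the most populated bin outside the masked region
             around first, corresponding to M(2); None if the mask
             covers every bin
    """
    first = max(range(len(counts)), key=counts.__getitem__)
    # Exclude every bin within mask_half_width bins of the first peak.
    candidates = [i for i in range(len(counts))
                  if abs(i - first) > mask_half_width]
    if not candidates:
        return first, None
    second = max(candidates, key=counts.__getitem__)
    return first, second

# Toy histogram with 1 dB bins starting at 0 dB (made-up counts).
counts = [0] * 70
counts[15], counts[16], counts[17] = 25, 40, 22   # low-energy mode
counts[51], counts[52], counts[53] = 45, 60, 50   # high-energy mode

print(two_peaks(counts, mask_half_width=10))  # (52, 16)
```

Without the mask, the second-largest bin would simply be a neighbour of the first peak (bin 53 in the toy data), which is why a region about M(1) has to be excluded before searching for the second mode.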