Adaptive Amplitude Normalization of Speech by Histogram Matching

IP.com Disclosure Number: IPCOM000062802D
Original Publication Date: 1986-Dec-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 1 page(s) / 12K

Publishing Venue

IBM

Related People

Das, SK: AUTHOR [+2]

Abstract

According to this invention, a sample histogram of overall amplitude level, called the master histogram, is identified and saved at an early stage of processing speech data. All subsequent processing uses the master histogram as a reference template. By matching incoming speech against the master histogram, changes in speech amplitude can be accounted for.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 64% of the total text.

The figure is a block diagram of the overall procedure. Speech spectrum generator 102 provides a frequency-domain representation of speech every centisecond. This representation is obtained after some preliminary processing: the squared magnitude of the Fourier transform of the speech signal is grouped into several frequency bands, and a noise spectrum estimate is subtracted. The master histogram is computed by element 104 from approximately the first 150 seconds of speech input. It is a record of the overall amplitude levels attained by the speech signal during this period, which typically range from about 50 to 110 dB. The master histogram is normalized by dividing each count by the total number of input samples, and the resulting percentage values are saved.
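
The master histogram computation described above can be illustrated with a minimal sketch. It is not the disclosure's implementation: the function names, the 1 dB bin width, the definition of overall level as 10*log10 of the summed band power, and the clamping of out-of-range levels are all assumptions made for illustration. Only the 50 to 110 dB range and the normalization to percentages come from the text.

```python
import math

def overall_level_db(band_powers):
    """Overall level of one frame in dB.

    Assumption: the overall level is 10*log10 of the summed band power.
    The disclosure does not define it precisely.
    """
    total = sum(band_powers)
    return 10.0 * math.log10(max(total, 1e-12))  # floor avoids log10(0)

def master_histogram(frames, lo_db=50, hi_db=110, bin_db=1):
    """Histogram of overall levels, normalized to percentages.

    frames: iterable of per-frame band-power lists (one frame per centisecond).
    lo_db and hi_db follow the typical range quoted in the text; bin_db is a
    hypothetical choice.
    """
    n_bins = (hi_db - lo_db) // bin_db
    counts = [0] * n_bins
    n_frames = 0
    for frame in frames:
        level = overall_level_db(frame)
        idx = int((level - lo_db) // bin_db)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range levels (assumption)
        counts[idx] += 1
        n_frames += 1
    if n_frames == 0:
        raise ValueError("no input frames")
    # Divide by the total number of input samples and keep percentages.
    return [100.0 * c / n_frames for c in counts]
```

At one frame per centisecond, 150 seconds of input corresponds to about 15,000 frames passed to `master_histogram`.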

During subsequent processing of speech data, the master histogram serves as a reference template for matching and level adjustment. An attempt is made to perform this matching and adjustment every 10 seconds. However, if a speech activity detector 106 indicates the presence of speech at that time, the pro...