Dynamic Adjustment of Silence/Speech Threshold in Varying Noise Conditions

IP.com Disclosure Number: IPCOM000112685D
Original Publication Date: 1994-Jun-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 64K

Publishing Venue

IBM

Related People

Bahl, LR: AUTHOR [+5]

Abstract

An algorithm is disclosed for the dynamic adjustment of a silence/speech energy threshold in response to changing noise conditions.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

      If the background noise level is relatively constant, a fixed
threshold could be set.  However, if the noise level is variable, a
method is needed for dynamically adjusting the threshold in response
to changing noise conditions.

      The basic idea behind the algorithm is to construct a histogram
of frame energies, and then locate the first peak in the histogram.
This peak is a reliable indicator of the current noise level.  In
order to track the varying level of noise, the histogram is
constructed periodically and the silence/speech energy threshold is
adjusted, if necessary.
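
      As an illustration of the peak-location idea, the following is a
minimal Python sketch, not the disclosed procedure.  It assumes that
the "first peak" is the lowest-energy bin whose count is a local
maximum and is at least a minimum count; the function name first_peak
and the parameter min_count are hypothetical, and the disclosure's own
peak criterion is not reproduced in this abbreviated text.

```python
def first_peak(hist, min_count=1):
    """Return the index of the first local maximum in hist, or None.

    Assumption (not from the disclosure): a peak is a bin whose count
    is at least min_count, strictly greater than its left neighbour,
    and not less than its right neighbour. Bins outside the array
    are treated as having a count of zero.
    """
    n = len(hist)
    for i in range(n):
        left = hist[i - 1] if i > 0 else 0
        right = hist[i + 1] if i < n - 1 else 0
        if hist[i] >= min_count and hist[i] > left and hist[i] >= right:
            return i
    return None


# Counts per integer energy bin: a low-energy noise mode followed by
# a higher-energy speech mode (made-up numbers for illustration).
example_hist = [0, 0, 1, 5, 9, 4, 1, 0, 2, 6, 3]
noise_bin = first_peak(example_hist)
```

      Raising min_count is one way to keep a small, isolated bump at
very low energy from being mistaken for the noise peak.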

      Let e(t) denote the energy of the speech frame at time t.

      Let [a(0), a(1), ..., a(M)] be an array of integers in which
the histogram of the input frame energies is stored.  Thus a(i)
contains the number of frames having energy i.  If the energy is
computed in dB units, rounded to the nearest integer, then an array
of size 100 will easily suffice.
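
      The mapping from a frame to a histogram index can be sketched
as follows.  This is a hedged example: the disclosure does not say
how e(t) is computed, so the sum of squared samples used here, the
clamping to the range [0, M], and the names frame_energy_db and M = 99
are all assumptions made for illustration.

```python
import math

M = 99  # highest bin index; assumed so that a(0)..a(M) has 100 entries


def frame_energy_db(samples):
    """Return the frame energy in dB as an integer bin index in [0, M].

    Assumption: energy is the sum of squared sample values. A frame
    with zero energy is mapped to bin 0 to avoid log10(0).
    """
    energy = sum(s * s for s in samples)
    if energy <= 0:
        return 0
    db = 10.0 * math.log10(energy)
    # Round to the nearest integer, then clamp into the histogram range.
    return max(0, min(M, int(round(db))))
```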

      Let T denote the current value of the silence/speech threshold.

STEP 1.  INITIALIZATION
 o   Initialize the histogram, i.e., set a(i) = 0 for all i.
 o   Initialize the frame count f to be 0.

STEP 2.  UPDATING

When a new frame comes in,
 o   Calculate its energy e(t) and update the histogram; i.e., if
     e(t) = k, then increment a(k) by 1.
 o   Also, increment the frame count f by 1.

If the frame count f is equal to a pre-specified value F, then go to
STEP 3; otherwise, repeat STEP 2 for the next frame.
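
STEPS 1 and 2 can be sketched in Python as shown below.  This is a
minimal sketch under stated assumptions: the class name
EnergyHistogram, the default F = 500, and the clamping of
out-of-range energies are hypothetical choices, not values taken from
the disclosure.

```python
class EnergyHistogram:
    """Accumulates frame energies as described in STEPS 1 and 2."""

    def __init__(self, num_bins=100, frames_per_update=500):
        self.num_bins = num_bins
        self.F = frames_per_update  # hypothetical value of F
        self.reset()

    def reset(self):
        # STEP 1: set a(i) = 0 for all i and the frame count f to 0.
        self.a = [0] * self.num_bins
        self.f = 0

    def update(self, k):
        """STEP 2: record one frame with integer energy k.

        Returns True when f has reached F, i.e. when the caller
        should proceed to STEP 3.
        """
        # Assumption: energies outside the array are clamped.
        k = max(0, min(self.num_bins - 1, k))
        self.a[k] += 1
        self.f += 1
        return self.f == self.F
```

A caller would feed one integer energy per frame to update() and
proceed to the threshold re-estimation step whenever it returns True.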

STEP 3.  RE-ESTIMATING THE SPEECH/SILENCE THRESHOLD
 o   Locate th...