Determining Good Speech and Silence Prototypes Using Speaker-Independent Prototypes

IP.com Disclosure Number: IPCOM000114755D
Original Publication Date: 1995-Jan-01
Included in the Prior Art Database: 2005-Mar-29
Document File: 2 page(s) / 63K

Publishing Venue

IBM

Related People

Epstein, M: AUTHOR [+2]

Abstract

The current noise-adaptation acoustic labeler in the Tangora uses a set of k-means Euclidean prototypes to label each frame as speech or silence. For this to work, one must select prototypes that are known to be speech or silence. This is customarily done by picking speech and silence seeds and then performing an unsupervised clustering (*) to improve the quality of the prototypes. This invention improves upon this by supervising the selection using speaker-independent prototypes and statistics.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Determining Good Speech and Silence Prototypes Using Speaker-Independent
Prototypes

      The current noise-adaptation acoustic labeler in the Tangora
uses a set of k-means Euclidean prototypes to label each frame as
speech or silence.  For this to work, one must select prototypes
that are known to be speech or silence.  This is customarily done by
picking speech and silence seeds and then performing an unsupervised
clustering (*) to improve the quality of the prototypes.  This
invention improves upon this by supervising the selection using
speaker-independent prototypes and statistics.
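
      As a rough illustration of the labeling step described above,
the following Python sketch assigns each frame the class of its
nearest prototype under Euclidean distance.  The data layout, the
function name label_frame, and the two-dimensional example vectors are
assumptions made for this sketch; the disclosure does not specify how
the Tangora labeler represents frames or prototypes.

```python
import math


def label_frame(frame, prototypes):
    """Return the class of the prototype nearest to `frame`.

    `prototypes` is a list of (vector, label) pairs, where label is
    "speech" or "silence".  This layout is hypothetical.
    """
    best_label = None
    best_dist = math.inf
    for vector, label in prototypes:
        # Euclidean distance between the frame and this prototype.
        dist = math.dist(frame, vector)
        if dist < best_dist:
            best_dist = dist
            best_label = label
    return best_label


# Made-up prototypes and frames, for illustration only.
prototypes = [
    ((0.1, 0.2), "silence"),
    ((0.2, 0.1), "silence"),
    ((3.0, 2.5), "speech"),
    ((2.0, 4.0), "speech"),
]
frames = [(0.0, 0.3), (2.5, 3.0)]
labels = [label_frame(f, prototypes) for f in frames]
```

      The sketch makes the dependency on prototype quality visible:
a frame receives whatever class its nearest prototype carries, so a
prototype tagged with the wrong class mislabels every frame near it.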

      The Euclidean clustering algorithm works well if the seeds are
selected in "safe" speech and silence regions.  However, if a seed
lies near the boundary between speech and silence, clustering can
produce a speech prototype that models silence well, or vice versa.
The standard seed selection algorithm classifies each frame as speech
or silence by its energy:  a frame whose energy is below a threshold
is labeled silence, and a frame whose energy is above it is labeled
speech.  The standard seed selection algorithm is as follows:
  1.  Compute the energy levels of every frame in a training session.
  2.  Starting with the minimum and maximum energies as seeds, perform
       20 iterations of k-means clustering (using just 2 clusters).
  3.  After the last iteration, pick the midpoint between the 2
       clusters as the boundary energy value between speech and
       silence.
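
      The three steps above can be sketched in Python as follows.
This is a minimal sketch, not the Tangora implementation:  it assumes
the per-frame energies have already been computed (the disclosure
does not say how), and the names energy_boundary and classify are
hypothetical.

```python
def energy_boundary(energies, iterations=20):
    """Estimate the speech/silence energy boundary with 1-D 2-means.

    Seeds are the minimum and maximum energies; the result is the
    midpoint between the two cluster centers after the last iteration.
    """
    low, high = min(energies), max(energies)
    for _ in range(iterations):
        # Assign each energy to the nearer of the two centers.
        low_cluster = [e for e in energies
                       if abs(e - low) <= abs(e - high)]
        high_cluster = [e for e in energies
                        if abs(e - low) > abs(e - high)]
        # Move each center to the mean of its cluster; an empty
        # cluster keeps its previous center.
        if low_cluster:
            low = sum(low_cluster) / len(low_cluster)
        if high_cluster:
            high = sum(high_cluster) / len(high_cluster)
    return (low + high) / 2


def classify(energy, boundary):
    """Label a frame by energy: below the boundary is silence."""
    return "silence" if energy < boundary else "speech"


# Made-up energies for illustration: three quiet and three loud frames.
energies = [1.0, 2.0, 3.0, 9.0, 10.0, 11.0]
boundary = energy_boundary(energies)
frame_labels = [classify(e, boundary) for e in energies]
```

      Because the boundary depends only on the two cluster centers,
anything that shifts the energy distribution of the session also
shifts the boundary.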

      While in general this picks good speech and silence seeds, in
noisy or low volume environments, many frames can easily be
misclassifi...