
DIFFERENTIAL-BASED VOICE DETECTION ALGORITHM

IP.com Disclosure Number: IPCOM000009677D
Original Publication Date: 2000-Jan-01
Included in the Prior Art Database: 2002-Sep-10
Document File: 3 page(s) / 116K

Publishing Venue

Motorola

Related People

Mike Rices: AUTHOR [+2]

Abstract

Voice Activity Detection (VAD) is used to detect voiced and unvoiced segments of a speech signal. The output of a VAD is a voiced/unvoiced decision, which may be used to control other algorithms such as Automatic Gain Control. Many VAD algorithms are complicated, using frequency domain analysis and other computationally intensive operations. Computationally simple algorithms are needed for real-time applications.

This is the abbreviated version, containing approximately 50% of the total text.

MOTOROLA Technical Developments

DIFFERENTIAL-BASED VOICE DETECTION ALGORITHM

by Mike Rices and Erik Perrins

PROBLEM

  Voice Activity Detection (VAD) is used to detect voiced and unvoiced segments of a speech signal. The output of a VAD is a voiced/unvoiced decision, which may be used to control other algorithms such as Automatic Gain Control. Many VAD algorithms are complicated, using frequency domain analysis and other computationally intensive operations. Computationally simple algorithms are needed for real-time applications.

  In addition, many time-based VAD algorithms compare the signal energy against a threshold to determine whether voice activity is present. The difficulty with this approach is that quiet speech might not be detected, and a noisy signal might produce a false detection. There is a need for simple algorithms that detect both loud and soft utterances.

DEFINITION OF TERMS

  Short-term energy - energy during a brief interval, typically the frame length (20 ms, 33 ms, etc.).

  Peak energy (peak and hold) - a signal that tracks the peaks of the short-term energy signal and, between peaks, decays at a constant rate (X dB/s).

  Medium-term energy - energy during a small number of short-term intervals, typically 5 or 6 intervals.

  The first plot in Figure 101 shows the short-term energy for a fifteen-second speech segment during which four sentences are uttered. The dotted line in the first plot shows the peak and hold signal.
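  The three definitions above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function names, the dB floor, the use of non-overlapping frames, and the choice of averaging power over the last few frames for the medium-term energy are all assumptions made for illustration. The decay rate corresponds to the unspecified "X dB/s" and must be supplied by the caller.

```python
import math

FLOOR_POWER = 1e-12  # assumed floor so that silent frames do not hit log10(0)


def short_term_energy_db(samples, frame_len):
    """Energy of each non-overlapping frame of frame_len samples, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(max(power, FLOOR_POWER)))
    return energies


def peak_and_hold_db(energy_db, decay_db_per_s, frame_s):
    """Follow the peaks of the short-term energy; between peaks, decay
    linearly in dB at decay_db_per_s (the 'X dB/s' of the definition)."""
    step = decay_db_per_s * frame_s  # dB lost per frame
    held = []
    peak = None
    for e in energy_db:
        peak = e if peak is None else max(e, peak - step)
        held.append(peak)
    return held


def medium_term_energy_db(energy_db, n_frames=5):
    """Mean power over the most recent n_frames short-term intervals, in dB.
    Averaging in the power domain is an assumption of this sketch."""
    out = []
    for i in range(len(energy_db)):
        window = energy_db[max(0, i - n_frames + 1):i + 1]
        mean_power = sum(10.0 ** (e / 10.0) for e in window) / len(window)
        out.append(10.0 * math.log10(mean_power))
    return out
```

  For 8 kHz audio and 20 ms frames, frame_len would be 160 and frame_s would be 0.02; both values are examples only.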

SOLUTION

  The VAD problem can be divided into two decisions: the transition from unvoiced to voiced, and the transition from voiced to unvoiced. This algorithm relies on the fact that:

  • The onset of speech is marked by a drastic increase in the peak energy of the signal (the derivative is sharply positive).

  • Continuous speech is marked by intermittent spikes in the peak and hold signal.

  • The termination of speech is marked by the absence of spikes in the peak and hold signal, and a decrease in the short or medium-term energy of the signal (the derivative is negative).

  These facts are used to form the criteria for the two decisions that are made in the VAD.
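  One way the three observations above could be combined into a two-state decision is sketched below in Python. This is an illustrative assumption, not the authors' exact criteria, which are not fully reproduced in this abbreviated text. The function name, the onset threshold onset_db, and the hangover_frames count are hypothetical parameters; the inputs are per-frame peak-and-hold and medium-term energy values in dB.

```python
def vad_decisions(peak_db, medium_db, onset_db=6.0, hangover_frames=10):
    """Return one voiced (True) / unvoiced (False) decision per frame.

    onset_db and hangover_frames are hypothetical tuning values chosen
    for this sketch; they do not come from the disclosure.
    """
    voiced = False
    frames_since_spike = 0
    decisions = []
    for n in range(len(peak_db)):
        # First differences stand in for the derivatives named in the text.
        d_peak = peak_db[n] - peak_db[n - 1] if n > 0 else 0.0
        d_medium = medium_db[n] - medium_db[n - 1] if n > 0 else 0.0

        if not voiced:
            # Onset: a sharply positive derivative of the peak energy.
            if d_peak > onset_db:
                voiced = True
                frames_since_spike = 0
        else:
            # Continuous speech: any rise in the peak-and-hold signal
            # counts as a spike and resets the counter.
            if d_peak > 0.0:
                frames_since_spike = 0
            else:
                frames_since_spike += 1
            # Termination: no recent spikes and falling medium-term energy.
            if frames_since_spike >= hangover_frames and d_medium < 0.0:
                voiced = False
        decisions.append(voiced)
    return decisions
```

  Because both tests use differences rather than absolute energy, a quiet utterance can still trigger the onset test as long as it rises sharply relative to the preceding frames, which is the property the PROBLEM section asks for.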

  The unvoiced to voiced transition is made by monitoring the difference signal of the...