
DIFFERENTIAL-BASED VOICE DETECTION ALGORITHM

IP.com Disclosure Number: IPCOM000009739D
Original Publication Date: 2000-Jan-01
Included in the Prior Art Database: 2002-Sep-16
Document File: 2 page(s) / 104K

Publishing Venue

Motorola

Related People

Mike Rices: AUTHOR (plus two others)

Abstract

Voice Activity Detection (VAD) is used to detect voiced and unvoiced segments of a speech signal. The output of a VAD is a voiced/unvoiced decision, which may be used to control other algorithms such as Automatic Gain Control. Many VAD algorithms are complicated, using frequency-domain analysis and other computationally intensive operations. Computationally simple algorithms are needed for real-time applications. In addition, many time-based VAD algorithms compare the signal energy against a threshold to determine whether voice activity is present. The difficulty with this approach is that quiet speech might not be detected, and a noisy signal might produce a false detection. There is a need for simple algorithms that detect both loud and soft utterances.



Motorola Technical Developments

DIFFERENTIAL-BASED VOICE DETECTION ALGORITHM

by Mike Rices and Erik Perrins

SOLUTION

  The VAD problem can be divided into two decisions: the transition from unvoiced to voiced, and the transition from voiced to unvoiced. This algorithm relies on the fact that:

  • The termination of speech is marked by the absence of spikes in the peak-and-hold signal and by a decrease in the short- or medium-term energy of the signal (the derivative is negative).

  These facts are used to form the criteria for the two decisions that are made in the VAD.

  The unvoiced-to-voiced transition is made by monitoring the difference signal of the peak-and-hold signal. Since these signals are discrete, "difference signal" refers to the first-order difference.

  When the difference signal exceeds a detect threshold, the voiced decision is made. The third plot in Figure 1 shows the peak difference signal with the detect threshold set at 3 dB.

  The voiced-to-unvoiced transition is made by monitoring both the difference signal of the peak and a modified difference signal of the medium-term energy. The difference signal is modified so that the signal at time t is compared not with the signal at time t - 1, but with the signal at time t - Δt, where Δt is approximately 0.25 s.

  In other words, the modified difference signal is the difference between the medium-term energy at the present time and a quarter second ago. When this modified difference signal is below an undetect threshold, the unvoiced decision is tentatively made.
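  The two decisions described above can be sketched frame by frame in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abbreviated text does not say how the peak-and-hold or medium-term energy signals are computed, so the sketch assumes a peak-hold that decays by a fixed `peak_decay_db` per frame and a one-pole smoother (`energy_alpha`) for the medium-term energy, both operating on per-frame levels in dB at an assumed 100 frames/s. The 3 dB detect threshold and Δt of about 0.25 s come from the text. The undetect threshold (-1 dB), the reading of "absence of spikes" as "peak difference not above the detect threshold", and the hypothetical `hangover_frames` count used to confirm the tentative unvoiced decision are all assumptions, as are the names `frame_energy_db` and `DifferentialVAD`.

```python
import math
from collections import deque


def frame_energy_db(frame, floor=1e-10):
    """Mean-square energy of one frame of samples, in dB.

    `floor` (an assumed value) keeps log10 away from zero for silent frames.
    """
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, floor))


class DifferentialVAD:
    """Hypothetical frame-by-frame sketch of the two-decision differential VAD."""

    def __init__(self, frame_rate_hz=100.0, detect_db=3.0, undetect_db=-1.0,
                 delta_t_s=0.25, peak_decay_db=0.5, energy_alpha=0.1,
                 hangover_frames=10):
        self.detect_db = detect_db              # from the text: 3 dB
        self.undetect_db = undetect_db          # assumed value
        self.lag = max(1, round(delta_t_s * frame_rate_hz))  # Δt in frames
        self.peak_decay_db = peak_decay_db      # assumed hold/decay rate
        self.energy_alpha = energy_alpha        # assumed smoothing constant
        self.hangover_frames = hangover_frames  # assumed confirmation count
        self.peak = None
        self.energy = None
        self.energy_hist = deque(maxlen=self.lag + 1)
        self.voiced = False
        self.quiet_count = 0

    def update(self, level_db):
        """Consume one frame level (dB) and return the current voiced decision."""
        # Peak-and-hold: follow a new peak at once, otherwise decay slowly.
        if self.peak is None:
            self.peak, peak_diff = level_db, 0.0
        else:
            new_peak = max(level_db, self.peak - self.peak_decay_db)
            peak_diff = new_peak - self.peak  # first-order difference
            self.peak = new_peak

        # Medium-term energy: one-pole smoother over the dB levels.
        if self.energy is None:
            self.energy = level_db
        else:
            self.energy += self.energy_alpha * (level_db - self.energy)
        self.energy_hist.append(self.energy)

        # Modified difference E[t] - E[t - lag]; zero until enough history.
        if len(self.energy_hist) > self.lag:
            mod_diff = self.energy_hist[-1] - self.energy_hist[0]
        else:
            mod_diff = 0.0

        if not self.voiced:
            # Unvoiced -> voiced: the peak difference exceeds the detect threshold.
            if peak_diff > self.detect_db:
                self.voiced = True
                self.quiet_count = 0
        elif peak_diff <= self.detect_db and mod_diff < self.undetect_db:
            # Voiced -> unvoiced (tentative): no peak spike and falling energy.
            self.quiet_count += 1
            if self.quiet_count >= self.hangover_frames:
                self.voiced = False
        else:
            # A spike or non-falling energy cancels the tentative decision.
            self.quiet_count = 0
        return self.voiced
```

  Each frame is fed in as one dB level, for example `vad.update(frame_energy_db(frame))` for successive 10 ms frames. Because the detect decision in this sketch responds to the jump in the peak signal rather than to its absolute level, a soft onset from -60 dB to -50 dB triggers it just as a loud onset to -20 dB does, while a steady background level never does.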

© Motorola, Inc. 2000    219    January 2000
