
Voice Activity Detector

IP.com Disclosure Number: IPCOM000050883D
Original Publication Date: 1982-Dec-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 61K

Publishing Venue

IBM

Related People

Irvin, DR: AUTHOR

Abstract

Voice activity detection is an essential function in efficient speech processing systems, allowing effective use of channel bandwidth. Many modern speech coders operate in the frequency domain and generate information about the speech signal spectrum as part of their normal operation. Another current device, the time domain voice detector, simply compares the level of the incoming speech with a fixed reference level to determine whether speech is present or absent. More advanced detectors adapt the reference level to match the characteristics of the incoming speech, but the speech versus no speech/noise decision is still based only on the time waveform.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

This article proposes an improvement obtained by combining the features of the two techniques: frequency domain spectral information and adaptive time domain level detection. The proposed system includes a state-of-the-art adaptive-level, time domain voice activity detector; its details are not shown because the structure of the adaptive-level detector is not part of this description.

The proposed system also uses a frequency domain coder, which is likewise not described because such coders are well known and their structure is not specifically part of this description. In normal operation, however, a frequency domain speech encoder classifies each incoming signal frame as either voiced speech or noise/unvoiced speech. The encoder output therefore directly indicates whether a frame is voiced speech or noise/unvoiced speech.
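The article does not say how the encoder arrives at its voiced versus noise/unvoiced decision. As a rough stand-in, the sketch below classifies a frame by its spectral flatness (geometric mean over arithmetic mean of the power spectrum): harmonic voiced frames concentrate energy in a few bins and score near 0, while noise and noise-like unvoiced frames score much closer to 1. This is a swapped-in technique, not the encoder's method; the function names and the 0.3 limit are hypothetical. A naive DFT keeps the sketch free of third-party dependencies.

```python
import cmath
import math


def power_spectrum(frame: list[float]) -> list[float]:
    """Naive DFT power spectrum of a real frame, bins 1..N/2 (DC bin dropped)."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame: list[float]) -> float:
    """Geometric mean / arithmetic mean of the power spectrum, in (0, 1]."""
    eps = 1e-12  # keeps log() finite for empty bins
    power = [p + eps for p in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def classify_frame(frame: list[float], flatness_limit: float = 0.3) -> str:
    """Hypothetical stand-in for the encoder's frame classification."""
    return "voiced" if spectral_flatness(frame) < flatness_limit else "noise/unvoiced"
```

A peaky (harmonic) spectrum drives the geometric mean far below the arithmetic mean, which is why the ratio separates the two frame types without any absolute level reference.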

In the arrangement described here, two comparator levels are derived from the time domain adaptive speech level detector network: the adaptive speech detection threshold estimate at its output is multiplied by constants K1 and K2 to establish two reference threshold levels.
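In symbols, writing $\hat{\theta}$ for the adaptive threshold estimate and $L$ for the measured signal level (both symbols introduced here for illustration), the two reference levels and the regions they define can be written as follows. The ordering $0 < K_1 < K_2$ is an assumption; the visible text does not give values for the constants.

```latex
T_1 = K_1\,\hat{\theta}, \qquad T_2 = K_2\,\hat{\theta}, \qquad 0 < K_1 < K_2 \ \text{(assumed)}

\begin{cases}
L < T_1 & \text{level below both references} \\
T_1 \le L < T_2 & \text{level between the references} \\
L \ge T_2 & \text{level above both references}
\end{cases}
```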

Referring to Fig. 1, incoming speech arrives on line 1 as an analog signal that includes noise and/or unvoiced speech components. An adaptive time domain voice level threshold detector circuit 2 extracts from the incoming signal stream information indicating the relative noise level, and adapts a speech detection threshold lying above that noise level, which it presents at its output on lines 3.
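The internals of circuit 2 are explicitly outside the article's scope. A minimal sketch of one common way to build such a detector, assuming frame-based processing and RMS level measurement, tracks a noise floor that falls quickly during quiet gaps and rises slowly during speech, then places the threshold a fixed margin above it. The class name, smoothing factors, and margin are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    """Hypothetical stand-in for circuit 2: tracks a noise floor, sets a threshold above it.

    All parameter values are assumptions chosen for illustration.
    """
    noise: float = 1.0    # initial noise-floor estimate; starts high so it falls onto the real floor
    rise: float = 0.01    # slow upward tracking, so speech bursts barely raise the floor
    fall: float = 0.5     # fast downward tracking, so the floor follows quiet gaps
    margin: float = 2.0   # threshold = margin * noise-floor estimate

    def update(self, frame: list[float]) -> float:
        """Consume one frame and return the adapted threshold (the 'lines 3' output)."""
        level = math.sqrt(sum(x * x for x in frame) / len(frame))  # frame RMS level
        alpha = self.fall if level < self.noise else self.rise
        self.noise += alpha * (level - self.noise)  # one-pole smoothing toward the level
        return self.margin * self.noise
```

The asymmetric smoothing is the design choice that matters here: a symmetric average would drift upward during long talkspurts and eventually treat sustained speech as noise.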

The incoming signal level is monitored by amplifier or level-sampling latch 4, whose output is provided on line 5 for later use. The output of adaptive threshold detector circuit 2 is supplied on lines 3 to a pair of multipliers 6, where it is multiplied by two different constants, K1 and K2, to generate threshold levels 1 and 2, respectively, as shown. These outputs, together with the output of signal level detector 4, are supplied to logic circuit 7. Frequency domain speech encoder 8 may also be placed in incoming signal path 1; such encoders are well known in the art and need no further description....
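The visible portion of the article ends before describing the rule that logic circuit 7 applies, so the following is only a hypothetical illustration of how its inputs could be combined: the level on line 5 decides alone when it lies clearly above threshold level 2 or below threshold level 1, and the voiced/noise classification from encoder 8 decides in the band between them. The function name, the default K1 and K2, and the rule itself are assumptions, not the author's design.

```python
from enum import Enum


class Decision(Enum):
    SPEECH = "speech"
    NO_SPEECH = "no speech/noise"


def logic_circuit_7(level: float, threshold: float, voiced: bool,
                    k1: float = 1.0, k2: float = 3.0) -> Decision:
    """Hypothetical decision rule for logic circuit 7.

    level:     signal level from latch 4 (line 5)
    threshold: adaptive threshold estimate from circuit 2 (lines 3)
    voiced:    frame classification from frequency domain encoder 8
    k1, k2:    assumed multiplier constants (multipliers 6), with k1 < k2
    """
    if not 0 < k1 < k2:
        raise ValueError("expected 0 < k1 < k2")
    t1, t2 = k1 * threshold, k2 * threshold  # threshold levels 1 and 2
    if level >= t2:
        return Decision.SPEECH      # clearly above the noise: level alone decides
    if level < t1:
        return Decision.NO_SPEECH   # clearly below: level alone decides
    # Ambiguous band between the two references: defer to the encoder's classification.
    return Decision.SPEECH if voiced else Decision.NO_SPEECH
```

Under this assumed rule, the spectral classification is consulted only for marginal levels, where a time domain comparison alone is least reliable.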