An adaptive multi-microphone speech enhancement method and system

IP.com Disclosure Number: IPCOM000020624D
Original Publication Date: 2003-Dec-04
Included in the Prior Art Database: 2003-Dec-04
Document File: 6 page(s) / 240K

Publishing Venue

IBM

Abstract

The problem considered is that of improving the intelligibility and/or perceived quality of a speech signal of interest corrupted by additive random noise. Systems for speech enhancement have many applications: in hearing aids, in mobile communication systems, and as the front end of automatic speech recognition (ASR) systems, where they increase recognition effectiveness, to name just a few. The new system uses a method that maximally reduces background noise while improving speech intelligibility, using N synchronous speech-plus-noise recordings from an array of N microphones. The system is theoretically capable of suppressing N-1 point noise sources (along with their acoustic reflections) using N microphones, and is especially applicable in very poor signal-to-noise-ratio (SNR) conditions (below 0 dB).

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 26% of the total text.

The approach assumes that silence intervals of the main speaker can be identified at fairly regular intervals, and it works well whenever the nature of the noise does not change much between the identified silence intervals. It is based on finding finite impulse response (FIR) filters that minimize the output energy during the silent intervals yet do not attenuate the desired speaker's voice. The main subject of the invention is the definition of a numerical scheme (main claim) which attempts to accomplish this. In addition, it is shown how the silence intervals may be located in a number of practical situations.

A number of scenarios are possible. If the noise is non-speech noise and the SNR is not too low, the silent intervals of the main speaker may be identified automatically using any of the voice activity detectors that have been proposed in the literature. If the SNR is very low, or if the noise itself is hard to separate from the speech, some initial silent intervals can be labeled manually and a noise-cancelling filter found from them. Assuming that the nature of the noise changes slowly, the output of this filter will have a reduced noise content, in which case it may be possible to apply a voice activity detector to that output (secondary claim?). Finally, if this is not possible, some manual labeling scheme must be used throughout.

This invention is also applicable to real-time noise cancellation for, say, a speech recognition device, whenever a microphone array may be used as the input device to the speech recognition engine. In that case it must be coupled with some scheme for automatically labeling silent intervals of the main speaker. This is feasible, for example, when the noise is non-speech and its level is low enough to allow a voice activity detector to locate the silence intervals effectively.
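The central idea above, filters that minimize output energy over silent intervals without attenuating the speaker, can be illustrated with a deliberately small sketch. It is not the numerical scheme of the disclosure, which is not reproduced in this abbreviated text. The sketch assumes two microphones, single-tap filters (plain weights), speech that reaches both microphones identically, and a sum-to-one constraint on the weights as a stand-in for "do not attenuate the speaker". The function name `fit_silence_weights` and all signal values are hypothetical.

```python
import math
import random


def fit_silence_weights(x1_silence, x2_silence):
    """Return single-tap weights (w1, w2) with w1 + w2 = 1 that minimize
    the energy of w1*x1 + w2*x2 over the given silence samples.

    The sum-to-one constraint is an assumption of this sketch: it keeps
    speech that is identical at both microphones unattenuated.
    """
    # Substituting w2 = 1 - w1 gives the output x2 + w1 * (x1 - x2),
    # a one-parameter least-squares problem with a closed-form solution.
    diff = [a - b for a, b in zip(x1_silence, x2_silence)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5, 0.5  # channels identical during silence: plain average
    w1 = -sum(b * d for b, d in zip(x2_silence, diff)) / denom
    return w1, 1.0 - w1


# Synthetic scene; every value here is an illustrative assumption.
rng = random.Random(0)
n_samples, n_silence = 2000, 500
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
# The speaker is silent for the first n_silence samples, then "speaks" a tone.
speech = [0.0 if t < n_silence else math.sin(0.05 * t) for t in range(n_samples)]

# Speech reaches both microphones identically; the noise is half as strong at mic 2.
mic1 = [s + n for s, n in zip(speech, noise)]
mic2 = [s + 0.5 * n for s, n in zip(speech, noise)]

# Fit the weights on the labeled silence interval only, then apply them everywhere.
w1, w2 = fit_silence_weights(mic1[:n_silence], mic2[:n_silence])
output = [w1 * a + w2 * b for a, b in zip(mic1, mic2)]

residual = max(abs(y - s) for y, s in zip(output, speech))
print(f"w1={w1:.3f}, w2={w2:.3f}, max |output - speech| = {residual:.2e}")
```

In this construction the fitted weights are w1 = -1 and w2 = 2, so the single noise source cancels and the output reproduces the speech up to rounding error. Real rooms add delays and reflections, which is why the text calls for FIR filters rather than single weights.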

1. Basic Algorithm Description

Overview

The algorithm is based upon the fact that the microphone outputs may be viewed as the sum of the signals coming from the different sources (main speaker and noise sources, or main speaker and additional interfering speakers). Each of these signals is the result of the original signal having gone through the acoustic filter of the room. Our aim is to optimally estimate the signal from the main speaker, including its reflections from the room's walls, while cancelling other disturbances, be they background noise or interfering speakers. The processing consists of passing each of the N input channels through a filter and summing the outputs of the filters. Our aim is to find a set of N filters such that the sum of all the filter outputs cancels the noise while reconstructing the main speaker. We assume that the outputs of all microphones are synchronized and that the noise sources are independent of the main speaker. The role of the basic algorithm is to keep generating updated versions...
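The filter-and-sum processing described in this overview can be sketched as follows. This is a minimal illustration only: the helper names `fir_filter` and `filter_and_sum`, the toy signals, and the hand-picked filter taps are all assumptions, and the sketch does not show how the disclosure actually computes or updates its filters.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[t] = sum_k taps[k] * signal[t - k], zero before t = 0."""
    return [
        sum(taps[k] * signal[t - k] for k in range(len(taps)) if t - k >= 0)
        for t in range(len(signal))
    ]


def filter_and_sum(channels, filters):
    """Filter each synchronized channel with its own FIR filter and sum the results."""
    if len(channels) != len(filters):
        raise ValueError("need exactly one filter per channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("synchronized channels must have equal length")
    output = [0.0] * length
    for channel, taps in zip(channels, filters):
        for t, value in enumerate(fir_filter(channel, taps)):
            output[t] += value
    return output


# Toy scene (hypothetical values): N = 2 microphones, one point noise source.
speech = [0, 3, -2, 5, 1]
noise = [4, -1, 2, 6, -3]
mic1 = [s + n for s, n in zip(speech, noise)]      # noise arrives with gain 1
mic2 = [s + 2 * n for s, n in zip(speech, noise)]  # noise arrives with gain 2

# Hand-picked single-tap filters for this scene: 2*mic1 - mic2 equals the speech.
enhanced = filter_and_sum([mic1, mic2], [[2], [-1]])

# The same routine handles multi-tap filters; the second filter here is a
# one-sample delay with sign inversion.
multi_tap = filter_and_sum([[1, 2, 3, 4], [10, 20, 30, 40]], [[1, 1], [0, -1]])
```

With these toy values `enhanced` equals the speech sequence, which illustrates on a very small scale how two microphones can cancel one point noise source. In a real room the filters would need many taps to account for propagation delays and reflections.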