Infrared Neuro-Analyzer Microphone (IRNA Microphone)

IP.com Disclosure Number: IPCOM000113319D
Original Publication Date: 1994-Aug-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 129K

Publishing Venue

IBM

Related People

Bauer, M: AUTHOR [+3]

Abstract

Described is an opto-acoustic microphone, which correlates the infrared signal of the mouth movement with the acoustic stereo signal of the spoken language by means of a neuro-analyzer in such a way that high recognition rates can still be achieved (in machine speech recognition) despite high background noise.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 46% of the total text.

      A new opto-acoustical microphone is proposed which requires
only the recording of a one-dimensional integrated optical IR signal
of the mouth area of the speaker.

      This signal is correlated with the acoustic stereo signal by
means of a neuro-analyzer and processed so as to suppress the
background noise, the correlated signal data being simultaneously
considerably compressed (Fig. 1).

      The acoustic stereo signals may be captured by conventional
microphones.  The two one-dimensional signals are phase-shifted by
means of a learnt filter such that the mean correlation is maximized.
For the purposes of correlating the stereo signals, a buffer memory
chain is implemented on the chip to provide a time lag on the
microphone signals, the output data from which act as input for the
neuron.
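
      As an illustration of the time-lag step, the following Python
sketch picks the lag between two microphone signals that maximizes
their mean product.  It is not the learnt filter of the disclosure:
an exhaustive search over a fixed range of lags is substituted for
it, and the names best_lag and max_lag are assumptions made for this
example.

```python
def best_lag(left, right, max_lag):
    """Return the lag (in samples) to apply to `right` so that its mean
    product with `left` over the overlapping samples is largest.

    Hypothetical helper: a brute-force stand-in for the learnt filter,
    with each candidate lag playing the role of one buffer-chain tap.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total, count = 0.0, 0
        for i, sample in enumerate(left):
            j = i + lag                      # index into the lagged signal
            if 0 <= j < len(right):
                total += sample * right[j]
                count += 1
        if count == 0:
            continue                         # no overlap at this lag
        score = total / count                # mean correlation at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


# Example: `right` is `left` delayed by three samples.
left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
lag = best_lag(left, right, max_lag=5)
```

      In a chip implementation the candidate lags would presumably
be the taps of the buffer memory chain rather than a software loop.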

      The movement of the mouth is focused on the IR diode by means
of optical lenses (Fresnel rings) to capture the IR signal.  An
aperture plate excludes the undesired IR sources in the environment.
This reduces the sensitivity to variable IR sources in the
environment (such as variations in the luminous reflection from head
movements).  A simple IR sensitive diode is adequate to detect the IR
source 'mouth' in one dimension (Fig. 2).

      Preprocessing of the speech signals is implemented through an
artificial neural network such that, by mixing the inputs, the
correlations both between the stereo signals and between the IR
signal and the acoustic signal are maximized in the output signal
(Fig. 3).

      Processing is implemented on a single linear neuron, i.e. as
a weighted sum of the acoustic and the infrared signals.

      The decisive factor here is the weighting: the weights are
adapted according to the mean correlation of the two signals.  It is
known that, with a correlative learning rule for the weights and a
weight vector of normalized length, processing in a linear neuron
corresponds to a Principal Component Analysis (PCA) or Karhunen-Loeve
Transformation (KLT).
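
      The relationship between a correlative learning rule and PCA
can be sketched in Python as follows.  The sketch assumes zero-mean
inputs, a Hebbian update followed by explicit renormalization of the
weight vector, and a learning rate eta and starting weights w0 chosen
purely for illustration; none of these values come from the
disclosure.

```python
import math


def train_linear_neuron(samples, eta=0.01, w0=(1.0, 0.0)):
    """Adapt the weights of one linear neuron with a correlative rule.

    samples: iterable of (acoustic, infrared) pairs, assumed zero-mean.
    eta, w0: hypothetical learning rate and initial weights.
    Returns the unit-length weight vector after one pass over the data.
    """
    w = list(w0)
    for x in samples:
        # Neuron output: weighted sum of the two input signals.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Correlative update: strengthen weights in proportion to y * x.
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        # Keep the weight vector at unit length.
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w
```

      Under these assumptions the weight vector tends toward the
direction of largest variance of the input pairs, i.e. the first
principal component, so inputs that share a common component end up
with weights of similar size.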

      The correlation between the IR signal and the envelope of the
resultant acoustic signal leads to a corrected envelope, which
influences the acoustic signal through an amplitude controller.
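
      One way to measure the correlation mentioned above is shown
in the following Python sketch.  It assumes that the envelope is
obtained by rectifying the acoustic signal and smoothing it with a
trailing moving average, and that the correlation is an ordinary
Pearson coefficient; the function names and the window length are
assumptions, and the correction of the envelope itself is not shown.

```python
import math


def envelope(signal, window):
    """Rectify the signal and smooth it with a trailing moving average."""
    out, acc = [], 0.0
    for i, s in enumerate(signal):
        acc += abs(s)
        if i >= window:
            acc -= abs(signal[i - window])   # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Toy data: an oscillating acoustic signal whose amplitude follows
# a hypothetical mouth-opening signal from the IR diode.
ir = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
acoustic = [(-1) ** i * v for i, v in enumerate(ir)]
r = correlation(ir, envelope(acoustic, window=2))
```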

      The thus corrected analogue output signal can be fed directly
into the speech recognition system; it may however also first be
digitized on the chip and passed on to the processing system in
digital form (through a serial interface).

      In addition to pure interference suppression, this system may
be extended by further neurons.  If these are coupled in...