
Selection of Frequency Bands for Speech Recognition

IP.com Disclosure Number: IPCOM000113999D
Original Publication Date: 1994-Oct-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 2 page(s) / 61K

Publishing Venue

IBM

Related People

Das, S: AUTHOR [+2]

Abstract

In the Fourier-transform-based approach to signal processing for speech recognition, the useful signal bandwidth is divided into a number of mel-scale frequency bands. This invention provides a general method for determining these band settings when input parameters such as the desired sampling rate, number of bands and frequency range of the signal are specified.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 53% of the total text.

      In the Fourier-transform-based approach to signal processing for
speech recognition, the useful signal bandwidth is divided into a
number of mel-scale frequency bands.  This invention provides a
general method for determining these band settings when input
parameters such as the desired sampling rate, number of bands and
frequency range of the signal are specified.

      The mel-scale frequency band settings for speech recognition
imply approximately linear frequency spacings below 1 kHz and
logarithmic spacings above that (1,2,3).  Such settings are
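conveniently derived from a hyperbolic sine function,
  f ( x ) = k sinh ( x ),
where k is a constant.  Since sinh ( x ) is approximately x for small
x and approximately exp ( x ) / 2 for large x, equal steps in x give
nearly linear frequency spacing at the low end and logarithmic
spacing at the high end.  Input parameters such as sampling rate
( s_rate ), size of the Fourier transform ( fftsize ), low frequency
limit of the signal ( lofreq ), high frequency limit of the signal
( hifreq ) and number of desired frequency bands ( nbands ) are
specified by the user.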
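
As an illustration of why a hyperbolic sine gives mel-like spacing, the following Python sketch evaluates f ( x ) = k sinh ( x ) on an evenly spaced grid of x.  The value of k (K_HZ), the grid itself and the function name band_frequency are assumptions made only for this example; this excerpt does not state them.

```python
import math

K_HZ = 1000.0  # assumed value of k; not given in this excerpt


def band_frequency(x, k=K_HZ):
    """Evaluate f(x) = k * sinh(x)."""
    return k * math.sinh(x)


# Evenly spaced arguments (illustrative grid, 0.0 to 3.0 in steps of 0.25).
xs = [0.25 * i for i in range(13)]
freqs = [band_frequency(x) for x in xs]

# Differences between neighbours: nearly constant at the low end (linear).
steps = [b - a for a, b in zip(freqs, freqs[1:])]

# Ratios between neighbours: nearly constant at the high end (logarithmic).
ratios = [b / a for a, b in zip(freqs[1:], freqs[2:])]

for x, f in zip(xs, freqs):
    print(f"x = {x:4.2f}   f = {f:9.1f} Hz")
```

With these assumed values the first few differences in steps are close to one another, while the last ratios approach exp ( 0.25 ), which is the behaviour the text describes.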
Each transform point represents a frequency range of
  deltaf = s_rate / fftsize
We calculate
  freq1 = lofreq - deltaf
as the high frequency setting of a hypothetical pre-starter band.
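
A small worked example of these two quantities, written in Python.  The numeric inputs (16 kHz sampling rate, 256-point transform, 200 Hz lower limit) are hypothetical and chosen only so that the arithmetic is easy to follow; the helper names are likewise not from the source.

```python
def bin_width(s_rate, fftsize):
    """Frequency range represented by one transform point (deltaf)."""
    return s_rate / fftsize


def pre_starter_edge(lofreq, s_rate, fftsize):
    """High frequency setting of the hypothetical pre-starter band (freq1)."""
    return lofreq - bin_width(s_rate, fftsize)


# Hypothetical input parameters, not taken from the disclosure.
s_rate = 16000.0   # sampling rate in Hz
fftsize = 256      # size of the Fourier transform
lofreq = 200.0     # low frequency limit in Hz

deltaf = bin_width(s_rate, fftsize)
freq1 = pre_starter_edge(lofreq, s_rate, fftsize)
print(deltaf, freq1)
```

Under these assumptions each transform point spans 62.5 Hz, so the pre-starter band ends one transform point below lofreq.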
Then a quantity  loset1  is calculated as
  loset1 = freq1 / k
The corresponding argument for  sinh , that is, the value loset for
which k sinh ( loset ) = freq1, is obtained as
  loset = log ( loset1 + sqrt ( loset1 * loset1 + 1.0 ) )
Similarly,
  hiset1 = hifreq / k
and the corresponding  sinh  argument is
  hiset = log ( hiset1 + sqrt ( hiset1 * hiset1 + 1.0 ) )
In these equations, log represents the natural logarithm.  From these
loset and hiset values, we compute two interpolation parameters,
const1  and ...
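
The loset and hiset calculations above can be sketched in Python as follows.  The closed form log ( v + sqrt ( v * v + 1.0 ) ) is the inverse hyperbolic sine, so the sketch also compares it with the standard library's math.asinh.  The value of k and the two frequencies are assumptions for illustration only, and the sketch stops at loset and hiset because the interpolation parameters are not given in this abbreviated text.

```python
import math


def sinh_argument(freq, k):
    """Return x with k * sinh(x) == freq, using the closed form from the text."""
    v = freq / k
    return math.log(v + math.sqrt(v * v + 1.0))


# Assumed values; this excerpt does not specify k or the frequency limits.
k = 1000.0
freq1 = 137.5     # high edge of the hypothetical pre-starter band, in Hz
hifreq = 6000.0   # high frequency limit of the signal, in Hz

loset = sinh_argument(freq1, k)
hiset = sinh_argument(hifreq, k)

# Cross-check against the library inverse hyperbolic sine.
print(loset, math.asinh(freq1 / k))
print(hiset, math.asinh(hifreq / k))
```

Applying k sinh to loset and hiset returns freq1 and hifreq respectively, which is a convenient sanity check on any chosen value of k.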