
Wide-band Extender for Speech/Audio Codec

IP.com Disclosure Number: IPCOM000124279D
Original Publication Date: 2005-May-20
Included in the Prior Art Database: 2005-May-20
Document File: 2 page(s) / 251K

Publishing Venue

Siemens

Related People

Juergen Carstens: CONTACT

Abstract

Until now, telephone speech has been limited to the frequency range of 300 Hz - 3.4 kHz at a sampling frequency of 8 kHz. These values are adequate for understanding a speaker's voice. Listening experiments have shown that extending the bandwidth of the speech signal to 50 Hz - 7 kHz significantly improves intelligibility and makes phone conversations more comfortable. This extension implies upsampling the signal to 16 kHz and therefore requires a higher data transmission rate. To reach the enhanced speech quality without transmitting more information, many companies are studying artificial bandwidth extension (BWE) algorithms for telephone speech signals. The system is placed on the receiver side and ensures full compatibility with the existing band-limited speech signal for non-equipped terminals (network and receivers). The algorithm is based on the correlation and mutual information between the two parts of the human speech signal: the narrowband signal (300 - 3400 Hz) and the extension band signal (3400 - 7000 Hz). A version of this artificial bandwidth extension algorithm has been developed. The listening results obtained are not fully satisfactory. This is due to incorrect estimation of the upper frequencies of fricative sounds ([s], [f], [sh], [th], etc.), whose spectral information lies mostly above 3 kHz, i.e. largely beyond the cut-off frequency of the telephone speech signal. A more exact high-band estimate for a fricative therefore first requires accurate recognition of the class of that sound.

This is the abbreviated version, containing approximately 52% of the total text.



Idea: Cyril de Chantérac, DE-Munich; Dr. Tim Fingscheidt, DE-Munich; Dr. Hervé Taddei, DE-Munich

Until now, telephone speech has been limited to the frequency range of 300 Hz - 3.4 kHz at a sampling frequency of 8 kHz. These values are adequate for understanding a speaker's voice. Listening experiments have shown that extending the bandwidth of the speech signal to 50 Hz - 7 kHz significantly improves intelligibility and makes phone conversations more comfortable. This extension implies upsampling the signal to 16 kHz and therefore requires a higher data transmission rate.
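The step from 8 kHz to 16 kHz can be illustrated with a minimal sketch. It assumes a plain interpolate-by-2 scheme (zero insertion followed by a Hamming-windowed sinc low-pass); the function name `upsample_by_2` and the filter length are illustrative choices, not taken from the disclosure. Upsampling alone adds no content above 4 kHz: the upper band of the 16 kHz signal stays empty until a bandwidth extension stage fills it.

```python
import math


def upsample_by_2(x, taps=31):
    """Double the sampling rate of x (e.g. 8 kHz -> 16 kHz).

    Illustrative only: zero insertion plus a Hamming-windowed sinc
    low-pass with cut-off at a quarter of the new sampling rate.
    `taps` must be odd so the filter has a centre tap.
    """
    # Insert one zero after every input sample; this doubles the rate
    # but creates a mirror image of the spectrum above the old Nyquist.
    stuffed = []
    for s in x:
        stuffed.extend((s, 0.0))

    # Design the low-pass that removes the mirror image.
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 0.5 if k == 0 else math.sin(math.pi * k / 2) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(2.0 * ideal * window)  # gain 2 makes up for the inserted zeros

    # Convolve, shifting by `mid` so the output is time-aligned with the input.
    y = []
    for n in range(len(stuffed)):
        acc = 0.0
        for j, hj in enumerate(h):
            i = n + mid - j
            if 0 <= i < len(stuffed):
                acc += hj * stuffed[i]
        y.append(acc)
    return y


# Usage: a 1 kHz tone sampled at 8 kHz becomes the same tone at 16 kHz.
tone_8k = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(80)]
tone_16k = upsample_by_2(tone_8k)
```

The output has twice as many samples as the input, and away from the signal edges it follows the same 1 kHz tone sampled at 16 kHz.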

To reach the enhanced speech quality without transmitting more information, many companies are studying artificial bandwidth extension (BWE) algorithms for telephone speech signals. The system is placed on the receiver side and ensures full compatibility with the existing band-limited speech signal for non-equipped terminals (network and receivers). The algorithm is based on the correlation and mutual information between the two parts of the human speech signal: the narrowband signal (300 - 3400 Hz) and the extension band signal (3400 - 7000 Hz).

A version of this artificial bandwidth extension algorithm has been developed. The listening results obtained are not fully satisfactory. This is due to incorrect estimation of the upper frequencies of fricative sounds ([s], [f], [sh], [th], etc.), whose spectral information lies mostly above 3 kHz, i.e. largely beyond the cut-off frequency of the telephone speech signal.

A more exact high-band estimate for a fricative therefore first requires accurate recognition of the class of that sound.

A new fricative detection and classification method, incorporated in the artificial bandwidth extension algorithm, has been developed. This fricative recognition algorithm works on a frame-by-frame basis with a narrowband signal (300 - 3400 Hz) as input. It is therefore particularly well suited to telephone speech enhancement applications....
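The available text breaks off before describing the detection and classification method, so the following is not that method. It is only a generic sketch of what frame-by-frame processing of an 8 kHz narrowband signal can look like, using a textbook heuristic (high zero-crossing rate plus non-negligible energy) to flag noise-like, fricative-like frames. The frame length, both thresholds, and all function names are hypothetical.

```python
import math

FRAME_LEN = 160       # assumed: 20 ms frames at 8 kHz
ZCR_THRESHOLD = 0.3   # hypothetical: sign changes per sample pair
ENERGY_FLOOR = 1e-4   # hypothetical: frames below this count as silence


def frames(signal, frame_len=FRAME_LEN):
    """Yield consecutive non-overlapping frames; a short tail is dropped."""
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        yield signal[start:start + frame_len]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)


def mean_energy(frame):
    """Average squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def is_fricative_like(frame):
    """Generic heuristic, not the disclosed classifier."""
    return (mean_energy(frame) > ENERGY_FLOOR
            and zero_crossing_rate(frame) > ZCR_THRESHOLD)


def detect(signal):
    """Return one boolean decision per frame."""
    return [is_fricative_like(f) for f in frames(signal)]


# Usage: a 3 kHz component stands in for hiss, a 200 Hz tone for voicing.
fs = 8000
hiss = [0.5 * math.sin(2 * math.pi * 3000 * n / fs + 0.3) for n in range(320)]
voiced = [0.5 * math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(320)]
decisions = detect(hiss + voiced)
```

A detector of this kind only separates fricative-like frames from the rest. Distinguishing between fricative classes such as [s] and [f], which the disclosure identifies as the prerequisite for a better high-band estimate, would need additional features that the abbreviated text does not specify.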