Dual Mode High Quality Speech Encoder

IP.com Disclosure Number: IPCOM000086792D
Original Publication Date: 1976-Oct-01
Included in the Prior Art Database: 2005-Mar-03
Document File: 3 page(s) / 40K

Publishing Venue

IBM

Related People

Hopner, E: AUTHOR

Abstract

This circuit arrangement is an efficient digital speech encoder that provides speech of internationally acceptable quality at 32 kbit/s when the speech frequency spectrum is limited to less than 4 kHz. It also provides approximately 8 kHz of effective bandwidth if the speech signal is derived from a high-quality source. The number of digital speech channels, compared with accepted standards, is therefore doubled while the effective audio bandwidth is also doubled. In effect, the digital speech encoding efficiency is quadrupled.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Basically, speech consists of voiced and unvoiced phonemes. The voiced part of speech usually lies in the voice frequency spectrum, with most of the energy concentrated below 4 kHz. Some unvoiced sounds, such as plosives ("p", for example), also have energy concentrated at low frequencies. Fricatives such as "s", on the other hand, have most of their energy above 4 kHz, with an energy maximum around 7 kHz. A low-pass/high-pass filter arrangement is therefore used for a dual-mode decision process. Whenever the energy of a phoneme lies below 4 kHz, a four-bit differential pulse code modulation (DPCM) encoder with a sampling rate of 8,000 samples per second is used. If the speech energy lies above 4 kHz, a sampling rate of 16,000 samples per second is used, with two bits per sample. Either mode therefore produces 32 kbit/s (4 bits x 8,000 samples/s, or 2 bits x 16,000 samples/s). Since fricatives are noise-like signals, it is assumed that two bits per sample still provide sufficient quality to give the listener the impression of high-quality speech, with considerable improvement in the intelligibility of fricatives.
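The dual-mode decision described above can be illustrated with a minimal Python sketch. It is not the disclosed circuit. As assumptions for illustration only, a two-tap sum filter and a two-tap difference filter stand in for the low-pass/high-pass arrangement (at an assumed 16 kHz analysis rate their responses cross at 4 kHz), a sum of squares over a frame stands in for the energy measurement, and all names (select_mode, band_energies, tone) are hypothetical. Only the two sampling rates and bit depths are taken from the text.

```python
import math

# Assumed analysis rate. The two-tap filters below cross over at fs/4 = 4 kHz.
SAMPLE_RATE_HZ = 16_000

# (samples per second, bits per sample) for each mode, as stated in the text.
LOW_BAND_MODE = (8_000, 4)    # energy below 4 kHz: 4-bit DPCM
HIGH_BAND_MODE = (16_000, 2)  # energy above 4 kHz: 2 bits per sample


def bit_rate(mode):
    """Return the bit rate in bits per second for a (rate, bits) mode."""
    rate, bits = mode
    return rate * bits


def band_energies(frame):
    """Return (low, high) band energies of a frame of samples.

    Stand-in filters (an assumption, not the original circuit):
    low-pass  y = (x[n] + x[n-1]) / 2
    high-pass y = (x[n] - x[n-1]) / 2
    """
    low = high = 0.0
    prev = 0.0
    for x in frame:
        low += ((x + prev) / 2) ** 2
        high += ((x - prev) / 2) ** 2
        prev = x
    return low, high


def select_mode(frame):
    """Pick the high-band mode only when high-band energy dominates."""
    low, high = band_energies(frame)
    return HIGH_BAND_MODE if high > low else LOW_BAND_MODE


def tone(freq_hz, n=320):
    """Generate n samples of a unit sine wave as a test signal."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ) for i in range(n)]


if __name__ == "__main__":
    for f in (1_000, 7_000):
        mode = select_mode(tone(f))
        print(f, "Hz ->", mode, bit_rate(mode), "bit/s")
```

In this sketch a 1 kHz tone selects the 8,000 samples/s, 4-bit mode and a 7 kHz tone selects the 16,000 samples/s, 2-bit mode, and bit_rate returns the same value for both modes, which is why the channel rate stays constant across mode switches.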

Sufficient delay is interposed so that switching between the two modes of operation can be performed accurately. The status information indicating the mode in use can be transmitted reliably and efficiently, without reducing the efficiency of the digital speech encoding process.
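One way to picture the role of the delay is a fixed delay line: the mode detector sees each sample as it arrives, while the encoder receives the same sample a fixed number of steps later, so the mode decision is settled before the corresponding samples are encoded. The sketch below is only an illustration under that assumption. The text does not specify the delay length or how the delay is realized, and the name delayed and the delay value are hypothetical.

```python
from collections import deque


def delayed(samples, delay):
    """Yield (delayed_sample, newest_sample) pairs.

    The first `delay` delayed outputs are zero padding. `delay` must be a
    positive integer; its value here is illustrative only.
    """
    line = deque([0.0] * delay, maxlen=delay)
    for s in samples:
        oldest = line[0]   # sample that entered `delay` steps ago
        line.append(s)     # a full deque drops its oldest entry on append
        yield oldest, s


if __name__ == "__main__":
    for to_encoder, to_detector in delayed([1.0, 2.0, 3.0, 4.0], delay=2):
        print(to_encoder, to_detector)
```

Here the second element of each pair would feed a mode detector and the first would feed the encoder, so the detector runs two samples ahead of the encoder.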

Referring to the figure, analog voice frequency waves are applied at input terminals 10, which are connected to a rectifier and resistance-capacitance (RC) filter combination 12 for developing direct current proportional to the total energy in the incomin...