
Speech Compression by Phoneme Recognition

IP.com Disclosure Number: IPCOM000050545D
Original Publication Date: 1982-Nov-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 2 page(s) / 70K

Publishing Venue

IBM

Related People

Choy, DM: AUTHOR [+2]

Abstract

This invention is directed to a method for lossy, batch-mode compression of speech waves. An encoding device is first trained to establish best-match criteria between each wave pattern in a reference speech wave and each phoneme, and a Markov model defining states and transitions with respect to the reference speech wave is generated. Subsequently applied speech waves are then parsed into segments based upon global utterance optimizations and comparisons correlated to information stored in the trained encoder. Lastly, the speech segments are converted into a phoneme sequence with corresponding energy/time factors, which are then recorded. The critical observations are the selection of the phoneme as the audio measure of interest and the compression of the subsequent speech wave by way of recognition.

This is the abbreviated version, containing approximately 69% of the total text.



Referring now to Fig. 1, an asymmetrical speech wave is applied by way of the voice input through the filter banks to a feature matching arrangement. The acoustic spectral feature prototype of each symbol is given before the speech encoding process; such prototypes can be obtained by statistical clustering methods. A symbol string is obtained on the basis of the comparison between the filter bank output and these prototypes.
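The feature matching step can be illustrated with a minimal sketch. It assumes each filter bank output is a short vector of band energies and that matching means picking the prototype at the smallest Euclidean distance; the disclosure does not state the distance measure, and the function name, symbol names, and numeric values below are all hypothetical.

```python
import math

def label_frames(frames, prototypes):
    """Map each filter-bank frame to the symbol of its nearest prototype.

    Euclidean distance is an assumption; the source only says that a
    comparison against the prototypes yields a symbol string.
    """
    symbols = []
    for frame in frames:
        nearest = min(prototypes, key=lambda s: math.dist(frame, prototypes[s]))
        symbols.append(nearest)
    return symbols

# Hypothetical prototypes for a 3-band filter bank (illustrative values only).
prototypes = {
    "A": (1.0, 0.0, 0.0),
    "B": (0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 1.0),
}

# Hypothetical filter-bank frames, one vector of band energies per time step.
frames = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.1), (0.1, 0.9, 0.2), (0.0, 0.1, 0.7)]

symbol_string = label_frames(frames, prototypes)
print(symbol_string)
```

In practice the prototypes would come from the statistical clustering mentioned above rather than being written by hand.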

The method requires phoneme recognition as the next step. This is an optimization procedure, the flow diagram of which is shown in Fig. 3. This procedure extends over the entire utterance based on a probabilistic finite state model...
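The available text does not name the optimization algorithm. As an illustration only, the sketch below uses the Viterbi algorithm, a standard way to find the most probable state sequence over a whole utterance under a probabilistic finite state (Markov) model. The two phoneme labels, the symbol alphabet, and every probability are hypothetical, and the final run-length step is merely one possible reading of recording a phoneme sequence with time factors.

```python
import math
from itertools import groupby

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for the whole observation string."""
    # best[t][s]: highest log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t - 1][s]: best predecessor of state s at time t
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            scores[s] = prev[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical model: two phonemes, each favouring one acoustic symbol.
states = ["s", "iy"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {a: {b: math.log(0.9 if a == b else 0.1) for b in states} for a in states}
log_emit = {
    "s": {"A": math.log(0.8), "B": math.log(0.2)},
    "iy": {"A": math.log(0.2), "B": math.log(0.8)},
}

symbol_string = ["A", "A", "B", "A", "A", "B", "B", "B"]
path = viterbi(symbol_string, states, log_start, log_trans, log_emit)

# Collapse the frame-level path into (phoneme, duration in frames) pairs.
segments = [(ph, len(list(run))) for ph, run in groupby(path)]
print(segments)
```

Because the decision is made over the entire utterance, the isolated "B" in the third frame is absorbed into the surrounding "s" segment instead of producing a spurious one-frame phoneme, which a frame-by-frame decision would do.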