Full/Adaptive Phoneme Speech Data Compression

IP.com Disclosure Number: IPCOM000118859D
Original Publication Date: 1997-Aug-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 1 page(s) / 38K

Publishing Venue

IBM

Related People

Schreck, E: AUTHOR

Abstract

For audio speech applications, a compression scheme is disclosed which exploits the known type of the data. Instead of transferring the original data (even if compressed), it is better to transfer only a phoneme code representation of the speech and synthesize the original form of the speech on the receiving side. The phoneme code can consist of true phonemes, but may also be augmented by diads, triads or other metaphoneme sounds. Each language has only a limited number of phonemes (like letters of the alphabet), and any word in the language can be composed out of that phoneme reservoir. To maintain the speaker's characteristic voice, a table containing the speaker's own phonemes can be sent at the beginning of a connection. If this table is not available, a generic set of phonemes can be used.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 76% of the total text.

      For audio speech applications, a compression scheme is disclosed
which exploits the known type of the data.  Instead of transferring
the original data (even if compressed), it is better to transfer only
a phoneme code representation of the speech and synthesize the
original form of the speech on the receiving side.  The phoneme code
can consist of true phonemes, but may also be augmented by diads,
triads or other metaphoneme sounds.  Each language has only a limited
number of phonemes (like letters of the alphabet), and any word in
the language can be composed out of that phoneme reservoir.  To
maintain the speaker's characteristic voice, a table containing the
speaker's own phonemes can be sent at the beginning of a
connection.  If this table is not available, a generic set of phonemes
can be used.  This also allows the character of the speaker to be
changed, e.g., a male speaker can use a female phoneme set and,
thus, sound like a female.  If particular phonemes are not available
for certain words or sounds, e.g., foreign words, an escape code can
precede the word or sound to indicate that it is being transmitted in
non-phoneme form, which may be compressed or uncompressed.
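
      The scheme described above can be illustrated with a minimal
sketch.  It is not the disclosed implementation: the one-byte phoneme
codes, the reserved ESCAPE value, the two-byte length prefix, the tiny
phoneme inventory and the byte-string "waveforms" are all assumptions
chosen only to make the example runnable.

```python
from typing import Dict, List, Optional, Union

# Assumption: one byte per phoneme code, with 0xFF reserved as the escape code.
ESCAPE = 0xFF

# Hypothetical, deliberately tiny phoneme inventory (a real set would be per language).
GENERIC_PHONEMES = ["h", "e", "l", "o", "w", "r", "d"]
CODE_OF = {p: i for i, p in enumerate(GENERIC_PHONEMES)}

# Placeholder "waveforms": each phoneme maps to a short byte string.
GENERIC_VOICE: Dict[str, bytes] = {p: p.encode("ascii") for p in GENERIC_PHONEMES}

# A segment is either a phoneme symbol (str) or raw non-phoneme audio (bytes).
Segment = Union[str, bytes]


def encode(segments: List[Segment]) -> bytes:
    """Emit one code byte per phoneme; prefix raw audio with ESCAPE and a length."""
    out = bytearray()
    for seg in segments:
        if isinstance(seg, str):
            out.append(CODE_OF[seg])
        else:
            if len(seg) > 0xFFFF:
                raise ValueError("raw segment too long for 2-byte length prefix")
            out.append(ESCAPE)
            out += len(seg).to_bytes(2, "big")
            out += seg
    return bytes(out)


def decode(stream: bytes) -> List[Segment]:
    """Invert encode(): recover phoneme symbols and escaped raw segments."""
    segments: List[Segment] = []
    i = 0
    while i < len(stream):
        code = stream[i]
        if code == ESCAPE:
            n = int.from_bytes(stream[i + 1:i + 3], "big")
            segments.append(stream[i + 3:i + 3 + n])
            i += 3 + n
        else:
            segments.append(GENERIC_PHONEMES[code])
            i += 1
    return segments


def synthesize(segments: List[Segment],
               speaker_table: Optional[Dict[str, bytes]] = None) -> bytes:
    """Use the speaker's table if one was sent, otherwise the generic set."""
    table = speaker_table if speaker_table is not None else GENERIC_VOICE
    audio = bytearray()
    for seg in segments:
        audio += table[seg] if isinstance(seg, str) else seg
    return bytes(audio)


message: List[Segment] = ["h", "e", "l", "o", b"\x01\x02\x03", "w"]
encoded = encode(message)
received = decode(encoded)
generic_audio = synthesize(received)

# Swapping in a different table changes the character of the voice.
other_voice = {p: p.upper().encode("ascii") for p in GENERIC_PHONEMES}
other_audio = synthesize(received, other_voice)
```

      In this sketch each phoneme costs one byte on the wire, while an
escaped segment costs three bytes of overhead plus its payload, so the
escape path only pays off when no phoneme code is available.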

      This type of compression is suitable for one-to-one
conversation, meaning only one person talks at a time on each
end of the conversation.  It can also be used in electronic mail
systems that al...