A Multi-Lingual Speaker-Independent Voice Name Dialing System

IP.com Disclosure Number: IPCOM000031779D
Published in the IP.com Journal: Volume 4 Issue 11 (2004-11-25)
Included in the Prior Art Database: 2004-Nov-25
Document File: 1 page(s) / 24K

Publishing Venue

Siemens

Related People

Juergen Carstens: CONTACT

Abstract

New technologies such as grapheme-to-phoneme conversion (G2P) make it possible to access the phone book or address book of a mobile phone by voice without any training (say-in). The user presses the push-to-speak (PTS) button and speaks the name of a contact; once speech recognition has run, a call to that person is established. This functionality is called "speaker-independent voice name dialing" (SI-VND): no say-in (i.e. no training of the names to be recognized) is necessary, and voice access to the names is available out of the box. The underlying technology of SI-VND is G2P, which is often implemented as a set of rules or as a neural network trained to convert words (or parts of words) into a phonemic description. This conversion is also called "transcription". The resulting phonemic description gives the speech recognizer the information it needs to recognize the name. Because G2P generates a phonemic representation of each name, SI-VND is highly language-dependent (LD). In state-of-the-art implementations, speech recognition (and/or the dialogue between user and device in general) is designed to take place in one particular language, such as German. In a German system, for example, the name "Peter" is automatically transcribed into the SAMPA (Speech Assessment Methods Phonetic Alphabet) phoneme string /p e: t 6/.
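The rule-based variant of G2P mentioned above can be illustrated with a minimal sketch. The rule table and the function name g2p are hypothetical and cover only the "Peter" example; a real German rule set would be far larger and context-sensitive, and nothing here is taken from the disclosed system.

```python
# Toy German rule set (an assumption for illustration, not a real G2P table).
# Each rule maps a grapheme sequence to one SAMPA phoneme.
# "$" marks the end of the word.
GERMAN_RULES = [
    ("er$", "6"),   # word-final "er" is vocalized
    ("p", "p"),
    ("e", "e:"),    # crude: every other "e" is treated as long
    ("t", "t"),
]


def g2p(word, rules):
    """Transcribe a word by applying the longest matching rule at each position."""
    text = word.lower() + "$"
    ordered = sorted(rules, key=lambda rule: -len(rule[0]))
    phonemes = []
    i = 0
    while i < len(text) - 1:  # stop before the end marker
        for graphemes, phoneme in ordered:
            if text.startswith(graphemes, i):
                phonemes.append(phoneme)
                # The end marker itself does not consume a letter.
                i += len(graphemes.rstrip("$"))
                break
        else:
            raise ValueError(f"no rule matches {text[i]!r} in {word!r}")
    return " ".join(phonemes)


transcription = g2p("Peter", GERMAN_RULES)
```

With these toy rules, g2p("Peter", GERMAN_RULES) yields the phoneme string "p e: t 6" quoted in the text. Letters without a rule raise an error, which makes the limits of the sketch explicit.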

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Idea: Dr. Tim Fingscheidt, DE-Muenchen; Dr. Beate Specker, DE-Muenchen; Dr. Sorel Silvesrtu Stan, DE-Muenchen

The problem to be solved is that one and the same name is pronounced differently in different languages. A related problem is how users speak foreign names and how these names can be recognized properly. In the example above, the German user may have a contact "Peter" who is British, and the user pronounces the name correctly in British English (/p i: t @/). It is likely that the speech recognition process is highly influenced by such pronunciation t...
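The mismatch described above can be made concrete by comparing the two SAMPA strings phoneme by phoneme. The following is only an illustrative sketch: the edit-distance measure, the function name phoneme_distance and the language tags are assumptions chosen for the example, not part of the disclosed system or of how a real recognizer scores hypotheses.

```python
def phoneme_distance(a, b):
    """Levenshtein distance between two space-separated phoneme strings."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Transcriptions of "Peter" as quoted in the text (language tags are illustrative).
TRANSCRIPTIONS = {
    "de": "p e: t 6",     # produced by a German G2P
    "en-GB": "p i: t @",  # what the user actually says
}

mismatch = phoneme_distance(TRANSCRIPTIONS["de"], TRANSCRIPTIONS["en-GB"])
```

Here two of the four phonemes differ (e: versus i:, and 6 versus @), so the distance is 2. A recognizer that only holds the German transcription therefore has to match an utterance that deviates in half of its phonemes.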