Speaker-Independent Phonebook Name Dialing
Original Publication Date: 2001-Nov-26
Included in the Prior Art Database: 2001-Nov-26
Yaxin Zhang: AUTHOR [+2]
Abstract
Voice recognition is used in a growing number of handheld devices such as cellular phones. A popular application is speaker-dependent nametag dialling. In this application, the user types in a nametag and its associated phone number, then speaks the nametag a few times. The system trains an acoustic model for the nametag and stores it in memory; this is normally called the training procedure. After completing training for all nametags, the user can dial any phone number in memory by pressing a quick-dial button and then saying the nametag.
Nametag dialling is convenient for phone users: they do not need to remember many phone numbers or their positions in the phone memory, nor do they need to press a series of buttons in hands-busy environments to make a call. However, a market survey shows that a very low percentage of users use nametag dialling. One major reason is that general users are reluctant to go through the training procedure described in the user’s manual. Based on this observation, we propose a speaker-independent name dialling function for cellular phones. With this function, a user can dial any phone number in the phonebook (memory) by saying the associated name. Users do not need to train personal models, since the global acoustic models are trained off-line and loaded into memory at the manufacturing stage.
Name dialling is based on a speaker-independent Chinese syllable recogniser and the special structure of Chinese names. A Chinese name consists of a number of characters; more than 99.5% of names have two or three characters, and each character is a single syllable. Since only a negligible percentage of Chinese characters have multiple pronunciations, converting the syllables uttered by the user to the target name is a relatively easy task compared with names in Western languages. When a user says a name in the phonebook, the speech recogniser turns the sound into a sequence of base syllables, from which the name is located and its number is shown on the screen of the phone. The user is then asked whether he/she would like to dial the number.
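The lookup step described above can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the phonebook entries, the phone numbers, the toneless pinyin spellings, and the function names `build_index` and `lookup` are all assumptions made for the example. It also assumes that the base syllables of each name are already known when the name is stored.

```python
from collections import defaultdict

# Hypothetical phonebook. Each entry holds the name, its base syllables
# (written here as toneless pinyin, one syllable per character), and a
# made-up phone number.
PHONEBOOK = [
    {"name": "李小龙", "syllables": ("li", "xiao", "long"), "number": "555-0100"},
    {"name": "王伟", "syllables": ("wang", "wei"), "number": "555-0101"},
    {"name": "汪薇", "syllables": ("wang", "wei"), "number": "555-0102"},
]


def build_index(phonebook):
    """Map each base-syllable sequence to the entries that share it."""
    index = defaultdict(list)
    for entry in phonebook:
        index[entry["syllables"]].append(entry)
    return index


def lookup(index, recognised_syllables):
    """Return every phonebook entry matching the recognised syllables.

    Base syllables carry no tone, so different names can share one
    sequence; all candidates are returned so the user can confirm one.
    """
    return index.get(tuple(recognised_syllables), [])


index = build_index(PHONEBOOK)
for candidate in lookup(index, ["wang", "wei"]):
    print(candidate["name"], candidate["number"])
```

Returning a list rather than a single entry is a design choice of this sketch: it leaves room for the confirmation step, in which the number is shown and the user decides whether to dial.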
2. Description of the system
Referring to Fig. 1, there is illustrated a block diagram of a phonebook name dialling system. The system is built around a connected Chinese syllable recogniser. A ROM 5 contains the whole set of acoustic models for Mandarin Chinese. The set is normally represented by 408 base syllables or 1254 tonal syllables. The basic acoustic unit could be a tri-phone, a di-phone, or a syllable.
A phonebook 6 stores a list of names and their associated phone numbers, as typed in by the user. Since the number of base syllables corresponding to the names in the phonebook is much smaller than that in the global model set, we can improve the recognition accuracy by loading only the relevant part of the global acoustic models into a RAM 4, so that the recogniser has fewer candidate syllables to confuse. The current acoustic model...
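The idea of loading only part of the global model set can be illustrated with a short sketch. It assumes, purely for illustration, that there is one model per base syllable (the text also allows tri-phone or di-phone units), and the names `ROM_MODELS`, `active_syllables`, and `load_active_models` are hypothetical. The model values are placeholder strings standing in for real acoustic model parameters.

```python
# Hypothetical stand-in for the global model set held in ROM: one
# placeholder "model" per base syllable (a real set would have 408).
ROM_MODELS = {
    syllable: f"model<{syllable}>"
    for syllable in ("li", "xiao", "long", "wang", "wei", "zhao", "chen", "ma")
}

# Hypothetical phonebook entries with their base syllables.
PHONEBOOK = [
    {"name": "王伟", "syllables": ("wang", "wei")},
    {"name": "李伟", "syllables": ("li", "wei")},
]


def active_syllables(phonebook):
    """Collect the distinct base syllables used by the phonebook names."""
    return {s for entry in phonebook for s in entry["syllables"]}


def load_active_models(rom_models, phonebook):
    """Copy into a RAM-side dict only the models the phonebook needs."""
    needed = active_syllables(phonebook)
    return {s: rom_models[s] for s in sorted(needed) if s in rom_models}


ram_models = load_active_models(ROM_MODELS, PHONEBOOK)
print(f"{len(ram_models)} of {len(ROM_MODELS)} models loaded:", list(ram_models))
```

In this sketch the active set would have to be rebuilt whenever a phonebook entry is added or removed; how the described system handles such updates is not stated in the text above.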