A flexible control method for Cantonese user-defined character (UDC) input
Original Publication Date: 2000-Mar-01
Included in the Prior Art Database: 2003-Jun-18
In a speech recognition (SR) system with a dictation function, such as ViaVoice, a UDC Topic is built specifically to handle user-defined characters (UDC) using statistical models. Activating or deactivating this Topic switches between two character sets: one with UDC and one without. The switch can also be made automatic by detecting whether UDC support software is installed in the user's operating system (OS).

Many SR systems, such as ViaVoice, now provide tasks for specific domains, called Topics. A Topic consists of two parts. The first is a vocabulary containing the terms used in that domain, drawn from the same character set. The second is a statistical language model (LM) generated from a corpus in the same domain. When users want to dictate articles in that domain, they can enable the Topic to obtain higher dictation accuracy.

In this invention, a UDC Topic is built into the SR system (Fig. 1). It is not a topic in the general sense, but a control for changing the character set that works together with the UDC support software detector. Like other Topics, the UDC Topic consists of a vocabulary and an LM. Because UDC come mainly from colloquial speech and from addresses, the vocabulary collects all such words used in dialogs and all addresses containing UDC, so the UDC Topic is expected to cover most, or even all, of the UDC. The UDC vocabulary is much smaller than the general vocabulary used in dictation. Accordingly, the LM, a statistical n-gram model covering only the words in the UDC vocabulary, is also much smaller. As a result, the UDC Topic requires fewer resources, such as disk space and installation CD-ROM space.
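The following is a minimal sketch, not the ViaVoice implementation, of how a UDC Topic and the character-set switch might fit together. All names (`Topic`, `CharsetController`, `udc_detector`) are hypothetical. A maximum-likelihood bigram model stands in for whatever n-gram LM the real system uses. The detector is a placeholder callable for the OS check for UDC support software. The UDC are assumed to be encoded as Unicode Private Use Area code points.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Topic:
    """Hypothetical Topic structure: a vocabulary plus a bigram language model."""
    name: str
    vocabulary: Set[str]
    bigrams: Counter = field(default_factory=Counter)    # counts of (prev, word)
    histories: Counter = field(default_factory=Counter)  # counts of prev as a history

    @classmethod
    def from_corpus(cls, name: str, vocabulary: Set[str],
                    sentences: Iterable[List[str]]) -> "Topic":
        topic = cls(name, set(vocabulary))
        for words in sentences:
            prev: Optional[str] = None
            for word in words:
                if word not in topic.vocabulary:
                    prev = None  # out-of-vocabulary word breaks the n-gram chain
                    continue
                if prev is not None:
                    topic.bigrams[(prev, word)] += 1
                    topic.histories[prev] += 1
                prev = word
        return topic

    def bigram_prob(self, prev: str, word: str) -> float:
        """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
        total = self.histories[prev]
        return self.bigrams[(prev, word)] / total if total else 0.0


class CharsetController:
    """Switches between the base character set and base + UDC via the UDC Topic."""

    def __init__(self, base_charset: Set[str], udc_topic: Topic,
                 udc_detector: Callable[[], bool]) -> None:
        self.base_charset = set(base_charset)
        self.udc_topic = udc_topic
        self.udc_detector = udc_detector  # hypothetical OS check for UDC support software
        self.udc_enabled = False

    def set_udc(self, enabled: bool) -> None:
        """Manual activation/deactivation of the UDC Topic by the user."""
        self.udc_enabled = enabled

    def auto_configure(self) -> None:
        """Automatic control: enable the UDC Topic only if UDC support is detected."""
        self.udc_enabled = bool(self.udc_detector())

    def active_charset(self) -> Set[str]:
        """Return the character set currently available for dictation output."""
        if not self.udc_enabled:
            return set(self.base_charset)
        udc_chars = {ch for word in self.udc_topic.vocabulary for ch in word}
        return self.base_charset | udc_chars


if __name__ == "__main__":
    udc_word = "\ue000\ue001"  # Private Use Area code points standing in for UDC
    topic = Topic.from_corpus("UDC", {udc_word, "街", "道"},
                              [[udc_word, "街"], [udc_word, "道"], [udc_word, "街"]])
    print(topic.bigram_prob(udc_word, "街"))  # 2 of the 3 continuations are "街"

    ctrl = CharsetController({"街", "道"}, topic, udc_detector=lambda: True)
    ctrl.auto_configure()
    print("\ue000" in ctrl.active_charset())
```

Resetting the history at every out-of-vocabulary word means the LM only stores n-grams made entirely of UDC-vocabulary words. This is what keeps the model roughly as small as the vocabulary itself.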