A flexible control method for the Cantonese user-defined character (UDC) input

IP.com Disclosure Number: IPCOM000013429D
Original Publication Date: 2000-Mar-01
Included in the Prior Art Database: 2003-Jun-18
Document File: 2 page(s) / 68K

Publishing Venue

IBM

Abstract

In a speech recognition (SR) system with a dictation function, such as ViaVoice, activating or deactivating a UDC Topic, which is built specifically to handle user-defined characters (UDC) using statistical models, switches between two character sets: one with UDC and one without. This switch can also be implemented as an automatic control that detects UDC support software in the user's operating system (OS). Many current SR systems, such as ViaVoice, provide tasks for specific domains, called Topics. A topic task consists of two parts: a vocabulary of terms used in the specific domain but written in the same character set, and a statistical language model (LM) generated from a corpus in the same domain. When a user wants to dictate articles in this domain, s/he can enable the topic to obtain higher dictation accuracy. In this invention, a UDC Topic is built into the SR system (Fig. 1). It is not a topic in the general sense, but a control for changing the character set that works together with the UDC support software detector. The UDC Topic consists of two parts, a vocabulary and an LM. Because UDC come mainly from spoken language and addresses, the vocabulary collects all such words used in dialogs and all addresses containing UDC, so it is asserted that the UDC Topic covers most, or even all, of the UDC. Compared with the general vocabulary used in dictation, the UDC vocabulary is much smaller. Accordingly, because the LM is a statistical n-gram LM covering only the words in the UDC vocabulary, it is much smaller too. All of this lowers the requirements for resources such as disk space and installation CD-ROM space.

This is the abbreviated version, containing approximately 53% of the total text.

   In a speech recognition (SR) system with a dictation function, such as ViaVoice, activating or deactivating a UDC Topic, which is built specifically to handle user-defined characters (UDC) using statistical models, switches between two character sets: one with UDC and one without. This switch can also be implemented as an automatic control that detects UDC support software in the user's operating system (OS).

   Many current SR systems, such as ViaVoice, provide tasks for specific domains, called Topics. A topic task consists of two parts: a vocabulary of terms used in the specific domain but written in the same character set, and a statistical language model (LM) generated from a corpus in the same domain. When a user wants to dictate articles in this domain, s/he can enable the topic to obtain higher dictation accuracy.

   In this invention, a UDC Topic is built into the SR system (Fig. 1). It is not a topic in the general sense, but a control for changing the character set that works together with the UDC support software detector.
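
   The switch described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class names (Topic, Recognizer), the method names, and the placeholder words are all hypothetical, and the detector is reduced to a boolean because the text does not say how UDC support software is detected.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """A topic as described in the text: a vocabulary plus a language model."""
    name: str
    vocabulary: set
    language_model: dict  # stand-in for a statistical LM (assumption)


@dataclass
class Recognizer:
    """Hypothetical recognizer with a general vocabulary and a UDC Topic."""
    general_vocabulary: set
    udc_topic: Topic
    udc_enabled: bool = False

    def set_udc(self, enabled: bool) -> None:
        # Manual switch: the user activates or deactivates the UDC Topic.
        self.udc_enabled = enabled

    def auto_configure(self, udc_support_installed: bool) -> None:
        # Automatic control: follow the result of the UDC support
        # software detector (passed in here as a plain boolean).
        self.udc_enabled = udc_support_installed

    def active_vocabulary(self) -> set:
        # With the UDC Topic off, only the general vocabulary is available.
        if self.udc_enabled:
            return self.general_vocabulary | self.udc_topic.vocabulary
        return set(self.general_vocabulary)


# Placeholder tokens only; no real UDC words are implied.
udc_topic = Topic("UDC", {"udc_word_a"}, {})
recognizer = Recognizer({"word_a"}, udc_topic)
recognizer.auto_configure(udc_support_installed=True)
```

   In this sketch the manual and automatic paths set the same flag, so a user could still override the detector's result by calling set_udc afterwards.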

   The UDC Topic consists of two parts, a vocabulary and an LM. Because UDC come mainly from spoken language and addresses, the vocabulary collects all such words used in dialogs and all addresses containing UDC, so it is asserted that the UDC Topic covers most, or even all, of the UDC. Compared with the general vocabulary used in dictation, the UDC vocabulary is much smaller. Accordingly, because the LM is a statistical n-gram LM covering only the words in the UDC vocabulary, it is much smaller too. All of this lowers the requirements for resources such as disk space and installation CD-ROM space.
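
   The size argument can be made concrete with a toy bigram count in Python. This is only a sketch under assumptions: the function name bigram_counts, the filtering rule (keep a bigram only when both words are in the restricted vocabulary), and the placeholder corpus are illustrative and are not taken from the disclosure.

```python
from collections import Counter


def bigram_counts(sentences, vocabulary=None):
    """Count adjacent word pairs. If a vocabulary is given, keep only
    pairs whose two words are both in it (an assumed filtering rule)."""
    counts = Counter()
    for words in sentences:
        for left, right in zip(words, words[1:]):
            if vocabulary is None or (left in vocabulary and right in vocabulary):
                counts[(left, right)] += 1
    return counts


# Toy corpus of placeholder tokens; "u1" and "u2" stand in for UDC words.
corpus = [
    ["a", "u1", "u2", "b"],
    ["c", "u1", "u2", "a"],
    ["a", "b", "c"],
]
udc_vocabulary = {"u1", "u2"}

general_lm = bigram_counts(corpus)
udc_lm = bigram_counts(corpus, udc_vocabulary)

# The restricted model stores fewer distinct bigrams than the general one.
print(len(general_lm), len(udc_lm))
```

   The number of distinct n-grams a model must store is bounded by the vocabulary size raised to the power n, which is why a much smaller vocabulary leads to a much smaller LM.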

Figure 1. Cantonese SR System (image not reproduced). Labeled components: Acoustic Model; General Vocabulary; Statistical Language Model; UDC support software detector; Switch; UDC Topic, comprising Dialog Vocabulary (containing UDC) and Statistical Dialog LM.

   It's a flexible method to use UDC Top...