
Method and apparatus for Generating Personalized Vocabulary for Chinese Input Method

IP.com Disclosure Number: IPCOM000168012D
Original Publication Date: 2008-Feb-28
Included in the Prior Art Database: 2008-Feb-28
Document File: 2 page(s) / 102K

Publishing Venue

IBM

Abstract

The key factor in the success of a Chinese input method is the quality of its vocabulary. By leveraging collective intelligence, we introduce a new approach that builds the system vocabulary and recommends an appropriate personalized vocabulary to each user of a Chinese input method. Users can share their own personal vocabularies. The collected personal vocabularies can be used to generate up-to-date system vocabularies dynamically. They can also be grouped together and used to generate personalized vocabularies at a very fine granularity through collaborative filtering approaches. Users can also manually join a group, and all members of that group share the group vocabulary.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 38% of the total text.



1. Background: What is the problem solved by your invention? Describe known solutions to this problem (if any). What are the drawbacks of such known solutions, or why is an additional solution required? Cite any relevant technical documents or references.

For Chinese and other East Asian languages such as Japanese and Korean, which have thousands of characters, a one-to-one mapping between characters and keyboard keys is impossible. Several input methods have therefore been devised to let users enter characters of East Asian languages. For Chinese in particular, hundreds of input methods are available on the market. Keyboard-based input methods fall into three main types: a) by encoding, b) by pronunciation, and c) by the structure of the characters. Whatever the type, the key factor in an input method's success is the quality of its vocabulary. In general, providers ship the input method with a default system vocabulary generated by analyzing a large corpus. At the same time, the input method records the words each user enters locally as a personal vocabulary. When the user types, the system selects candidates from both the system and personal vocabularies.
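The candidate selection described above can be illustrated with a minimal sketch. It assumes a pronunciation-based (pinyin) input method. The names `SYSTEM_VOCAB`, `PersonalVocabulary`, `candidates`, and `personal_weight`, the toy frequencies, and the ranking rule are all hypothetical choices made for illustration; the disclosure does not specify how candidates from the two vocabularies are ranked.

```python
from collections import Counter

# Hypothetical toy system vocabulary: input key (pinyin) -> {word: corpus frequency}.
SYSTEM_VOCAB = {
    "zhongguo": {"中国": 900},
    "shuru": {"输入": 500, "输入法": 120},
}


class PersonalVocabulary:
    """Records, per input key, how often this user has entered each word."""

    def __init__(self):
        self._counts = {}

    def record(self, key, word):
        # Called whenever the user commits a word for the given key.
        self._counts.setdefault(key, Counter())[word] += 1

    def lookup(self, key):
        return dict(self._counts.get(key, {}))


def candidates(key, system_vocab, personal, personal_weight=1000):
    """Merge candidates from both vocabularies and rank them by score.

    Assumption: a word's score is its corpus frequency plus a weighted
    count of the user's own uses, so personal words rank first.
    """
    scores = Counter(system_vocab.get(key, {}))
    for word, count in personal.lookup(key).items():
        scores[word] += count * personal_weight
    # Highest score first; ties are broken by the word itself for stability.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


personal = PersonalVocabulary()
personal.record("shuru", "熟人")  # a word missing from the system vocabulary
print(candidates("shuru", SYSTEM_VOCAB, personal))
```

With this weighting, a word the user has entered even once is offered ahead of the system candidates for the same key, while keys the user has never typed fall back to the system vocabulary alone.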

In such systems, the system vocabulary, which contains the most popular words across all users, is derived from the selected corpus, so its quality depends on the quality of that corpus. However, it is very hard to select a corpus that fully reflects users' input patterns in a timely fashion. The personal vocabulary, in turn, supplements the system vocabulary: it records the words a user has entered that may not be found in the system vocabulary. Even so, the combination of system and personal vocabularies is still insufficient, because the personal vocabulary takes a long time to accumulate manually and never becomes complete. Some input methods provide extended vocabularies for special domains to partly solve this problem. However, these approaches require the domain vocabularies to be constructed manually, and end users must be aware of these domain specific...