Automatic Katakana Word Detection With Kana N-Gram

IP.com Disclosure Number: IPCOM000036225D
Original Publication Date: 1989-Sep-01
Included in the Prior Art Database: 2005-Jan-28
Document File: 3 page(s) / 47K

Publishing Venue

IBM

Related People

Sumita, E: AUTHOR [+2]

Abstract

This article presents a method for inputting katakana words, which can be troublesome with conventional kana-kanji conversion.

This is the abbreviated version, containing approximately 52% of the total text.

This article presents a method for inputting katakana words, which can be troublesome with conventional kana-kanji conversion.

Japanese text is written with three types of characters: katakana, hiragana, and kanji. Katakana and hiragana are collectively called kana, and they represent the pronunciation of Japanese. Kana-kanji conversion is an input method in which the user enters the pronunciation of a word in kana (katakana or hiragana) and the system outputs the appropriate written form of the word.

For the purposes of this article, Japanese words fall into two types. A katakana word is written only in katakana. A hiragana word is written in a mixture of hiragana and kanji.

Conventional kana-kanji conversion often fails on katakana words, because there are too many of them to register in the memory used for kana-kanji conversion. The method described in this article overcomes this disadvantage. With it, a user can input a katakana word without installing a large dictionary of katakana words and without pressing a special key to enter katakana input mode. Instead, the method uses a small frequency table of kana n-grams (kana sequences of length n) and calculates the likelihood that a word is a katakana word, with no trigger from the user. This reduces the labor of inputting katakana words and thereby improves the overall speed of Japanese input. The frequency table of kana n-grams requires only a small amount of memory.
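The idea of scoring a word from a kana n-gram frequency table can be sketched in a few lines of Python. This is only an illustrative sketch, not the scoring rule of the article, which is not given in the text available here. It assumes bigrams (n = 2), readings typed in hiragana, one frequency table per word type, and a sum of add-one-smoothed log ratios as the score. The function names, the boundary marks "^" and "$", the threshold of 0.0, and the tiny word lists are all hypothetical.

```python
import math
from collections import Counter


def ngrams(reading, n=2):
    """Return the kana n-grams of a reading, padded with boundary marks."""
    padded = "^" + reading + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_table(readings, n=2):
    """Count n-gram frequencies over a list of kana readings."""
    table = Counter()
    for reading in readings:
        table.update(ngrams(reading, n))
    return table


def katakana_score(reading, kata_table, hira_table, n=2):
    """Sum of log ratios of smoothed n-gram probabilities.

    A positive score means the n-grams look more like those of the
    katakana-word table than those of the hiragana-word table.
    Add-one smoothing is an assumption made for this sketch.
    """
    kata_total = sum(kata_table.values())
    hira_total = sum(hira_table.values())
    vocab = len(set(kata_table) | set(hira_table)) + 1  # +1 for unseen n-grams
    score = 0.0
    for gram in ngrams(reading, n):
        p_kata = (kata_table[gram] + 1) / (kata_total + vocab)
        p_hira = (hira_table[gram] + 1) / (hira_total + vocab)
        score += math.log(p_kata / p_hira)
    return score


# Toy training data (hypothetical; a real table would be built from a corpus).
KATAKANA_READINGS = ["こんぴゅーた", "ぷりんた", "てーぶる", "ふぁいる", "でーた"]
HIRAGANA_READINGS = ["かんじ", "にほんご", "へんかん", "じしょ", "にゅうりょく"]

kata_table = build_table(KATAKANA_READINGS)
hira_table = build_table(HIRAGANA_READINGS)


def looks_like_katakana(reading, threshold=0.0):
    """Decide from the tables alone, with no mode key pressed by the user."""
    return katakana_score(reading, kata_table, hira_table) > threshold


print(looks_like_katakana("でーたべーす"))    # True on this toy data
print(looks_like_katakana("かんじへんかん"))  # False on this toy data
```

Only n-gram counts are stored, not whole words, which is why the table can stay small compared with a dictionary of katakana words. Note that "でーたべーす" is classified as a katakana word here even though it does not appear in the toy word list.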

Many katakana words are borrowed from European languages, mainly English, and their pronunciation is adapted to Japanese. In contrast, hiragana words either originate from Chinese or are native Japanese. The difference in pronunciation between katakana words and hiragana words is...