
Four-Corner-Plus-Pinyin Method for Chinese Character Encoding

IP.com Disclosure Number: IPCOM000062072D
Original Publication Date: 1986-Oct-01
Included in the Prior Art Database: 2005-Mar-09
Document File: 3 page(s) / 30K

Publishing Venue

IBM

Related People

Li, TJ: AUTHOR [+2]

Abstract

A method is described for encoding Chinese characters for computer input in a simple, consistent, and easily memorized fashion. Chinese characters are pictorial. Each character is composed of basic parts called radicals, of which 214 are known. Radicals are not equivalent to the letters of Western alphabets, because the parts are assembled into a character spatially rather than linearly. Thus, knowing the radicals of a Chinese character, and the order in which they are assembled, is not sufficient to construct the correct character, because the placement of the radicals in two-dimensional space also matters.

This is the abbreviated version, containing approximately 52% of the total text.


A method is described for encoding Chinese characters for computer input in a simple, consistent, and easily memorized fashion. Chinese characters are pictorial. Each character is composed of basic parts called radicals, of which 214 are known. Radicals are not equivalent to the letters of Western alphabets, because the parts are assembled into a character spatially rather than linearly. Thus, knowing the radicals of a Chinese character, and the order in which they are assembled, is not sufficient to construct the correct character, because the placement of the radicals in two-dimensional space also matters.

For example, in the English language, if I say that a word consists of a "G" followed by an "O," I have uniquely identified the word as "GO." That may not be the case for Chinese characters, because the "O" may be placed to the right of or below the "G." The two characters so formed may not both be valid. If they are both valid, they may have different meanings.

Many Chinese dictionaries use the Four-Corner numbering system to arrange the characters. Each character is assigned a four-digit decimal number, as shown in Fig. 1. This four-digit number is derived from the shapes at the upper-left, upper-right, lower-left, and lower-right corners of the character. Thus, the four-digit number for a given character can be identified easily from its shape.

However, two or more characters may have the same Four-Corner number. We shall call the number of characters sharing a Four-Corner number the multiplicity of that number. For example, the multiplicity of 0010 is 7 because there are 7 characters with the Four-Corner number 0010. There are more four-digit numbers than there are characters: 10,000 possible numbers against 6641 characters.
However, because the characters fall unevenly across the numbers, only 2309 four-digit numbers have one or more associated valid characters. Thus, on average, there are 2.9 characters per valid four-digit number. Only 1050 of the 2309 valid four-digit numbers uniquely identify their associated characters. Each of the remaining 1259 four-digit numbers has two or more characters associated with it. One of the four-digit numbers has 32 valid characters associat...
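The multiplicity statistics quoted above can be reproduced from any table that maps characters to Four-Corner numbers. The following is a minimal sketch in Python, not part of the original disclosure. The function names (multiplicities, summarize) and the toy table are hypothetical: the keys are placeholder names rather than real characters, and the codes assigned to them are made up for illustration. The last two lines only re-derive figures stated in the text (6641 characters, 2309 valid numbers, 1050 unique numbers).

```python
from collections import Counter


def multiplicities(code_of):
    """Map each Four-Corner number to how many characters share it."""
    return Counter(code_of.values())


def summarize(code_of):
    """Collect the kinds of statistics the text quotes for a code table."""
    mult = multiplicities(code_of)
    valid = len(mult)  # numbers with at least one character
    unique = sum(1 for m in mult.values() if m == 1)
    return {
        "characters": len(code_of),
        "valid_codes": valid,
        "unique_codes": unique,              # multiplicity exactly 1
        "ambiguous_codes": valid - unique,   # multiplicity 2 or more
        "mean_multiplicity": len(code_of) / valid,
        "max_multiplicity": max(mult.values()),
    }


# Hypothetical toy table: placeholder names, made-up codes (assumption).
toy = {
    "char_a": "0010", "char_b": "0010", "char_c": "0010",
    "char_d": "4480",
    "char_e": "7722", "char_f": "7722",
}
stats = summarize(toy)

# Re-derive the figures stated in the text.
mean_from_text = 6641 / 2309        # rounds to the quoted 2.9
ambiguous_from_text = 2309 - 1050   # the quoted 1259
```

Run on a full dictionary table, summarize would be expected to report the counts given in the text; the toy table here only exercises the logic.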