A way to identify character variants defined in each language column in ISO/IEC 10646 and Unicode CJK unified ideographs

IP.com Disclosure Number: IPCOM000013869D
Original Publication Date: 2001-May-26
Included in the Prior Art Database: 2003-Jun-18

Publishing Venue

IBM

Abstract

Disclosed is an architecture that modifies the UCS-4 (Universal Multiple-Octet Coded Character Set, ISO/IEC 10646) encoding scheme to differentiate four languages: Japanese, Korean, Simplified Chinese and Traditional Chinese. In the original UCS-4 architecture defined in ISO/IEC 10646, a character is represented by 4 octets (32 bits = 4 bytes). In the disclosed scheme a character is still represented by 4 octets; however, the first octet (byte) is used as a character variant identifier. With the first octet serving as a variant identifier, every character encoded in 4 octets can be examined to determine whether it is intended for Japanese, Korean, Simplified Chinese or Traditional Chinese. For example, the code point X'9AA8' defines the character meaning "bone" in the original UCS-4 (ISO/IEC 10646) architecture. Under the character glyph unification rules, the following four character glyph shapes were unified at the code point X'00 00 9AA8'.
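The first-octet tagging described in the abstract can be illustrated with a minimal Python sketch. The numeric identifier values (0x01 to 0x04), the constant names, and the helper functions below are assumptions made for illustration only; the abstract states that the first octet carries the variant identifier but does not give the actual assignments. The sketch also assumes the remaining three octets hold the code point in big-endian order, matching the notation X'00 00 9AA8'.

```python
# Hypothetical variant identifiers. The abstract does not specify these
# values; they are placeholders chosen only for this sketch.
UNIFIED = 0x00              # first octet as in the example X'00 00 9AA8'
JAPANESE = 0x01             # assumed value
KOREAN = 0x02               # assumed value
SIMPLIFIED_CHINESE = 0x03   # assumed value
TRADITIONAL_CHINESE = 0x04  # assumed value

VARIANT_NAMES = {
    UNIFIED: "unified (no variant specified)",
    JAPANESE: "Japanese",
    KOREAN: "Korean",
    SIMPLIFIED_CHINESE: "Simplified Chinese",
    TRADITIONAL_CHINESE: "Traditional Chinese",
}


def tag_code_point(code_point, variant):
    """Build a 4-octet value: variant identifier first, then the code point."""
    if not 0 <= code_point <= 0xFFFFFF:
        raise ValueError("code point must fit in the remaining three octets")
    if not 0 <= variant <= 0xFF:
        raise ValueError("variant identifier must fit in one octet")
    return bytes([variant]) + code_point.to_bytes(3, "big")


def split_tagged(octets):
    """Return (variant identifier, code point) from a 4-octet value."""
    if len(octets) != 4:
        raise ValueError("expected exactly 4 octets")
    return octets[0], int.from_bytes(octets[1:], "big")


def identify_variant(octets):
    """Name the language variant encoded in the first octet."""
    variant, _ = split_tagged(octets)
    return VARIANT_NAMES.get(variant, "unknown")


if __name__ == "__main__":
    bone_ja = tag_code_point(0x9AA8, JAPANESE)
    print(bone_ja.hex(" "), identify_variant(bone_ja))
```

Under these assumptions, the same code point X'9AA8' yields four distinct 4-octet values, one per language, while the character is still recoverable from the last three octets.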