Supporting Japanese Dakuten Characters During Dataset Translation

IP.com Disclosure Number: IPCOM000043564D
Original Publication Date: 1984-Sep-01
Included in the Prior Art Database: 2005-Feb-05
Document File: 4 page(s) / 41K

Publishing Venue

IBM

Related People

Aiken, JA: AUTHOR [+2]

Abstract

The disclosed method for converting some Japanese Business/Personal Computer (JPC) data sets into and from documents that can be processed by a Japanese Word Processor (JWP) on a small system takes into account the differing data stream representations to provide a conversion that is as faithful as possible. In particular, Japanese Dakuon characters, formed from a base character and a diacritic (Dakuten) mark, are translated so that no information is lost and so that the output resembles the input as far as possible, even though the data stream of JPC does not support the same number of characters as the JWP data stream. Where the JPC data stream does not allow the full Dakuon character, the JWP Dakuon character is decomposed into the base character and the diacritic in the resulting JPC data set.

This is the abbreviated version, containing approximately 34% of the total text.

The disclosed method converts certain Japanese Business/Personal Computer (JPC) data sets into and from documents that can be processed by a Japanese Word Processor (JWP) on a small system. It takes the differing data stream representations into account so that the conversion is as faithful as possible. In particular, Japanese Dakuon characters, each formed from a base character and a diacritic (Dakuten) mark, are translated so that no information is lost and the output resembles the input as closely as possible, even though the JPC data stream does not support the same number of characters as the JWP data stream. Where the JPC data stream cannot represent the full Dakuon character, the JWP Dakuon character is decomposed into the base character and the diacritic in the resulting JPC data set. Conversely, where a diacritic character follows a base character of the Dakuon set in a JPC data set, the resulting JWP document also contains two characters. Normal keyboard input of the base character and diacritic under JWP, by contrast, produces a single combined Dakuon character, reflecting the enhanced support for such character constructs under JWP.

A Japanese small business computer or personal computer may support several applications tailored to its particular set of users. Business computer applications require a general-purpose system that is very flexible and can be used in a wide variety of situations. Users are often familiar with computer systems and programming concepts. JPC data sets, in particular, are designed to support a very wide variety of business and personal applications. The data set and data stream architectures should therefore be very simple and extendable to many applications.

Word processing applications present special performance and function requirements. To provide optimum function for a word processing user, the system and application programming may be tailored specifically to the unique JWP requirements.
Data set and data stream architectures, in particular, may be very complex and designed specifically for JWP needs. Word processing operators as a class tend to be less sophisticated in their knowledge of computer concepts than general JPC users. This is not necessarily a disadvantage, since the JWP applications generally perform all functions associated with data streams and data set architectures. The JWP operator therefore need not be aware of the complexity of the structures underlying a document.

The Japanese written language includes many thousands of ideographic characters (Kanji). Because these characters are complex, each one needs more space on a page than a Latin-based character normally would, so that readers can recognize it easily. For consistency, all characters, even the Latin-based alphanumeric and the Japanese phonetic characters, are represented in a character set where all characters occupy the same amount...
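
Returning to the Dakuon conversion rules described at the start of this disclosure, the following is a minimal sketch of how they might look in code. It is not the disclosed implementation: Unicode is used only as a stand-in for the JPC and JWP data streams, whose actual code points are not given in this text, and the function names and the `jpc_supports` predicate are hypothetical.

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"  # Unicode combining voiced sound mark
SPACING_DAKUTEN = "\u309B"    # standalone voiced sound mark (stand-in for the JPC diacritic)


def jwp_to_jpc(text, jpc_supports):
    """Convert JWP-style text to JPC-style text (hypothetical helper).

    jpc_supports is an assumed predicate that returns True when the
    target data stream can represent the character directly.
    """
    out = []
    for ch in text:
        if jpc_supports(ch):
            out.append(ch)
            continue
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] == COMBINING_DAKUTEN:
            # Decompose the Dakuon character into base + diacritic,
            # so no information is lost in the JPC data set.
            out.append(parts[0] + SPACING_DAKUTEN)
        else:
            # Other unsupported characters are outside this sketch.
            out.append(ch)
    return "".join(out)


def jpc_to_jwp(text):
    """Convert JPC-style text to JWP-style text (hypothetical helper).

    A base character followed by a diacritic stays as two characters,
    so the output resembles the input as closely as possible.
    """
    return text


def jwp_keyboard_input(base):
    """Model typing a base character plus Dakuten under JWP.

    Returns a single combined character when one exists,
    otherwise the base followed by the standalone diacritic.
    """
    composed = unicodedata.normalize("NFC", base + COMBINING_DAKUTEN)
    return composed if len(composed) == 1 else base + SPACING_DAKUTEN


def no_precomposed(ch):
    """Assumed target repertoire: only characters with no decomposition."""
    return unicodedata.normalize("NFD", ch) == ch
```

With the assumed `no_precomposed` repertoire, `jwp_to_jpc("ガイド", no_precomposed)` yields the five-character sequence カ, ゛, イ, ト, ゛. Passing that sequence through `jpc_to_jwp` leaves it unchanged, whereas `jwp_keyboard_input("カ")` returns the single character ガ. This mirrors the distinction the disclosure draws between converted data and keyboard input.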