
HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters (RFC1843)

IP.com Disclosure Number: IPCOM000004099D
Original Publication Date: 1995-Aug-01
Included in the Prior Art Database: 2000-Sep-13
Document File: 4 page(s) / 8K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

F. Lee: AUTHOR

Abstract

The content of this memo is identical to an article of the same title written by the author on September 4, 1989. In this memo, GB stands for GB2312-80. Note that the title is kept only for historical reasons. HZ has been widely used for purposes other than "file exchange".

This text was extracted from an ASCII text document.
This is the abbreviated version, containing approximately 37% of the total text.

Network Working Group                                             F. Lee
Request for Comments: 1843                           Stanford University
Category: Informational                                      August 1995

HZ - A Data Format for Exchanging Files of
Arbitrarily Mixed Chinese and ASCII characters

Status of this Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

Abstract

The content of this memo is identical to an article of the same title
written by the author on September 4, 1989. In this memo, GB stands
for GB2312-80. Note that the title is kept only for historical
reasons. HZ has been widely used for purposes other than "file
exchange".

1. Introduction

Most existing computer systems which can handle a text file of
arbitrarily mixed Chinese and ASCII characters use 8-bit codes. To
exchange such text files through electronic mail on ASCII computer
systems, it is necessary to encode them in a 7-bit format. A generic
binary to ASCII encoder is not sufficient, because there is currently
no universal standard for such 8-bit codes. For example, CCDOS and
Macintosh's Chinese OS use different internal codes. Fortunately,
there is a PRC national standard, GuoBiao (GB), for the encoding of
Chinese characters, and Chinese characters encoded in the above
systems can be easily converted to GB by a simple formula. (* The ROC
standard BIG-5 is outside the scope of this article.)
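As a rough illustration of the kind of "simple formula" mentioned above, the sketch below assumes an 8-bit internal code that is GB with the high bit set on both bytes (the EUC-CN layout that Python's "gb2312" codec produces). The memo does not give the actual formulas for CCDOS or the Macintosh Chinese OS, so both the high-bit assumption and the function name internal_to_gb are hypothetical.

```python
def internal_to_gb(pair: bytes) -> bytes:
    """Map one two-byte 8-bit Chinese character to a 7-bit GB code.

    Assumption (not stated in the memo): the internal code is GB with
    the high bit set on both bytes, as in EUC-CN.
    """
    if len(pair) != 2 or not all(b & 0x80 for b in pair):
        raise ValueError("expected two bytes with the high bit set")
    # Clearing the high bit of each byte leaves the 7-bit GB code.
    return bytes(b & 0x7F for b in pair)


# Python's "gb2312" codec emits the high-bit-set form assumed here.
# "\u4e2d" is the HanZi for "middle".
eight_bit = "\u4e2d".encode("gb2312")
seven_bit = internal_to_gb(eight_bit)
```

Under this assumption each output byte falls in the 7-bit range, which is what makes the result safe to carry through ASCII-only mail systems.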

HZ is a 7-bit data format proposed for arbitrarily mixed GB and ASCII
text file exchange. HZ is also intended for the design of terminal
emulators that display and edit mixed Chinese and ASCII text files in
real time.

2. Specification

The format of HZ is described in the following.

Without loss of generality, we assume that all Chinese characters
(HanZi) have already been encoded in GB. A GB (GB1 and GB2) code is
a two byte code, where the first byte is in the range $21-$77
(hexadecimal), and the second byte is in the range $21-$7E.
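The byte ranges above can be checked mechanically. The following is a minimal sketch; the names GB_FIRST, GB_SECOND and is_gb_code are illustrative and do not come from the memo.

```python
# Byte ranges quoted from the text; range() excludes its upper bound.
GB_FIRST = range(0x21, 0x78)   # first byte:  $21-$77 inclusive
GB_SECOND = range(0x21, 0x7F)  # second byte: $21-$7E inclusive


def is_gb_code(first: int, second: int) -> bool:
    """Return True if (first, second) lies in the GB two-byte range."""
    return first in GB_FIRST and second in GB_SECOND
```

For example, is_gb_code(0x56, 0x50) holds, while is_gb_code(0x78, 0x21) does not because the first byte exceeds $77.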

A graphical ASCII character is a byte in the range $21-$7E. A non-
graphical ASCII character is a byte in the range $0-$20 or of the
value $7F.

Since the range of a graphical ASCII character overlaps that of a GB
byte, a byte in the range $21-$7E is interpreted according to the
mode it is in. There are two modes, namely ASCII mode and GB mode.

By convention, a non-graphical ASCII character should only appear in
ASCII mode.

The default mode is ASCII mode.
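To make the mode-dependent reading concrete, here is a small sketch of how the same bytes might be interpreted under each mode. It assumes the input is a run of bytes lying entirely within one mode; switching between modes is not handled here. The names Mode, DEFAULT_MODE, is_graphical and interpret are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ASCII = "ascii"
    GB = "gb"


DEFAULT_MODE = Mode.ASCII  # the text names ASCII as the default mode


def is_graphical(byte: int) -> bool:
    """Graphical ASCII characters are the bytes $21-$7E."""
    return 0x21 <= byte <= 0x7E


def interpret(run: bytes, mode: Mode = DEFAULT_MODE) -> list:
    """Interpret a run of 7-bit bytes that lies entirely in one mode.

    ASCII mode yields one-character strings; GB mode yields
    (first, second) byte pairs. Mode switching is out of scope.
    """
    if any(b > 0x7F for b in run):
        raise ValueError("input is not 7-bit")
    if mode is Mode.ASCII:
        return [chr(b) for b in run]
    # Non-graphical characters should only appear in ASCII mode.
    if not all(is_graphical(b) for b in run):
        raise ValueError("non-graphical byte in GB mode")
    if len(run) % 2 != 0:
        raise ValueError("GB mode needs an even number of bytes")
    return [(run[i], run[i + 1]) for i in range(0, len(run), 2)]
```

The two bytes b"VP" read as the characters "V" and "P" in ASCII mode, but as the single pair ($56, $50) in GB mode, which is why the current mode must be tracked.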

In ASCII mode, a byte is interpreted as an ASCII character, unless a
'~' is encountered. The character '~' is an escape character. By
convention, it must be immediately followed ONLY by '~', '{' or '\n'
(line feed), with the following special meaning.

o The escape sequence '~~' is interpreted as a '~'.

o The escape-to-GB sequence '~{' switches the mode from ASCII to
  GB.