HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters (RFC1843)

IP.com Disclosure Number: IPCOM000004099D
Original Publication Date: 1995-Aug-01
Included in the Prior Art Database: 2019-Feb-12
Document File: 5 page(s) / 7K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

F. Lee: AUTHOR

Related Documents

10.17487/RFC1843: DOI

Abstract

The content of this memo is identical to an article of the same title written by the author on September 4, 1989. This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.

Network Working Group                                          F. Lee
Request for Comments: 1843                        Stanford University
Category: Informational                                   August 1995

HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII characters

Status of this Memo

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Abstract

The content of this memo is identical to an article of the same title written by the author on September 4, 1989. In this memo, GB stands for GB2312-80. Note that the title is kept only for historical reasons. HZ has been widely used for purposes other than "file exchange".

1. Introduction

Most existing computer systems which can handle a text file of arbitrarily mixed Chinese and ASCII characters use 8-bit codes. To exchange such text files through electronic mail on ASCII computer systems, it is necessary to encode them in a 7-bit format. A generic binary to ASCII encoder is not sufficient, because there is currently no universal standard for such 8-bit codes. For example, CCDOS and Macintosh’s Chinese OS use different internal codes. Fortunately, there is a PRC national standard, GuoBiao (GB), for the encoding of Chinese characters, and Chinese characters encoded in the above systems can be easily converted to GB by a simple formula. (* The ROC standard BIG-5 is outside the scope of this article.)

HZ is a 7-bit data format proposed for arbitrarily mixed GB and ASCII text file exchange. HZ is also intended for the design of terminal emulators that display and edit mixed Chinese and ASCII text files in real time.

2. Specification

The format of HZ is described in the following.

Without loss of generality, we assume that all Chinese characters (HanZi) have already been encoded in GB. A GB (GB1 and GB2) code is a two-byte code, where the first byte is in the range $21-$77 (hexadecimal), and the second byte is in the range $21-$7E.
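
The byte ranges above can be checked mechanically. The following is a minimal sketch, not part of the memo; the function name is_gb_code is a hypothetical helper introduced only for illustration, and it tests nothing beyond the two ranges stated in the paragraph above.

```python
def is_gb_code(first: int, second: int) -> bool:
    """Return True if the byte pair lies in the GB ranges given in the text.

    Hypothetical helper: it checks only the stated ranges
    (first byte $21-$77, second byte $21-$7E), not whether the
    pair is actually assigned to a character in GB2312-80.
    """
    return 0x21 <= first <= 0x77 and 0x21 <= second <= 0x7E


# Usage: both bytes must fall inside their respective ranges.
print(is_gb_code(0x30, 0x21))
print(is_gb_code(0x78, 0x21))
```

Note that passing this range check does not mean the pair maps to a defined character; it only confirms that the pair has the shape of a GB code.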

A graphical ASCII character is a byte in the range $21-$7E. A non-graphical ASCII character is a byte in the range $0-$20 or of the value $7F.
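
As an illustration of the two byte classes just defined, here is a small sketch. The function names are hypothetical and are not defined by the memo; the ranges are taken directly from the paragraph above.

```python
def is_graphical_ascii(byte: int) -> bool:
    """True for bytes in the range $21-$7E."""
    return 0x21 <= byte <= 0x7E


def is_non_graphical_ascii(byte: int) -> bool:
    """True for bytes in the range $0-$20 or equal to $7F."""
    return 0x00 <= byte <= 0x20 or byte == 0x7F


# Usage: space ($20) is non-graphical, 'A' ($41) is graphical.
print(is_non_graphical_ascii(0x20), is_graphical_ascii(ord("A")))
```

Every 7-bit value falls into exactly one of the two classes; bytes of $80 and above fall into neither.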

Since the range of a graphical ASCII character overlaps that of a GB byte, a byte in the range $21-$7E is interpreted according to the mode it is in. There are two modes, namely ASCII mode and GB mode.

By convention, a non-graphical ASCII character should only appear in ASCII mode.

The default mode is ASCII mode.

In ASCII mode, a byte is interpreted as an ASCII character, unless a '~' is encountered. The character '~' is an escape character. By convention, it must be immediately followed ONLY by '~', '{' or '\n' (<LF>), with the following special meaning.

o The escape sequence '~~' is interpreted as a '~'.
o The escape-to-GB sequence '~{' switches the mode from ASCII to GB.
o The escape sequence '~\n' is a line-continuation marker to be consumed with no output produced.
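
The three ASCII-mode escape sequences can be sketched as a small scanner. This is an illustrative sketch under stated assumptions, not the memo's reference implementation: the name scan_ascii_mode is hypothetical, the scanner covers ASCII mode only and stops at the point where GB mode begins, and raising ValueError on any other byte after the escape character is an assumption, since the text above gives only the convention and no error behavior.

```python
from typing import Optional, Tuple

ESC = 0x7E  # the escape character


def scan_ascii_mode(data: bytes, start: int = 0) -> Tuple[bytes, Optional[int]]:
    """Process ASCII-mode bytes of an HZ stream, beginning at index start.

    Returns (output, gb_start). gb_start is the index of the first byte
    after the escape-to-GB sequence, or None if the input ends while
    still in ASCII mode. Hypothetical helper; GB mode is not handled.
    """
    out = bytearray()
    i = start
    while i < len(data):
        byte = data[i]
        if byte != ESC:
            out.append(byte)          # ordinary ASCII byte, copied as-is
            i += 1
            continue
        if i + 1 >= len(data):
            # Assumption: a trailing escape character is treated as an error.
            raise ValueError("escape character at end of input")
        nxt = data[i + 1]
        if nxt == ESC:                # doubled escape: emit one literal
            out.append(ESC)
        elif nxt == 0x0A:             # escape + LF: line continuation, no output
            pass
        elif nxt == 0x7B:             # escape + '{': switch to GB mode
            return bytes(out), i + 2
        else:
            # Assumption: any other follower violates the convention.
            raise ValueError(f"invalid escape follower 0x{nxt:02X} at index {i + 1}")
        i += 2
    return bytes(out), None


# Usage: literal escape, line continuation, then a switch to GB mode.
text, gb_start = scan_ascii_mode(b"a~~b~\nc~{XX")
print(text, gb_start)
```

Returning the index where GB mode starts, rather than decoding further, keeps the sketch within what the ASCII-mode rules specify.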

In GB mode, characters are interprete...