
X.400 Use of Extended Character Sets (RFC1502)

IP.com Disclosure Number: IPCOM000002331D
Original Publication Date: 1993-Aug-01
Included in the Prior Art Database: 2000-Sep-12
Document File: 12 page(s) / 26K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

H. Alvestrand: AUTHOR

Abstract

Since 1988, X.400 has had the capacity for carrying a large number of different character sets in a message by using the body part "GeneralText" defined by ISO/IEC 10021-7.

This text was extracted from an ASCII text document.
This is the abbreviated version, containing approximately 11% of the total text.

Network Working Group                                      H. Alvestrand
Request for Comments: 1502                                  SINTEF DELAB
                                                             August 1993

X.400 Use of Extended Character Sets

Status of this Memo

This RFC specifies an IAB standards track protocol for the Internet
community, and requests discussion and suggestions for improvements.

Please refer to the current edition of the "IAB Official Protocol
Standards" for the standardization state and status of this protocol.

Distribution of this memo is unlimited.

1. Introduction

Since 1988, X.400 has had the capacity for carrying a large number of
different character sets in a message by using the body part
"GeneralText" defined by ISO/IEC 10021-7.

Since 1992, the Internet has also had the means of passing around
messages containing multiple character sets, by using the mechanism
defined in RFC-MIME.
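For comparison with the X.400 approach, the MIME mechanism labels each text body part with a "charset" parameter. The sketch below builds such a part with Python's standard email package; the Norwegian sample text and the choice of ISO 8859-1 are illustrative assumptions, not taken from this memo.

```python
from email.mime.text import MIMEText

# Illustrative sample text; every character is in ISO 8859-1.
text = "Blåbærsyltetøy"

# MIMEText records the character set in the Content-Type header and
# chooses a transfer encoding (quoted-printable for iso-8859-1) so the
# 8-bit octets survive 7-bit transport.
msg = MIMEText(text, "plain", "iso-8859-1")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# A receiver uses the declared charset to turn the octets back into text.
charset = msg.get_content_charset()
body = msg.get_payload(decode=True).decode(charset)
print(body)
```

The character set travels as a label next to the data, so the receiving side never has to guess how to interpret the octets.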

This RFC defines a suggested method of using "GeneralText" in order
to harmonize the usage of this body part as much as possible.

2. General principles

2.1. Goals

The goal of this memo is to define a way of using existing standards
to achieve:

(1) in the short term, a standard for sending E-mail in the European
    languages (Latin letters with European accents, Greek and
    Cyrillic);

(2) in the medium term, an extension of this to cover the Hebrew and
    Arabic character sets;

(3) in the long term, true international E-mail, by allowing the full
    character set specified in ISO 10646 to be used.

The author believes that this document gives a specification that can
easily accommodate the use of any character set in the ISO registry
and, by giving guidance rules for choosing character sets, will
improve interworking.

2.2. Families of character sets

2.2.1. ISO 6937/T.61

ISO 6937 is a code technique used and recommended in T.51 and T.101
(Teletex and Videotex service) and in X.500, providing a repertoire
of 333 characters from the Latin script by use of non-spacing
diacritical marks. It corresponds closely to CCITT recommendation
T.61.

The problem with that technique is that the character stream mixes
two modes: some characters are coded with one byte and some
(composite characters) with two. This makes information processing
systems such as an E-mail user agent (UA) or gateway (GW) more
complex.
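To make the two-mode problem concrete, here is a minimal decoder sketch for a small subset of ISO 6937. It assumes that bytes below 0x80 are plain ASCII (a simplification) and maps only a handful of the non-spacing diacritics in the 0xC1-0xCF range to Unicode combining marks; the function name decode_iso6937_subset and the partial table are hypothetical. The point is the look-ahead: a diacritic byte cannot be turned into a character until the base letter that follows it has been read.

```python
import unicodedata

# Partial, illustrative table of ISO 6937 non-spacing diacritics.
# Each one is sent *before* its base letter; here it is mapped to the
# Unicode combining character with the same function.
NON_SPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030C",  # caron
}


def decode_iso6937_subset(data: bytes) -> str:
    """Decode ASCII plus diacritic+letter pairs; reject anything else."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # One-byte mode (assumed ASCII here).
            out.append(chr(b))
            i += 1
        elif b in NON_SPACING and i + 1 < len(data) and data[i + 1] < 0x80:
            # Two-byte mode: combine the diacritic with the next byte.
            base = chr(data[i + 1])
            out.append(unicodedata.normalize("NFC", base + NON_SPACING[b]))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return "".join(out)


raw = b"caf\xc2e"  # "café": the acute accent byte precedes the "e"
print(len(raw), decode_iso6937_subset(raw))
```

Five octets yield four characters, so neither byte counting nor random access by offset works without scanning the stream.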

It is also not extensible to other languages like Korean or Chinese,
or even Greek, without invoking the character set switching
techniques of ISO 2022.
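ISO 2022 switching can be observed with any ISO 2022-based codec. As a convenient illustration, the sketch below uses the Japanese profile that Python ships as the iso2022_jp codec; the sample string is arbitrary, and the same escape-sequence technique applies to the other ISO 2022 profiles.

```python
# Illustration only: the Japanese ISO 2022 profile from Python's stdlib.
text = "mail 日本"
encoded = text.encode("iso2022_jp")

# Octets have no fixed meaning on their own: escape sequences switch
# the active character set in the middle of the stream.
#   ESC $ B  -> JIS X 0208 (two octets per character)
#   ESC ( B  -> ASCII      (one octet per character)
print(encoded)

# A decoder must track the current designation to read the text back.
print(encoded.decode("iso2022_jp"))
```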

2.2.2. ISO 8859

ISO 8859 defines a set of character sets, each suitable for use in
some group of languages. Each character in ISO 8859 is coded in a
single byte.
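The single-byte property, and the fact that each part serves a different group of languages, can be checked with Python's built-in ISO 8859 codecs. The helper parts_covering and the four parts it tries are illustrative choices, not a selection rule defined by this memo.

```python
# A few ISO 8859 parts that Python ships as standard codecs
# (an illustrative subset of the parts the memo refers to).
PARTS = {
    "ISO 8859-1 (Latin-1)": "iso8859_1",
    "ISO 8859-2 (Latin-2)": "iso8859_2",
    "ISO 8859-5 (Cyrillic)": "iso8859_5",
    "ISO 8859-7 (Greek)": "iso8859_7",
}


def parts_covering(text: str) -> list[str]:
    """Hypothetical helper: names of the listed parts that can encode all of text."""
    covering = []
    for name, codec in PARTS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # some character is outside this part's repertoire
        covering.append(name)
    return covering


# Single-byte coding: one octet per character.
sample = "Blåbær"
print(len(sample), len(sample.encode("iso8859_1")))

# The same octet means different characters in different parts.
print(bytes([0xE9]).decode("iso8859_1"), bytes([0xE9]).decode("iso8859_5"))

print(parts_covering("Ελλάδα"))
print(parts_covering("Россия"))
```

Because the same octet decodes differently under each part, a receiver cannot interpret ISO 8859 text without knowing which part was used.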

There are currently 11 parts of ISO 8859, plus a "supplementary" set,
registered as ISO IR 154. Most languages using single-byte characters
can be written in one or another of the ISO 8859 sets. There are
sets covering ...