UTF-7 - A Mail-Safe Transformation Format of Unicode (RFC1642)

IP.com Disclosure Number: IPCOM000002478D
Original Publication Date: 1994-Jul-01
Included in the Prior Art Database: 2001-Nov-12
Document File: 15 page(s) / 28K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

D. Goldsmith: AUTHOR

Abstract

The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E) jointly define a 16-bit character set (hereafter referred to as Unicode) which encompasses most of the world's writing systems. However, Internet mail (STD 11, RFC 822) currently supports only 7-bit US ASCII as a character set. MIME (RFC 1521 and RFC 1522) extends Internet mail to support different media types and character sets, and thus could support Unicode in mail messages. MIME neither defines Unicode as a permitted character set nor specifies how it would be encoded, although it does provide for the registration of additional character sets over time.

Network Working Group                                       D. Goldsmith
Request for Comments: 1642                                      M. Davis
Category: Experimental                                    Taligent, Inc.
                                                               July 1994

                                 UTF-7
              A Mail-Safe Transformation Format of Unicode

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community.  This memo does not specify an Internet standard of any
   kind.  Distribution of this memo is unlimited.

Abstract

   The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E)
   jointly define a 16-bit character set (hereafter referred to as
   Unicode) which encompasses most of the world's writing systems.
   However, Internet mail (STD 11, RFC 822) currently supports only
   7-bit US ASCII as a character set. MIME (RFC 1521 and RFC 1522)
   extends Internet mail to support different media types and character
   sets, and thus could support Unicode in mail messages. MIME neither
   defines Unicode as a permitted character set nor specifies how it
   would be encoded, although it does provide for the registration of
   additional character sets over time.

   This document describes a new transformation format of Unicode that
   contains only 7-bit ASCII characters and is intended to be readable
   by humans in the limiting case that the document consists of
   characters from the US-ASCII repertoire. It also specifies how this
   transformation format is used in the context of RFC 1521, RFC 1522,
   and the document "Using Unicode with MIME".

Motivation

   Although other transformation formats of Unicode exist and could
   conceivably be used in this context (most notably UTF-1 and UTF-8,
   also known as UTF-2 or UTF-FSS), they suffer the disadvantage that
   they use octets in the range decimal 128 through 255 to encode
   Unicode characters outside the US-ASCII range. Thus, in the context
   of mail, those octets must themselves be encoded. This requires
   putting text through two successive encoding processes, and leads to
   a significant expansion of characters outside the US-ASCII range,
   putting non-English speakers at a disadvantage. For example, using
   UTF-FSS together with the Quoted-Printable content transfer encoding
   of MIME represents US-ASCII characters in one octet, but other
   characters may require up to nine octets.
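
   The nine-octet figure can be checked with a short calculation. The
   sketch below assumes Python's `utf-8` codec as a stand-in for
   UTF-FSS and uses a hypothetical helper, `qp_escape`, that applies
   only the octet-escaping part of Quoted-Printable (each octet outside
   printable US-ASCII, and `=` itself, becomes `=XX`); line-length and
   trailing-whitespace rules are deliberately ignored.

```python
def qp_escape(data: bytes) -> bytes:
    """Simplified Quoted-Printable escaping (hypothetical helper).

    Printable US-ASCII passes through; every other octet, and '=' itself,
    becomes '=XX'. Soft line breaks and trailing-whitespace rules are
    ignored for brevity (assumption).
    """
    out = bytearray()
    for b in data:
        if 0x20 <= b <= 0x7E and b != ord("="):
            out.append(b)
        else:
            out += b"=%02X" % b
    return bytes(out)


# 'A', LATIN SMALL LETTER E WITH ACUTE, and a CJK ideograph
for ch in ["A", "\u00e9", "\u4e2d"]:
    utf8 = ch.encode("utf-8")
    mail = qp_escape(utf8)
    print(f"U+{ord(ch):04X}: {len(utf8)} UTF-8 octets -> {len(mail)} mail octets {mail!r}")
```

   A 16-bit character in the range U+0800 through U+FFFF takes three
   UTF-8 octets, and each of those becomes three mail octets, giving
   the nine octets mentioned above.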

Overview

   UTF-7 encodes Unicode characters as US-ASCII, together with shift
   sequences to encode characters outside that range. For this purpose,
   one of the characters in the US-ASCII repertoire is reserved for use
   as a shift character.
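
   The shift mechanism can be observed with Python's standard-library
   `utf-7` codec. That codec targets RFC 2152, the later revision of
   this format, so it serves here only as an illustrative stand-in, not
   as the reference implementation of this memo. In it the reserved
   shift character is `+`: plain US-ASCII text passes through
   unchanged, a literal `+` is written as `+-`, and a character outside
   US-ASCII becomes a short shifted run.

```python
# Python's built-in "utf-7" codec (RFC 2152 behaviour) used as a stand-in.
samples = [
    "plain ASCII text",  # entirely US-ASCII: no shift sequence needed
    "A+B",               # contains the shift character itself
    "Hi Mom -\u263a-!",  # one character (U+263A) outside US-ASCII
]

for s in samples:
    encoded = s.encode("utf-7")
    print(repr(s), "->", encoded)
    assert encoded.decode("utf-7") == s  # decoding restores the original text
```

   The third sample encodes as `Hi Mom -+Jjo--!`: `+` opens the shifted
   run, `Jjo` carries the 16 bits of U+263A, and the first `-` closes
   the run so that the following literal `-` is not absorbed.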

   Many mail gateways and systems cannot handle the entire US-ASCII
   character set (those based on EBCDIC, for example), and so UTF-7
   contains provisions for encoding characters within US-ASCII in a way
   that all mail systems can accommodate.
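
   The provision for restricted gateways can be illustrated with a
   hand-written encoder that copies only a deliberately small set of
   characters directly and shifts everything else, including US-ASCII
   punctuation such as `!`, `~` and `\`. This is a minimal sketch under
   stated assumptions: `encode_utf7_sketch` and `SAFE` are hypothetical
   names, `SAFE` is a conservative choice rather than the set defined
   in the full memo, and the details it relies on (`+` as the shift
   character, `-` as the run terminator, unpadded base64 over 16-bit
   code units) come from the complete specification, which this
   abbreviated extract does not include.

```python
import base64

# Hypothetical, deliberately conservative set of characters copied
# unencoded (an assumption, not the memo's own definition).
SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "'(),-./:? \t\r\n"
)


def encode_utf7_sketch(text: str) -> bytes:
    """Encode text as 7-bit ASCII in the style of UTF-7.

    SAFE characters are copied as-is, '+' (the shift character) becomes
    '+-', and each run of other characters becomes '+', the unpadded
    base64 of its UTF-16BE code units, then an explicit '-'.
    """
    out = []
    pending = []  # characters waiting to be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("+" + b64 + "-")  # always close the shifted run explicitly
            pending.clear()

    for ch in text:
        if ch in SAFE:
            flush()
            out.append(ch)
        elif ch == "+":
            flush()
            out.append("+-")
        else:
            pending.append(ch)
    flush()
    return "".join(out).encode("ascii")


print(encode_utf7_sketch("Hi Mom -\u263a-!"))
print(encode_utf7_sketch("1+1"))
```

   Closing every shifted run with an explicit `-` makes the output a
   little longer than a tuned encoder would produce, but keeps the
   sketch simple; a standard UTF-7 decoder such as Python's `utf-7`
   codec should still read it back unchanged.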

   UTF-7 should normally be used only in the context of 7-bit
   transports, such as mail and news. In other contexts, straight
   Unicode or UTF-8 is preferred.
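
   The trade-off can be seen by encoding the same text both ways. The
   sketch below again assumes Python's `utf-7` and `utf-8` codecs as
   stand-ins; `is_7bit_clean` is a hypothetical helper that checks
   whether every octet fits in 7 bits.

```python
def is_7bit_clean(data: bytes) -> bool:
    """True if every octet is in the range 0..127 (safe for a 7-bit transport)."""
    return all(b < 0x80 for b in data)


sample = "na\u00efve caf\u00e9, \u65e5\u672c\u8a9e"  # accented Latin plus three CJK ideographs

utf8 = sample.encode("utf-8")
utf7 = sample.encode("utf-7")  # Python stdlib codec, used as a stand-in

print("UTF-8:", len(utf8), "octets, 7-bit clean:", is_7bit_clean(utf8))
print("UTF-7:", len(utf7), "octets, 7-bit clean:", is_7bit_clean(utf7))

# Both forms decode back to the same text.
assert utf8.decode("utf-8") == utf7.decode("utf-7") == sample
```

   For this sample only the UTF-7 form survives a 7-bit channel, but it
   is also longer than the UTF-8 form, which illustrates why UTF-8 is
   the better fit wherever 8-bit data can be carried.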

   See the document "Using Unicode with MIME" for the overall
   specification on usage of Unicode transformation formats with MIME.

Definitions

   First, the definition o...