UTF-7 A Mail-Safe Transformation Format of Unicode (RFC2152)

IP.com Disclosure Number: IPCOM000002709D
Original Publication Date: 1997-May-01
Included in the Prior Art Database: 2000-Sep-13
Document File: 11 page(s) / 26K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

D. Goldsmith: AUTHOR [+2]

Abstract

The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as amended) jointly define a character set (hereafter referred to as Unicode) which encompasses most of the world's writing systems. However, Internet mail (STD 11, RFC 822) currently supports only 7-bit US ASCII as a character set. MIME (RFC 2045 through 2049) extends Internet mail to support different media types and character sets, and thus could support Unicode in mail messages. MIME neither defines Unicode as a permitted character set nor specifies how it would be encoded, although it does provide for the registration of additional character sets over time.

This text was extracted from an ASCII text document.
This is the abbreviated version, containing approximately 12% of the total text.

Network Working Group                                       D. Goldsmith
Request for Comments: 2152                          Apple Computer, Inc.
Obsoletes: RFC 1642                                             M. Davis
Category: Informational                                   Taligent, Inc.
                                                                May 1997

UTF-7

A Mail-Safe Transformation Format of Unicode

Status of this Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

Abstract

This document describes a transformation format of Unicode that
contains only 7-bit ASCII octets and is intended to be readable by
humans in the limiting case that the document consists of characters
from the US-ASCII repertoire. It also specifies how this
transformation format is used in the context of MIME and RFC 1641,
"Using Unicode with MIME".

Motivation

Although other transformation formats of Unicode exist and could
conceivably be used in this context (most notably UTF-8, also known
as UTF-2 or UTF-FSS), they suffer the disadvantage that they use
octets in the range decimal 128 through 255 to encode Unicode
characters outside the US-ASCII range. Thus, in the context of mail,
those octets must themselves be encoded. This requires putting text
through two successive encoding processes, and leads to a significant
expansion of characters outside the US-ASCII range, putting
non-English speakers at a disadvantage. For example, using UTF-8
together with the Quoted-Printable content transfer encoding of MIME
represents US-ASCII characters in one octet, but other characters may
require up to nine octets.
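
The nine-octet figure can be checked with a short sketch. It is not part
of the RFC text; it assumes Python's standard quopri module as the
Quoted-Printable encoder, and the helper name qp_octets_per_char is
hypothetical. A character that needs three UTF-8 octets has each octet
escaped as "=XX", giving 3 x 3 = 9 octets.

```python
import quopri


def qp_octets_per_char(ch: str) -> int:
    """Octets needed for one character as UTF-8 wrapped in Quoted-Printable.

    Hypothetical helper for illustration only; not defined by the RFC.
    """
    utf8 = ch.encode("utf-8")
    return len(quopri.encodestring(utf8))


# One US-ASCII letter, one two-octet UTF-8 character, one three-octet one.
for ch in ["A", "\u00e9", "\u20ac"]:
    utf8_len = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {utf8_len} UTF-8 octet(s), "
          f"{qp_octets_per_char(ch)} octet(s) after Quoted-Printable")
```

Under these assumptions the US-ASCII letter stays at one octet, while the
three-octet UTF-8 character expands to nine.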

Overview

UTF-7 encodes Unicode characters as US-ASCII octets, together with
shift sequences to encode characters outside that range. For this
purpose, one of the characters in the US-ASCII repertoire is reserved
for use as a shift character.
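
As a rough illustration of the behavior described above, the following
sketch uses the "utf-7" codec that ships with Python. That codec is one
existing implementation, not the reference for this memo, and the sample
strings are assumptions chosen for the example. The full RFC names "+"
as the shift character.

```python
# Text with one character (U+263A) outside the US-ASCII repertoire.
text = "Hi Mom -\u263a-!"
encoded = text.encode("utf-7")
print(encoded)

# Every octet of the encoded form fits in 7 bits.
assert all(octet < 128 for octet in encoded)

# The shift sequence is introduced by "+", and decoding restores the text.
assert b"+" in encoded
assert encoded.decode("utf-7") == text

# Text drawn only from letters, digits and spaces passes through unchanged,
# so it stays readable without decoding.
ascii_only = "plain ASCII text"
print(ascii_only.encode("utf-7"))
```

Only the character outside US-ASCII is wrapped in a shift sequence; the
surrounding letters and spaces are left as they are.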

Many mail gateways and systems cannot handle the entire US-ASCII
character set (those based on EBCDIC, for example), and so UTF-7
contains provisions for encoding characters within US-ASCII in a way
that all mail systems can accommodate.

UTF-7 should normally be used only in the ...