UTF-7 A Mail-Safe Transformation Format of Unicode (RFC2152)

IP.com Disclosure Number: IPCOM000002709D
Original Publication Date: 1997-May-01
Included in the Prior Art Database: 2019-Feb-15
Document File: 15 page(s) / 19K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

D. Goldsmith: AUTHOR [+1]

Related Documents

10.17487/RFC2152: DOI

Abstract

This document describes a transformation format of Unicode that contains only 7-bit ASCII octets and is intended to be readable by humans in the limiting case that the document consists of characters from the US-ASCII repertoire. It also specifies how this transformation format is used in the context of MIME and RFC 1641, "Using Unicode with MIME". This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 13% of the total text.

Network Working Group                                       D. Goldsmith
Request for Comments: 2152                          Apple Computer, Inc.
Obsoletes: RFC 1642                                             M. Davis
Category: Informational                                   Taligent, Inc.
                                                                May 1997

UTF-7

A Mail-Safe Transformation Format of Unicode

Status of this Memo

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Abstract

The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as amended) jointly define a character set (hereafter referred to as Unicode) which encompasses most of the world’s writing systems. However, Internet mail (STD 11, RFC 822) currently supports only 7-bit US ASCII as a character set. MIME (RFC 2045 through 2049) extends Internet mail to support different media types and character sets, and thus could support Unicode in mail messages. MIME neither defines Unicode as a permitted character set nor specifies how it would be encoded, although it does provide for the registration of additional character sets over time.

This document describes a transformation format of Unicode that contains only 7-bit ASCII octets and is intended to be readable by humans in the limiting case that the document consists of characters from the US-ASCII repertoire. It also specifies how this transformation format is used in the context of MIME and RFC 1641, "Using Unicode with MIME".

Motivation

Although other transformation formats of Unicode exist and could conceivably be used in this context (most notably UTF-8, also known as UTF-2 or UTF-FSS), they suffer the disadvantage that they use octets in the range decimal 128 through 255 to encode Unicode characters outside the US-ASCII range. Thus, in the context of mail, those octets must themselves be encoded. This requires putting text through two successive encoding processes, and leads to a significant expansion of characters outside the US-ASCII range, putting non-English speakers at a disadvantage. For example, using UTF-8 together with the Quoted-Printable content transfer encoding of MIME represents US-ASCII characters in one octet, but other characters may require up to nine octets.
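
The expansion described above can be checked with a short Python sketch. It assumes the standard-library quopri module as the Quoted-Printable encoder; the helper name qp_utf8_length is hypothetical and used only for illustration. A character that needs three UTF-8 octets (here U+65E5) has each of those octets escaped as a three-octet "=XX" sequence.

```python
import quopri


def qp_utf8_length(ch: str) -> int:
    """Count the octets for one character encoded as UTF-8, then Quoted-Printable."""
    utf8_octets = ch.encode("utf-8")
    # Every octet above 127 becomes a three-octet "=XX" escape.
    return len(quopri.encodestring(utf8_octets))


# A US-ASCII letter passes through both encodings unchanged.
ascii_len = qp_utf8_length("A")

# U+65E5 needs three UTF-8 octets, each of which is then escaped.
cjk_len = qp_utf8_length("\u65e5")

print(ascii_len, cjk_len)
```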

Overview

UTF-7 encodes Unicode characters as US-ASCII octets, together with shift sequences to encode characters outside that range. For this purpose, one of the characters in the US-ASCII repertoire is reserved for use as a shift character.
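
As a rough illustration of the shift mechanism, the sketch below uses the "utf-7" codec that ships with Python. That codec is one implementation among several, and its choices about which characters to emit directly are its own; the helper name to_utf7 and the sample strings are assumptions made for this example. Text drawn only from plain US-ASCII letters, digits, and common punctuation comes out unchanged, while characters outside that range appear inside a sequence introduced by "+".

```python
def to_utf7(text: str) -> bytes:
    """Encode text with Python's built-in UTF-7 codec (one possible implementation)."""
    return text.encode("utf-7")


# Plain US-ASCII text stays readable as-is.
plain = to_utf7("Hello, World")

# U+2262 and U+0391 lie outside US-ASCII, so they are carried in a shifted run.
mixed = to_utf7("A\u2262\u0391.")

# Every octet of the result fits in 7 bits, and decoding restores the original.
all_seven_bit = all(octet < 128 for octet in mixed)
round_trip = mixed.decode("utf-7")

print(plain, mixed, all_seven_bit, round_trip == "A\u2262\u0391.")
```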

Many mail gateways and systems cannot handle the entire US-ASCII character set (those based on EBCDIC, for example), and so UTF-7 contains provisions for encoding characters within US-ASCII in a way that all mail systems can accommodate.

UTF-7 should normally be used only in the context of 7-bit transports, such as mail. In other contexts, straight Unicode or UTF-8 is preferred.

See RFC 1641, "Using Unicode with MIME" for the overall specification on usage of Unicode transformation formats with MIME.

Definitions

First, the defini...
