UTF-8, a transformation format of ISO 10646 (RFC3629)

IP.com Disclosure Number: IPCOM000020258D
Original Publication Date: 2003-Nov-01
Included in the Prior Art Database: 2003-Dec-04
Document File: 15 page(s) / 34K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

F. Yergeau: AUTHOR

Abstract

ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values. This memo obsoletes and replaces RFC 2279.
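The ASCII-preservation property mentioned in the abstract can be checked directly. The following is a minimal sketch in Python, relying only on the built-in `str.encode`; the function names `is_ascii_transparent` and `non_ascii_bytes_have_high_bit` are hypothetical helpers introduced for illustration and are not part of the memo.

```python
# Illustrative check, not taken from the memo: helper names are hypothetical.

def is_ascii_transparent() -> bool:
    """Return True if every US-ASCII code point (0x00-0x7F) encodes
    in UTF-8 as a single byte with the same numeric value."""
    for code_point in range(0x80):
        if chr(code_point).encode("utf-8") != bytes([code_point]):
            return False
    return True


def non_ascii_bytes_have_high_bit(text: str) -> bool:
    """Return True if every byte used to encode a non-ASCII character
    in `text` is 0x80 or above, so it cannot be mistaken for ASCII."""
    for ch in text:
        if ord(ch) >= 0x80:
            if any(b < 0x80 for b in ch.encode("utf-8")):
                return False
    return True


if __name__ == "__main__":
    print(is_ascii_transparent())
    print(non_ascii_bytes_have_high_bit("h\u00e9llo \u20ac"))
```

Software that only inspects US-ASCII byte values (for example, a parser that splits on `/` or `:`) can therefore pass UTF-8 text through unchanged, provided it treats bytes of 0x80 and above as opaque.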

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 10% of the total text.

Network Working Group                                         F. Yergeau
Request for Comments: 3629                             Alis Technologies
STD: 63                                                    November 2003
Obsoletes: 2279
Category: Standards Track

UTF-8, a transformation format of ISO 10646

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Table of Contents

1. Introduction
2. Notational conventions
3. UTF-8 definition
4. Syntax of UTF-8 Byte Sequences
5. Versions of the standards
6. Byte order mark (BOM)
7. Examples
8. MIME registration
9. IANA Considerations
10. Security Considerations
11. Acknowledgements
12. Changes from RFC 2279
13. Normative References
14. Informative References
15. URI's
16. Intellectual Property Statement
17. Author's Address
18. Full Copyright Statement

1. Introduction

ISO/IEC 10646 [ISO.10646] defines a large character set called the
Universal Character Set (UCS), which encompasses most of the world's
writing systems. The same set of characters is defined by the
Unicode standard [UNICODE], which further defines additional
character properties and other application details of great interest
to implementers. Up to the present time, changes in Unicode and
amendments and additions to ISO/IEC 10646 have tracked each...