Language Tagging in Unicode Plain Text (RFC2482)

IP.com Disclosure Number: IPCOM000003062D
Original Publication Date: 1999-Jan-01
Included in the Prior Art Database: 2000-Sep-13
Document File: 11 page(s) / 26K

Publishing Venue

Internet Society Requests For Comment (RFCs)

Related People

K. Whistler: AUTHOR


This text was extracted from an ASCII document.
This is the abbreviated version, containing approximately 11% of the total text.

Network Working Group                                        K. Whistler
Request for Comments: 2482                                        Sybase
Category: Informational                                         G. Adams
                                                                Spyglass
                                                            January 1999

Language Tagging in Unicode Plain Text

Status of this Memo

This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

IESG Note:

This document has been accepted by ISO/IEC JTC1/SC2/WG2 in meeting
#34 to be submitted as a recommendation from WG2 for inclusion in
Plane 14 in part 2 of ISO/IEC 10646.

1. Abstract

This document proposes a mechanism for language tagging in [UNICODE]
plain text. A set of special-use tag characters on Plane 14 of
[ISO10646] (accessible through UTF-8, UTF-16, and UCS-4 encoding
forms) is proposed for encoding to enable the spelling out of
ASCII-based string tags using characters which can be strictly
separated from ordinary text content characters in ISO10646 (or
UNICODE).
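
As an illustration of how a Plane 14 character is reached through each
of the three encoding forms, the following Python sketch encodes a
single tag character. The code point U+E0001 (LANGUAGE TAG) is not
stated in this excerpt; it is taken here from the Unicode Tags block
and should be read as an assumption of the example.

```python
# A single Plane 14 character; U+E0001 is assumed from the Unicode
# Tags block, since this excerpt does not list code point values.
LANGUAGE_TAG = "\U000E0001"

# Encode the same character in each encoding form (big-endian, no BOM).
utf8 = LANGUAGE_TAG.encode("utf-8")
utf16 = LANGUAGE_TAG.encode("utf-16-be")
ucs4 = LANGUAGE_TAG.encode("utf-32-be")

print(utf8.hex())   # f3a08081 (four-byte UTF-8 sequence)
print(utf16.hex())  # db40dc01 (UTF-16 surrogate pair)
print(ucs4.hex())   # 000e0001 (one 32-bit code unit)
```

Because the character lies outside the Basic Multilingual Plane, UTF-8
needs four bytes and UTF-16 needs a surrogate pair, while UCS-4 holds
it in one code unit.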

One tag identification character and one cancel tag character are
also proposed. In particular, a language tag identification character
is proposed to identify a language tag string specifically; the
language tag itself makes use of [RFC1766] language tag strings
spelled out using the Plane 14 tag characters. Provision of a
specific, low-overhead mechanism for embedding language tags in plain
text is aimed at meeting the need of Internet Protocols such as ACAP,
which require a standard mechanism for marking language in UTF-8
strings.
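
The tagging scheme described above can be sketched in a few lines of
Python. The code point values are not given in this excerpt; the
sketch assumes the assignments of the Unicode Tags block (U+E0001
LANGUAGE TAG, U+E0020 through U+E007E as tag counterparts of printable
ASCII, U+E007F CANCEL TAG). The function names are hypothetical and
chosen only for this illustration.

```python
# Assumed code points (Unicode Tags block); not listed in this excerpt.
LANGUAGE_TAG = 0xE0001  # tag identification character for language tags
CANCEL_TAG = 0xE007F    # cancel tag character
TAG_OFFSET = 0xE0000    # tag character = ASCII code point + this offset


def encode_language_tag(tag: str) -> str:
    """Spell out an ASCII language tag such as 'fr' in tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in tag):
        raise ValueError("language tag must be printable ASCII")
    spelled = "".join(chr(TAG_OFFSET + ord(ch)) for ch in tag)
    return chr(LANGUAGE_TAG) + spelled


def strip_tags(text: str) -> str:
    """Drop every character in U+E0000..U+E007F, keeping ordinary text."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


# Tag a string as French, then recover the untagged content.
tagged = encode_language_tag("fr") + "bonjour"
print([hex(ord(ch)) for ch in tagged[:3]])  # ['0xe0001', '0xe0066', '0xe0072']
print(strip_tags(tagged))                   # bonjour
```

Since the tag characters occupy their own range, a process that does
not interpret tags can separate them from the text content with a
simple range check, as strip_tags does here.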

The tagging mechanism as well as the characters proposed in this
document have been approved by the Unicode Consortium for inclusion
in The Unicode Standard. However, implementation of this decision
awaits formal acceptance by ISO JTC1/SC2/WG2, the working group
responsible for ISO10646. Potential implementers should be aware that
until this formal acceptance occurs, any usage of the characters
proposed herein is strictly experimental and not sanctioned for
standardized character data interchange.

2. Definitions and Notation

No attempt is made to define all terms used in this document. In
particular, the terminology pertaining to the subject of coded
character systems is not explicitly specified. See [UNICODE],
[ISO10646], and [RFC2130] for additional definitions in this ...