IPP> RFC 2482 - Language Tagging in Unicode Plain Text

Ira McDonald (imcdonal@sdsp.mc.xerox.com)
Fri, 15 Jan 99 11:42:14 EST

Hi folks,

This brand new RFC contains a very important extension to the
Unicode character set (in Plane 14) to permit SAFE language
tags to be embedded in Unicode plain text streams. This is
of potential interest in many standards domains.

Read and enjoy,
- Ira McDonald (outside consultant at Xerox)
High North Inc

[excerpt from 'ftp://ftp.isi.edu/in-notes/rfc2482.txt']

Network Working Group                                        K. Whistler
Request for Comments: 2482                                        Sybase
Category: Informational                                         G. Adams
                                                            January 1999

Language Tagging in Unicode Plain Text

Status of this Memo

This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

IESG Note:

This document has been accepted by ISO/IEC JTC1/SC2/WG2 in meeting
#34 to be submitted as a recommendation from WG2 for inclusion in
Plane 14 in part 2 of ISO/IEC 10646.

1. Abstract

This document proposes a mechanism for language tagging in [UNICODE]
plain text. A set of special-use tag characters on Plane 14 of
[ISO10646] (accessible through UTF-8, UTF-16, and UCS-4 encoding
forms) is proposed for encoding, to enable the spelling out of
ASCII-based string tags using characters which can be strictly
separated from ordinary text content characters in ISO10646 (or
UNICODE).
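
The separation property described in this paragraph can be sketched
in a few lines of Python. This is only an illustration, not part of
the RFC text: it assumes the tag characters occupy U+E0000 through
U+E007F (the range given in the full RFC; the code points are not
listed in this excerpt), and the helper names is_tag_char and
strip_tags are hypothetical.

```python
# Assumption: the Plane 14 tag block spans U+E0000..U+E007F, as laid out
# in the full text of RFC 2482 (the code points are not in this excerpt).
TAG_BLOCK_FIRST = 0xE0000
TAG_BLOCK_LAST = 0xE007F


def is_tag_char(ch: str) -> bool:
    """Return True if ch falls inside the assumed Plane 14 tag block."""
    return TAG_BLOCK_FIRST <= ord(ch) <= TAG_BLOCK_LAST


def strip_tags(text: str) -> str:
    """Drop every tag character, leaving ordinary text content untouched."""
    return "".join(ch for ch in text if not is_tag_char(ch))


# One tag character (assumed to be the tag counterpart of ASCII 'e')
# in each of the three encoding forms named in the abstract.
tag_e = "\U000E0065"
utf8_bytes = tag_e.encode("utf-8")       # four bytes
utf16_bytes = tag_e.encode("utf-16-be")  # one surrogate pair, four bytes
ucs4_bytes = tag_e.encode("utf-32-be")   # one 32-bit unit, four bytes

print(strip_tags("plain" + tag_e + "text"))
print(utf8_bytes.hex(), utf16_bytes.hex(), ucs4_bytes.hex())
```

Because the test is a single range comparison on the code point, a
process that does not understand tags can discard them without having
to parse the tag content.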

One tag identification character and one cancel tag character are
also proposed. In particular, a language tag identification character
is proposed to identify a language tag string specifically; the
language tag itself makes use of [RFC1766] language tag strings
spelled out using the Plane 14 tag characters. Provision of a
specific, low-overhead mechanism for embedding language tags in plain
text is aimed at meeting the need of Internet Protocols such as ACAP,
which require a standard mechanism for marking language in UTF-8
strings.
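
As a minimal sketch of how such a language tag might be spelled out,
the Python below assumes the code point assignments from the full RFC
(U+E0001 for the language tag identification character, U+E007F for
the cancel tag character, and each tag character equal to its ASCII
counterpart plus 0xE0000). None of these values appear in this
excerpt, and the function names are hypothetical.

```python
# Assumptions (from the full RFC 2482 text, not from this excerpt):
LANGUAGE_TAG = 0xE0001  # language tag identification character
CANCEL_TAG = 0xE007F    # cancel tag character (defined here, not used below)
TAG_OFFSET = 0xE0000    # tag character = ASCII code point + this offset


def encode_language_tag(lang: str) -> str:
    """Spell out an RFC 1766 style tag such as 'fr' or 'en-US'."""
    ok = lang and all(c.isascii() and (c.isalnum() or c == "-") for c in lang)
    if not ok:
        raise ValueError(f"not an RFC 1766 style tag: {lang!r}")
    # Identification character first, then one tag character per ASCII char.
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def decode_language_tag(tag: str) -> str:
    """Recover the ASCII tag string from an encoded language tag."""
    if not tag or ord(tag[0]) != LANGUAGE_TAG:
        raise ValueError("missing language tag identification character")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in tag[1:])


tagged = encode_language_tag("fr") + "bonjour"
print([hex(ord(c)) for c in tagged[:3]])
print(len(encode_language_tag("en-US").encode("utf-8")), "bytes in UTF-8")
```

Under these assumptions every character of the tag costs four bytes
in UTF-8, so a short tag such as "fr" adds twelve bytes to the string
it marks.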

The tagging mechanism as well as the characters proposed in this
document have been approved by the Unicode Consortium for inclusion
in The Unicode Standard. However, implementation of this decision
awaits formal acceptance by ISO JTC1/SC2/WG2, the working group
responsible for ISO10646. Potential implementers should be aware that
until this formal acceptance occurs, any usage of the characters
proposed herein is strictly experimental and not sanctioned for
standardized character data interchange.