IPP> RFC 2482 - Language Tagging in Unicode Plain Text

Ira McDonald imcdonal at sdsp.mc.xerox.com
Fri Jan 15 11:42:14 EST 1999


Hi folks,

This brand new RFC contains a very important extension to the
Unicode character set (in Plane 14) to permit SAFE language
tags to be embedded in Unicode plain text streams.  This is
of potential interest in many standards domains.

Read and enjoy,
- Ira McDonald (outside consultant at Xerox)
  High North Inc
  716-461-5667 

-----------------------------------
[excerpt from 'ftp://ftp.isi.edu/in-notes/rfc2482.txt']

Network Working Group                                       K. Whistler
Request for Comments: 2482                                       Sybase
Category: Informational                                        G. Adams
                                                               Spyglass
                                                           January 1999


                 Language Tagging in Unicode Plain Text

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

IESG Note:

   This document has been accepted by ISO/IEC JTC1/SC2/WG2 in meeting
   #34 to be submitted as a recommendation from WG2 for inclusion in
   Plane 14 in part 2 of ISO/IEC 10646.

1.  Abstract

   This document proposes a mechanism for language tagging in [UNICODE]
   plain text. A set of special-use tag characters on Plane 14 of
   [ISO10646] (accessible through UTF-8, UTF-16, and UCS-4 encoding
   forms) are proposed for encoding to enable the spelling out of
   ASCII-based string tags using characters which can be strictly
   separated from ordinary text content characters in ISO10646 (or
   UNICODE).

   One tag identification character and one cancel tag character are
   also proposed. In particular, a language tag identification character
   is proposed to identify a language tag string specifically; the
   language tag itself makes use of [RFC1766] language tag strings
   spelled out using the Plane 14 tag characters. Provision of a
   specific, low-overhead mechanism for embedding language tags in plain
   text is aimed at meeting the need of Internet Protocols such as ACAP,
   which require a standard mechanism for marking language in UTF-8
   strings.

   The tagging mechanism as well as the characters proposed in this
   document have been approved by the Unicode Consortium for inclusion
   in The Unicode Standard.  However, implementation of this decision
   awaits formal acceptance by ISO JTC1/SC2/WG2, the working group
   responsible for ISO10646. Potential implementers should be aware that
   until this formal acceptance occurs, any usage of the characters
   proposed herein is strictly experimental and not sanctioned for
   standardized character data interchange.

-----------------------------------
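
For readers who want to see what the abstract describes, here is a
minimal sketch in Python. The code point values are not in the excerpt
above; they are taken from the body of RFC 2482 (U+E0001 LANGUAGE TAG,
U+E0020 through U+E007E as the tag counterparts of printable ASCII, and
U+E007F CANCEL TAG). The function names are hypothetical and chosen
only for illustration.

```python
# Sketch only: constants follow the RFC 2482 body, names are illustrative.
LANGUAGE_TAG = 0xE0001   # language tag identification character
CANCEL_TAG = 0xE007F     # cancel tag character
TAG_OFFSET = 0xE0000     # tag character = ASCII value + this offset


def encode_language_tag(tag: str) -> str:
    """Spell out an ASCII language tag (e.g. 'en-US') in Plane 14 characters."""
    if not tag or any(not (0x20 <= ord(c) <= 0x7E) for c in tag):
        raise ValueError("tag must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in tag)


def is_tag_char(ch: str) -> bool:
    """True for any code point in the Plane 14 tag block U+E0000..U+E007F."""
    return 0xE0000 <= ord(ch) <= 0xE007F


def strip_tags(text: str) -> str:
    """Drop all tag characters, leaving only ordinary text content."""
    return "".join(c for c in text if not is_tag_char(c))


def read_language_tags(text: str) -> list[str]:
    """Return the ASCII spelling of each language tag found in text."""
    tags, current = [], None
    for ch in text:
        cp = ord(ch)
        if cp == LANGUAGE_TAG:
            if current:
                tags.append(current)
            current = ""
        elif current is not None and 0xE0020 <= cp <= 0xE007E:
            current += chr(cp - TAG_OFFSET)
        elif current is not None:
            # Any other character (including CANCEL TAG) ends the tag string.
            if current:
                tags.append(current)
            current = None
    if current:
        tags.append(current)
    return tags


tagged = encode_language_tag("fr") + "bonjour " + encode_language_tag("en") + "hello"
print(read_language_tags(tagged))   # ASCII spellings of the embedded tags
print(strip_tags(tagged))           # the text with every tag character removed
print(len(encode_language_tag("fr").encode("utf-8")))  # bytes used by one tag in UTF-8
```

Because every tag character sits in its own block on Plane 14, a
receiver that does not care about language can discard tags with a
simple range check, which is the "strict separation" the abstract
refers to. Each tag character takes four bytes in UTF-8.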



