Character Repertoires Mail Archive: CR> RE: PWG-ANNOUNCE>

CR> RE: PWG-ANNOUNCE> Character Repertoires Charter and Last Call

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Sun Jun 01 2003 - 19:45:54 EDT

[I took pwg-announce off the cc: in this reply - added CR list]

Hi,

I agree with your suggestion that we should be using 'charset'
(in the IETF/IANA sense) for a 'coded character set' (such as
Unicode 4.0) in a 'character encoding scheme' (such as UTF-8).

That would also be consistent with the usage in IPP/1.1 (RFC
2911), where the base datatype 'charset' is defined (on page 86)
for the IPP Printer attributes 'charset-configured' and
'charset-supported'.

Also, the CR charter and eventual standard should have a reference
to the W3C Character Model.

Cheers,
- Ira McDonald

-----Original Message-----
From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp]
Sent: Saturday, May 31, 2003 7:12 PM
To: ElliottBradshaw@oaktech.com
Cc: pwg-announce@pwg.org
Subject: Re: PWG-ANNOUNCE> Character Repertoires Charter and Last Call

Hello Elliott,

At 5:20 PM -0400 03.5.29, ElliottBradshaw@oaktech.com wrote:
>A Charter has been reviewed within the CR group and there are no open
>issues.
>
>It is available online at
>ftp://ftp.pwg.org/pub/pwg/cr/charter/ch-cr10-20030507.html.
>
>So today I begin a 10-day Last Call for comments on this document, prior to
>a formal vote by the PWG.

I feel a little uncomfortable with the following paragraph in the Charter.

>In Unicode and W3C specifications, the term "character set" usually
>refers to a method of encoding a (possibly very large) set of characters,
>e.g. UTF-8. This tells how to encode a given character if it is present,
>but doesn't define which characters in that space are actually in use.

In the Character Model for the World Wide Web specification, the W3C
clearly discourages using the term "character set" to refer to a method
of encoding.

<http://www.w3.org/TR/charmod/>

>[S] Specifications SHOULD avoid using the terms 'character set' and
>'charset' to refer to a character encoding, except when the latter is used
>to refer to the MIME charset parameter or its IANA-registered values.
>The terms 'character encoding', 'character encoding form' or 'character
>encoding scheme' are RECOMMENDED.

I suggest changing the wording to something like the following.

In Unicode and W3C specifications, the term "character set" usually
refers to a (possibly very large) set of characters, e.g. ISO/IEC 10646.
The term "character set", however, can be confusing in some cases,
since the similar term "charset" is used as a MIME parameter, which
refers to the combination of "coded character set" and "character
encoding scheme", not just the former.
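
To make the distinction concrete, here is a minimal Python sketch (not part
of the Charter; the character and the three charset labels are chosen only
for illustration). It shows one character with a single code point in the
coded character set being serialized to different octets depending on the
charset label, which is why a MIME "charset" has to name more than the
character set alone.

```python
# Illustrative sketch: one abstract character, one code point in the
# coded character set (Unicode / ISO/IEC 10646), several byte forms.
ch = "\u00e9"          # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)   # position in the coded character set: 0xE9

# Each label below is an IANA-registered charset name that Python accepts.
# "utf-8" and "utf-16-be" share the same coded character set but use
# different character encoding schemes; "iso-8859-1" names a different,
# much smaller coded character set with a one-octet encoding.
labels = ["utf-8", "utf-16-be", "iso-8859-1"]
encoded = {label: ch.encode(label) for label in labels}

for label, octets in encoded.items():
    print(f"U+{code_point:04X} as {label}: {octets.hex()}")
```

The code point stays the same in every case; only the octets differ, so a
receiver needs the charset label to interpret them.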

--
Jun Fujisawa
<mailto:fujisawa.jun@canon.co.jp>

