Character Repertoires Mail Archive: CR> RE: PWG-ANNOUNCE>

CR> RE: PWG-ANNOUNCE> Character Repertoires Charter and Last Call

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Sun Jun 01 2003 - 19:45:54 EDT


    [I took pwg-announce off the cc: in this reply and added the CR list]

    Hi,

    I agree with your suggestion that we should use 'charset' (in the
    IETF/IANA sense) for a 'coded character set' (such as Unicode 4.0)
    combined with a 'character encoding scheme' (such as UTF-8).

    That would also be consistent with the usage in IPP/1.1 (RFC
    2911), where the base datatype 'charset' is defined (on page 86)
    for the IPP Printer attributes 'charset-configured' and
    'charset-supported'.

    Also, the CR charter and eventual standard should have a reference
    to the W3C Character Model.
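    A minimal Python sketch of the distinction between a coded character
    set and a character encoding scheme, assuming Python's standard codecs
    (which accept the IANA names used here). The helper name
    encode_under_charsets is hypothetical. One code point from the
    Unicode / ISO/IEC 10646 coded character set is serialized as different
    bytes by three encoding schemes. Each IANA charset label therefore names
    the pair, not just the repertoire.

```python
def encode_under_charsets(text: str, charsets: list[str]) -> dict[str, str]:
    """Return the hex-encoded bytes of `text` under each named charset.

    encode_under_charsets is a hypothetical helper for illustration only.
    """
    return {cs: text.encode(cs).hex() for cs in charsets}


ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Coded character set: Unicode / ISO/IEC 10646 assigns the code point U+00E9.
print(f"U+{ord(ch):04X}")

# Character encoding schemes: the same code point serialized three ways.
for cs, hexbytes in encode_under_charsets(ch, ["UTF-8", "UTF-16BE", "UTF-32BE"]).items():
    print(cs, hexbytes)
```

    The code point stays fixed (U+00E9). Only the byte serialization
    changes (c3a9, 00e9, 000000e9), which is why a 'charset' value has to
    identify both layers.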

    Cheers,
    - Ira McDonald

    -----Original Message-----
    From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp]
    Sent: Saturday, May 31, 2003 7:12 PM
    To: ElliottBradshaw@oaktech.com
    Cc: pwg-announce@pwg.org
    Subject: Re: PWG-ANNOUNCE> Character Repertoires Charter and Last Call

    Hello Elliott,

    At 5:20 PM -0400 03.5.29, ElliottBradshaw@oaktech.com wrote:
    >A Charter has been reviewed within the CR group and there are no open
    >issues.
    >
    >It is available online at
    >ftp://ftp.pwg.org/pub/pwg/cr/charter/ch-cr10-20030507.html.
    >
    >So today I begin a 10-day Last Call for comments on this document, prior to
    >a formal vote by the PWG.

    I feel a little uncomfortable with the following paragraph in the Charter.

    >In Unicode and W3C specifications, the term "character set" usually
    >refers to a method of encoding a (possibly very large) set of characters,
    >e.g. UTF-8. This tells how to encode a given character if it is present,
    >but doesn't define which characters in that space are actually in use.

    In the Character Model for the World Wide Web specification, the W3C
    clearly discourages using the term "character set" to refer to a method
    of encoding.

    <http://www.w3.org/TR/charmod/>

    >[S] Specifications SHOULD avoid using the terms 'character set' and
    >'charset' to refer to a character encoding, except when the latter is used
    >to refer to the MIME charset parameter or its IANA-registered values.
    >The terms 'character encoding', 'character encoding form' or 'character
    >encoding scheme' are RECOMMENDED.

    I suggest changing the wording to something like the following.

    In Unicode and W3C specifications, the term "character set" usually
    refers to a (possibly very large) set of characters, e.g. ISO/IEC 10646.
    The term "character set", however, can be confusing in some cases,
    since the similar term "charset" is used as a MIME parameter, which
    refers to the combination of "coded character set" and "character
    encoding scheme", not just the former.
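    A minimal Python sketch of how the MIME "charset" parameter selects
    both layers at once. It assumes Python's standard email.message.Message
    class, whose get_content_charset() returns the parameter lowercased, or
    the supplied default when the parameter is absent (US-ASCII is the
    text/plain default in RFC 2046). The helper decode_body is hypothetical.

```python
from email.message import Message


def decode_body(content_type: str, body: bytes) -> tuple[str, str]:
    """Decode `body` using the charset parameter of a MIME Content-Type value.

    decode_body is a hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Lowercased charset parameter, or "us-ascii" if none is given.
    charset = msg.get_content_charset("us-ascii")
    return charset, body.decode(charset)


body = "caf\u00e9".encode("utf-8")  # the bytes b"caf\xc3\xa9"
print(decode_body('text/plain; charset="UTF-8"', body))
print(decode_body("text/plain; charset=ISO-8859-1", body))
```

    The same bytes decode to "café" under UTF-8 but to "cafÃ©" under
    ISO-8859-1. The parameter therefore determines how bytes map to
    characters, not only which characters are available.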

    --
    Jun Fujisawa
    <mailto:fujisawa.jun@canon.co.jp>
    



    This archive was generated by hypermail 2b29 : Sun Jun 01 2003 - 19:46:46 EDT