Character Repertories Mail Archive: RE: CR> W3C Character Mo

RE: CR> W3C Character Model and Early Uniform Normalization

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Fri Sep 19 2003 - 13:45:25 EDT

    Hi,

    My two cents:

    (1) [answering Elliott]
        Unicode normalization has no impact at all on the CR specs;
        they merely refer to character repertoires (often including
        both composed and uncomposed characters) which are defined
        (in _all_ cases) by some other standards body (Unicode, ISO,
        IANA, etc.).

    (2) [answering Jim]
        No - a printer should _never_ throw away any document data
        that happens not to be normalized (it is actually very
        difficult to determine if that data is already in Unicode
        NFC or NFKC, except by doing the whole normalization and
        then doing a binary comparison of the result with the original).

    (3) [answering Jim]
        No - a printer should _never_ trust the sender/generator
        to have properly normalized Unicode data.

    (4) [my own comment]
        Early Uniform Normalization is important and useful for
        _very_ small pieces of data and _narrow_ fields of
        application (such as IETF's I18N Domain Names standards).
        The day will never come when receivers need not check
        for (or simply perform) normalization where it is needed. Some
        rendering algorithms happen to require that Unicode data
        be pre-normalized, but that's an implementation nit.
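
The check described in point (2), normalizing the data and comparing the result with the original, can be sketched as follows. This is a minimal illustration using Python's standard `unicodedata` module; the function name `ensure_nfc` is hypothetical and not part of any printer or XHTML-Print specification. It never discards data: it returns the normalized text together with a flag saying whether the input was already in NFC.

```python
import unicodedata
from typing import Tuple


def ensure_nfc(text: str) -> Tuple[str, bool]:
    """Return (text in NFC, True if the input was already in NFC).

    Hypothetical helper: the receiver does not trust the sender,
    so it always normalizes and then compares with the original.
    """
    normalized = unicodedata.normalize("NFC", text)
    # The comparison is exact, code point by code point.
    return normalized, normalized == text


# 'e' followed by COMBINING ACUTE ACCENT is not in NFC.
fixed, was_normalized = ensure_nfc("e\u0301")
print(repr(fixed), was_normalized)
```

Passing `"NFKC"` instead of `"NFC"` to `unicodedata.normalize` would apply the compatibility form in the same way.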

    Cheers,
    - Ira McDonald
      High North Inc

    -----Original Message-----
    From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
    Sent: Friday, September 19, 2003 10:23 AM
    To: BIGELOW,JIM (HP-Boise,ex1)
    Cc: 'cr@pwg.org'; owner-cr@pwg.org
    Subject: Re: CR> W3C Character Model and Early Uniform Normalization

    What are the XHTML-Print operations that are affected by normalization?
    This discussion is useful for string processing (match, substring, sort)
    but I don't see how that affects printing. One possible area is CSS class
    names; are they restricted to ASCII?

    Also, I don't see how a new report can change the definition of an existing
    spec (XHTML). Isn't this a separate set of rules that might be folded into
    future revisions?

    I would rather see a use-case that makes sense for XHTML-Print before
    adding this in.

      E.

    P.S. Does it have any effect on current CR documents? I don't think so.
    There is no discussion of combining in there at all.

    ----------------------------------------------------------------------------

    Elliott Bradshaw
    Director, Software Engineering
    Zoran Imaging Group (formerly Oak Technology Imaging Group)
    781 638-7534

    "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
    Sent by: owner-cr@pwg.org
    09/18/2003 08:01 PM

    To: "'cr@pwg.org'" <cr@pwg.org>
    cc:
    Subject: CR> W3C Character Model and Early Uniform Normalization

    Hello,

    I've been reading the W3C Working Draft, Character Model for the World Wide Web [1], which deals with requirements on internet applications such as producers and consumers of XHTML-Print.

    This report [1] indicates that XHTML-Print, as a derivative of XHTML, is bound by it. Therefore, by extension, all XHTML-Print producing and consuming applications are bound by this report, although this is never explicitly stated in any version of the XHTML-Print specification [2,3].

    One of the interesting parts of [1] is the requirement that applications producing XHTML-Print should produce fully-normalized text [4], meaning, among other things, that the text is in Unicode Normalization Form C [5], which favors the canonical composed forms of Unicode characters.
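
As a small illustration of what "favors the composed forms" means, the sketch below uses Python's standard `unicodedata` module to convert a decomposed string to Normalization Form C. It covers only the NFC part of the requirement; the other conditions that [4] places on fully-normalized text are not checked here. The sample string and variable names are chosen purely for illustration.

```python
import unicodedata

# "Cafe" followed by U+0301 COMBINING ACUTE ACCENT: five code points.
decomposed = "Cafe\u0301"

# NFC replaces 'e' + combining accent with the single
# precomposed character U+00E9, giving four code points.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "Caf\u00e9")
```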

    From the printer's perspective, as a receiver of XHTML-Print documents, this makes its job easier, since it can always assume that text is fully-normalized and need not normalize the text itself.

    My question to you is: do you think that the XHTML-Print specification should be amended to cite the requirement that a conforming XHTML-Print document be fully-normalized? Furthermore, should a printer be required to check an XHTML-Print document to see that it is fully-normalized, or should it assume so? Lastly, should a printer normalize text that is not fully-normalized, or discard it?

    Jim

    --
    Jim Bigelow, Editor: XHTML-Print & CSS Print Profile
    Member: W3C HTML and CSS Working Groups
    Hewlett-Packard
    208-396-2068
    jim.bigelow@hp.com

    [1] http://www.w3.org/TR/charmod/
    [2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
    [3] http://www.w3.org/TR/xhtml-print/
    [4] http://www.w3.org/TR/2003/WD-charmod-20030822/#sec-FullyNormalized
    [5] http://www.unicode.org/unicode/reports/tr15/#Specification



    This archive was generated by hypermail 2b29 : Fri Sep 19 2003 - 13:46:46 EDT