Character Repertories Mail Archive: RE: CR> W3C Character Mo

RE: CR> W3C Character Model and Early Uniform Normalization

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Fri Sep 19 2003 - 13:45:25 EDT

Next message: BIGELOW,JIM (HP-Boise,ex1): "RE: CR> W3C Character Model and Early Uniform Normalization"

Previous message: elliott.bradshaw@zoran.com: "Re: CR> W3C Character Model and Early Uniform Normalization"
Maybe in reply to: BIGELOW,JIM (HP-Boise,ex1): "CR> W3C Character Model and Early Uniform Normalization"
Next in thread: BIGELOW,JIM (HP-Boise,ex1): "RE: CR> W3C Character Model and Early Uniform Normalization"

Hi,

My two cents:

(1) [answering Elliott]
    Unicode normalization has no impact at all on the CR specs;
    they merely refer to character repertoires (often including
    both composed and uncomposed characters) which are defined
    (in _all_ cases) by some other standards body (Unicode, ISO,
    IANA, etc.).

(2) [answering Jim]
    No - a printer should _never_ throw away any document data
    that happens not to be normalized (it is actually very
    difficult to determine if that data is already in Unicode
    NFC or NFKC, except by doing the whole normalization and
    then doing binary compare of the results with original).

(3) [answering Jim]
    No - a printer should _never_ trust the sender/generator
    to have properly normalized Unicode data.

(4) [my own comment]
    Early Uniform Normalization is important and useful for
    _very_ small pieces of data and _narrow_ fields of
    application (such as IETF's I18N Domain Names standards).
    The day will never come that receivers need not check
    for (or simply perform) normalization, if needed. Some
    rendering algorithms happen to require that Unicode data
    be pre-normalized, but that's an implementation nit.
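
The check described in (2), and the receiver-side behavior argued for in
(2) through (4), can be sketched roughly as follows. This is a minimal
Python sketch, not anything taken from the CR or XHTML-Print specs: the
names is_nfc and receive_text are hypothetical, and the only real library
call is unicodedata.normalize from the Python standard library.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Normalize the whole string to NFC and compare with the original."""
    return unicodedata.normalize("NFC", text) == text

def receive_text(text: str) -> str:
    """Hypothetical receiver policy: never discard and never trust the
    sender; return the text in NFC whether or not it arrived that way."""
    if is_nfc(text):
        return text
    return unicodedata.normalize("NFC", text)

decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "caf\u00e9"     # precomposed U+00E9

print(is_nfc(decomposed))                    # False
print(is_nfc(composed))                      # True
print(receive_text(decomposed) == composed)  # True
```

Since the check already performs the full normalization, a receiver that
needs NFC could equally skip is_nfc and call unicodedata.normalize
unconditionally.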

Cheers,
- Ira McDonald
High North Inc

-----Original Message-----
From: elliott.bradshaw@zoran.com [mailto:elliott.bradshaw@zoran.com]
Sent: Friday, September 19, 2003 10:23 AM
To: BIGELOW,JIM (HP-Boise,ex1)
Cc: 'cr@pwg.org'; owner-cr@pwg.org
Subject: Re: CR> W3C Character Model and Early Uniform Normalization

What are the XHTML-Print operations that are affected by normalization?
This discussion is useful for string processing (match, substring, sort)
but I don't see how that affects printing. One possible area is CSS class
names; are they restricted to ASCII?

Also, I don't see how a new report can change the definition of an existing
spec (XHTML). Isn't this a separate set of rules that might be folded into
future revisions?

I would rather see a use-case that makes sense for XHTML-Print before
adding this in.

P.S. Does it have any effect on current CR documents? I don't think so.
There is no discussion of combining in there at all.

--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534

From:    "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Sent by: owner-cr@pwg.o
Date:    09/18/2003 08:01 PM
To:      "'cr@pwg.org'" <cr@pwg.org>
cc:
Subject: CR> W3C Character Model and Early Uniform Normalization

Hello,

I've been reading the W3C Working Draft, Character Model for the World Wide Web [1], which deals with the requirements for internet applications such as producers and consumers of XHTML-Print.

This report [1] indicates that XHTML-Print, as a derivative of XHTML, is bound by it. Therefore, by extension, all XHTML-Print producing and consuming applications are bound by this report, although this is never explicitly stated in any version of the XHTML-Print specification [2,3].

One of the interesting parts of [1] is the requirement that applications that produce XHTML-Print should produce fully-normalized text [4], meaning, among other things, that it is in Unicode Normalization Form C [5], which favors the canonical composite forms of Unicode characters.
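
To illustrate what "favors the canonical composite forms" means in practice, here is a small Python sketch using the standard-library unicodedata module. The helper name code_points is hypothetical and the sample characters are chosen only for illustration; note that Form C is only one part of what [4] calls fully-normalized text.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

decomposed = "A\u030A"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

# NFC composes the pair into the single precomposed character U+00C5.
nfc = unicodedata.normalize("NFC", decomposed)
print(code_points(nfc))  # ['U+00C5']

# NFD goes the other way and decomposes it again.
nfd = unicodedata.normalize("NFD", nfc)
print(code_points(nfd))  # ['U+0041', 'U+030A']
```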

From the printer's perspective, as a receiver of XHTML-Print documents, this makes its job easier since it can always assume that text is fully-normalized and it doesn't have to do so itself.

My question to you is, do you think that the XHTML-Print specification should be amended to cite the requirement that a conforming XHTML-Print document be fully-normalized? Furthermore, should a printer be required to check an XHTML-Print document to see that it is fully-normalized, or should it assume so? Lastly, should a printer normalize text that is not fully-normalized, or discard it?

Jim

--
Jim Bigelow, Editor: XHTML-Print & CSS Print Profile
Member: W3C HTML and CSS Working Groups
Hewlett-Packard
208-396-2068
jim.bigelow@hp.com

[1] http://www.w3.org/TR/charmod/
[2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
[3] http://www.w3.org/TR/xhtml-print/
[4] http://www.w3.org/TR/2003/WD-charmod-20030822/#sec-FullyNormalized
[5] http://www.unicode.org/unicode/reports/tr15/#Specification

This archive was generated by hypermail 2b29 : Fri Sep 19 2003 - 13:46:46 EDT