Character Repertories Mail Archive: CR> RE: Value matching i

CR> RE: Value matching in CR

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Sat Mar 22 2003 - 18:50:10 EST

Hi Elliott,

All existing UNIX implementations of POSIX locales do locale name
matching (language/charset concatenations) based on the rules at
(1) below. But POSIX itself does not formalize this matching
rule (anywhere I've been able to find so far).

(1) Only for purposes of comparing two character repertoire names,
    Printers (or Clients) MUST:
    (a) convert all letters to lowercase;
    (b) remove all hyphens, underscores, and periods; and
    (c) truncate at the colon (the separator before the year of the
        standard version), dropping it and any trailing date info.
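
The comparison steps in (1) can be sketched in a few lines of Python.
This is only an illustration, not text from any PWG or POSIX document:
the function names are hypothetical, and it assumes the version
separator is the colon found in registered names such as
"ISO_8859-1:1987", with everything after the first colon treated as
date info.

```python
def normalize_repertoire_name(name: str) -> str:
    """Return a comparison-only form of a character repertoire name."""
    # (c) cut at the first colon, dropping the year-of-standard suffix
    #     (assumption: everything after the first colon is date info)
    base = name.split(":", 1)[0]
    # (a) convert all letters to lowercase
    base = base.lower()
    # (b) remove all hyphens, underscores, and periods
    return base.translate(str.maketrans("", "", "-_."))


def names_match(a: str, b: str) -> bool:
    """Two names are equal if their comparison-only forms are identical."""
    return normalize_repertoire_name(a) == normalize_repertoire_name(b)
```

With this sketch, "ISO_8859-1:1987", "iso8859-1", "iso-8859-1" and
"iso_8859.1" all reduce to "iso88591" and so compare as equal. The
reduced form is meant for comparison only, never for display.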

Although the character set with the common alias "Latin 1" has been
registered with a 'Name:' of "ISO_8859-1:1987" in the IANA Charset
Registry, it is also VERY commonly referred to by existing software
as "iso8859-1" or "iso-8859-1" or "iso_8859.1" (notice the typical
misuse of periods and inconsistent presence of hyphen after "iso").

It is highly desirable that IPP/PSI Printers/Clients behave like
Web search engines and accept all approximate matches as equal.

(2) For purposes of displaying supported character repertoires in
    the future "repertoire-supported" Printer object attribute,
    Printers MUST:
    (a) use a 'namespace' prefix from the PWG CR standard (such
        as "unihan") in all lowercase, followed by a hyphen;
    (b) use the best practice name of the base charset - for the
        "iana" prefix, this MUST be the registered 'Name:' value
        (complete with the year of standard suffix after a colon)
        and MUST NOT be any registered 'Alias:' value. However,
        this value MUST be normalized to lowercase, consistent
        with the existing 'charset-supported' Printer attribute
        semantics. Any embedded underscores MUST also be changed
        to hyphens for consistency.
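
A minimal Python sketch of how I read the display rules in (2). The
function name is hypothetical, and the resulting string is my
interpretation of the rules, not a value taken from the PWG CR
standard.

```python
def display_repertoire_name(namespace: str, registered_name: str) -> str:
    """Build a display value: lowercase namespace prefix, hyphen, base name."""
    # (a) namespace prefix in all lowercase, followed by a hyphen
    prefix = namespace.lower() + "-"
    # (b) registered name kept whole (including the colon and year),
    #     lowercased, with embedded underscores changed to hyphens
    base = registered_name.lower().replace("_", "-")
    return prefix + base
```

Under these assumptions, the "iana" namespace and the registered name
"ISO_8859-1:1987" would give "iana-iso-8859-1:1987". Unlike the
comparison form in (1), the colon and year are retained here.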

I'd like to say it's OK to retain the colon/date info for the
comparisons, but it's really not safe, practically speaking.

Note that the existing "charset-supported" attribute says that
Printers MUST use the 'Name:' value and MUST NOT use any of the
'Alias:' values from the IANA Charset Registry.

An interesting sidelight: The Printer MIB (RFC 1759) uses the enum
tags that are 'Alias:' values beginning with "cs" (and containing
NO punctuation characters at all, as recommended by SMIv2 for MIBs).
When the Printer MIB is "visible" through the future PWG WBMM
interface (and the new Printer Device in the PWG Semantic Model),
we'll be faced with another interesting name collision. Sigh...

Cheers,
- Ira McDonald
High North Inc

-----Original Message-----
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Friday, March 21, 2003 11:50 AM
To: McDonald, Ira
Subject: Value matching in CR

Hi Ira,

I've been fiddling with the rules for matching CR values...in the last
version I said that hyphens and underscores would be dropped before
comparison. This may be a bit drastic...what if we say that a hyphen
matches an underscore?
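
For illustration, the looser alternative (a hyphen matches an
underscore, nothing is dropped) could be sketched as below. The
function names are hypothetical, and folding case is an added
assumption, since the proposal only mentions hyphens and underscores.

```python
def fold_separators(name: str) -> str:
    """Treat '_' and '-' as the same character; keep everything else."""
    # Assumption: case is folded as well; the proposal does not say so.
    return name.lower().replace("_", "-")


def loose_match(a: str, b: str) -> bool:
    """Compare two names after folding underscores into hyphens."""
    return fold_separators(a) == fold_separators(b)
```

With this rule "ISO_8859-1" matches "iso-8859-1", but "iso8859-1" does
not match "iso-8859-1", because a missing separator is still a
difference.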

Also, I think you said there was some reference we could use on the
subject. True?

Thanks,
E.

------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

This archive was generated by hypermail 2b29 : Sat Mar 22 2003 - 18:50:39 EST