RE: PWG-ANNOUNCE> Character repertoires in printers

From: don@lexmark.com
Date: Tue Oct 22 2002 - 09:58:36 EDT


Please move this discussion to pwg@pwg.org and not the pwg-announce list.

**********************************************
 Don Wright don@lexmark.com

 Member, IEEE SA Standards Board
         PatCom Chair, SCC Liaison
 Member, IEEE-ISTO Board of Directors
 f.wright@ieee.org / f.wright@computer.org

 Director, Alliances & Standards
 Lexmark International
 740 New Circle Rd
 Lexington, Ky 40550
 859-825-4808 (phone) 603-963-8352 (fax)
**********************************************

"McDonald, Ira" <imcdonald@sharplabs.com>@pwg.org on 10/21/2002 06:10:20 PM

Sent by: owner-pwg-announce@pwg.org

To: "'Michael Sweet'" <mike@easysw.com>, ElliottBradshaw@oaktech.com
cc: pwg-announce@pwg.org
Subject: RE: PWG-ANNOUNCE> Character repertoires in printers

Hi,

Inline below...

Cheers,
- Ira McDonald
  High North Inc

PS - Note that the European standard CEN CWA 13873:2000 standardized
Multilingual European Subsets of ISO 10646/Unicode called MES-1,
MES-2, MES-3A, and MES-3B (see very end of this note below).

-----Original Message-----
From: Michael Sweet [mailto:mike@easysw.com]
Sent: Monday, October 21, 2002 2:19 PM
To: ElliottBradshaw@oaktech.com
Cc: pwg-announce@pwg.org
Subject: Re: PWG-ANNOUNCE> Character repertoires in printers

ElliottBradshaw@oaktech.com wrote:
> ...
> 1. Is this a problem worth solving? (vs. vendor-specific solutions)

Yes.

> 2. Should it be treated as part of XHTML-Print, UPnP, or some other
> group? (as opposed to a separate working group)

Probably as part of an existing group.

> 3. Who is interested in participating, as author or reviewer?

I'd be interested, at least in the reviewer/back-seat-driver role. :)

....

Some immediate thoughts based on my own experiences, and without
looking at the Bluetooth docos.

<ira> Bluetooth, rather cleverly, enumerated character repertoires
of Unicode (subsets) by REFERENCE to existing legacy character sets
(see the excerpt from BPP below).
</ira>

1. Aside from the Euro, all printers seem to provide the basic
    Latin characters needed for English and most Western European
    languages. If you do a language/country-based scheme, it should
    address the presence/absence of the Euro symbol as a separate
    entity. [this doesn't quite sound right to me, but in the context
    of ISO Latin 1 the Euro is a major pain WRT support in printers;
    do with it what you will...]

<ira> Good point about the Euro - ambiguous when you say ISO-8859-1
(Latin-1), because you might well mean that the Euro IS defined at
the canonical location of 0xA4 assigned in ISO-8859-16, rather
than the original (non-specific) CURRENCY SIGN at 0xA4 in Latin-1.
</ira>
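
To make the 0xA4 ambiguity concrete, here is a small Python sketch
(not from the original thread) that decodes the same byte under three
ISO 8859 parts using the codecs shipped with Python's standard
library. The helper name describe() is hypothetical.

```python
import unicodedata

# The single byte whose meaning depends on which ISO 8859 part is assumed.
BYTE = bytes([0xA4])

def describe(codec):
    """Decode BYTE with the given codec and name the resulting character."""
    ch = BYTE.decode(codec)
    return f"{codec}: U+{ord(ch):04X} {unicodedata.name(ch)}"

for codec in ("iso-8859-1", "iso-8859-15", "iso-8859-16"):
    print(describe(codec))
```

Under ISO-8859-1 the byte decodes to CURRENCY SIGN; under ISO-8859-15
and ISO-8859-16 it decodes to EURO SIGN, so a bare "Latin-1" label
does not say whether a printer can render the Euro.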

2. Providing a list of Unicode ranges may be the simplest way
    of reporting what the device supports, and the client can use
    this to choose embedding/exclusion/error display when the
    user prints something. This needs to be a per-font resource.

<ira> Since Unicode ranges are assigned well-known names according
to the language/script repertoire, I'd rather we used those names
(and standardized that they reference exactly the ranges assigned
in perpetuity in Unicode 3.2 (current) and above). For example,
the range U+1000 to U+109F is Myanmar (script used to write
Burmese).
</ira>
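
A minimal Python sketch of the named-range idea follows, assuming a
printer advertises a set of Unicode block names and the client checks
the text against them. The table holds only a handful of blocks for
illustration, and the names NAMED_RANGES, range_name() and
unsupported() are hypothetical, not part of any PWG or Bluetooth
specification.

```python
# Illustrative subset of Unicode block names and their code point ranges.
NAMED_RANGES = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Myanmar": (0x1000, 0x109F),
}

def range_name(ch):
    """Return the block name containing ch, or None if not in the table."""
    cp = ord(ch)
    for name, (lo, hi) in NAMED_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None

def unsupported(text, printer_ranges):
    """Return the sorted distinct characters of text that fall outside
    the named ranges the printer claims to support."""
    return sorted(ch for ch in set(text)
                  if range_name(ch) not in printer_ranges)

# A printer that advertises only Basic Latin cannot print U+1000.
print(unsupported("abc\u1000", {"Basic Latin"}))
```

A client could use the result to decide whether to embed a font,
drop the characters, or warn the user before sending the job.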

3. In addition to or instead of #2, you could define a CSS
    attribute that determines what the device does for characters
    it does not have: exclude (blanks or squares), substitute (from
    another font with the required characters), or error out.
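
The three behaviours in item 3 can be sketched in Python as a policy
switch. This is only an illustration under assumptions: the policy
names follow the wording above, while plan_glyphs(), the "primary"
and "fallback" character sets, and the use of U+25A1 WHITE SQUARE as
the placeholder are all hypothetical.

```python
MISSING_GLYPH = "\u25A1"  # WHITE SQUARE, used here as the "square" placeholder

def plan_glyphs(text, primary, fallback=frozenset(), policy="exclude"):
    """Return a list of (character, source) pairs describing how each
    character of text would be drawn under the chosen policy."""
    if policy not in ("exclude", "substitute", "error"):
        raise ValueError(f"unknown policy: {policy!r}")
    plan = []
    for ch in text:
        if ch in primary:
            plan.append((ch, "primary"))
        elif policy == "substitute" and ch in fallback:
            # Take the glyph from another font that has the character.
            plan.append((ch, "fallback"))
        elif policy == "error":
            raise LookupError(f"no glyph for U+{ord(ch):04X}")
        else:
            # "exclude", or "substitute" with no fallback glyph available.
            plan.append((MISSING_GLYPH, "missing"))
    return plan

print(plan_glyphs("a\u20ac", primary=set("a")))
print(plan_glyphs("a\u20ac", primary=set("a"),
                  fallback={"\u20ac"}, policy="substitute"))
```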

--
______________________________________________________________________
Michael Sweet, Easy Software Products                  mike@easysw.com
Printing Software for UNIX                       http://www.easysw.com

----------------------------------------------------------------------
[excerpt from draft Bluetooth Basic Printing Profile v0.95a (5 Oct 2001)]

BLUETOOTH SPECIFICATION

Basic Printing Profile Page 118 of 131

The most current version of the bit assignments for the Character Repertoires Supported field may be found in the Host Operating Environment Identifiers section of the Bluetooth Assigned Numbers Document [16]. Unassigned bits will be assigned by the maintainer of [16] according to procedures described by the Bluetooth SIG. The general guideline is that each bit should indicate a subset of the 4-byte Unicode space of use to providers of Senders, Printers, fonts, and Internet content, with appropriate support from national standards groups. It is strongly recommended that new character repertoires also be filed with IANA (see [36]).

The capability to print 7-bit US-ASCII characters is not listed as part of the following table; however, that capability is mandatory for all Printers supporting any part of this Profile.

Bit Number   Character Repertoire   Description

Bit0         ISO-8859-1             Latin alphabet No. 1
Bit1         ISO-8859-2             Latin alphabet No. 2
Bit2         ISO-8859-3             Latin alphabet No. 3
Bit3         ISO-8859-4             Latin alphabet No. 4
Bit4         ISO-8859-5             Latin/Cyrillic alphabet
Bit5         ISO-8859-6             Latin/Arabic alphabet
Bit6         ISO-8859-7             Latin/Greek alphabet
Bit7         ISO-8859-8             Latin/Hebrew alphabet
Bit8         ISO-8859-9             Latin alphabet No. 5
Bit9         ISO-8859-10            Latin alphabet No. 6
Bit10        ISO-8859-13            Latin alphabet No. 7
Bit11        ISO-8859-14            Latin alphabet No. 8
Bit12        ISO-8859-15            Latin alphabet No. 9
Bit13        GB_2312-80             Chinese (People's Republic of China)
Bit14        Shift_JIS              Japanese
Bit15        KS_C_5601-1987         Korean
Bit16        Big5                   Chinese (Taiwan)
Bit17        TIS-620                Thai
Bits18-127   Reserved               (These bits will be allocated by the
                                    Bluetooth SIG. The Printer should set
                                    them to zero if not yet allocated, or
                                    if relevant character repertoire is
                                    not supported.)

Table 43: Character Repertoires Supported
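
As a rough illustration of how a client might read the Character
Repertoires Supported field in Table 43, the Python sketch below
treats the field as a 128-bit integer. It assumes Bit0 is the least
significant bit, which the excerpt does not state, and the function
names are hypothetical. Reserved bits (18-127) are ignored on decode.

```python
# Index in this list = bit number in Table 43.
REPERTOIRES = [
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8",
    "ISO-8859-9", "ISO-8859-10", "ISO-8859-13", "ISO-8859-14",
    "ISO-8859-15", "GB_2312-80", "Shift_JIS", "KS_C_5601-1987",
    "Big5", "TIS-620",
]

FIELD_BITS = 128

def decode_repertoires(field):
    """Return the repertoire names whose bits are set in field.
    Assumption: Bit0 is the least significant bit."""
    if field < 0 or field >> FIELD_BITS:
        raise ValueError("field must fit in 128 bits")
    return [name for bit, name in enumerate(REPERTOIRES)
            if field & (1 << bit)]

def encode_repertoires(names):
    """Build the field value from a collection of repertoire names."""
    field = 0
    for name in names:
        field |= 1 << REPERTOIRES.index(name)
    return field

print(decode_repertoires(0b101))
print(hex(encode_repertoires(["ISO-8859-15", "Shift_JIS"])))
```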

-------------------------------------------------------
[excerpt from CEN CWA 13873:2000]

Annex B. List of languages covered by MES-1 (Informative)

The Multilingual European Subset No 1 is believed to cover at least the languages listed here:

Afrikaans, Albanian, Basque, Breton, Catalan, Croatian, Czech, Danish,
Dutch, English, Esperanto, Estonian, Faroese, Finnish, French, Frisian,
Galician, German, Greenlandic, Hungarian, Icelandic,
Irish Gaelic (new orthography), Italian, Latvian, Lithuanian,
Luxemburgish, Maltese, Manx Gaelic,
Moldavian (new orthography, with restrictions; has Ţ ß â ă though Y Z i l are preferred),
Northern Sámi, Norwegian, Occitan, Polish, Portuguese, Rhaeto-Romanic,
Romanian (with restrictions; has Ţ ß â ă though Y Z i l are preferred),
Scottish Gaelic, Slovak, Slovenian, Lower Sorbian, Upper Sorbian,
Spanish, Swedish, Turkish,
Welsh (with restrictions; only W ^ w^ Y ´ y´ Y ^ y^ Y and ˙)

Annex C. List of languages covered by MES-2 (Informative)

In addition to the languages listed in annex B, the Multilingual European Subset No. 2 is believed to cover at least the languages listed in C.1-C.3.

C.1 Latin script

Arumanian, Asturian, Azerbaijani (new orthography), Cornish, Friulian,
Inari Sámi, Irish Gaelic (old and new orthographies), Istro-Romanian,
Karelian, Kashubian, Ladin, Latin, Lule Sámi, Megleno-Romanian,
Northern Sámi, Romani, Romanian, Skolt Sámi, Southern Sámi, Vepsian,
Votic, Welsh

C.2 Greek script

Greek

C.3 Cyrillic script

Abaza, Abkhaz, Adyge, Altai, Avar, Azerbaijani (old orthography),
Balkar, Bashkir, Belarussian, Bulgarian, Buryat, Chechen, Chukchi,
Chuvash, Crimean Tatar, Dargwa, Dungan, Even, Evenki, Gagauz,
Hill Mari, Ingush, Kabardian, Kalmuk, Kalmyk, Karaim, Karakalpak,
Kazakh, Khakas, Khanty, Komi, Komi-Permyak, Koryak, Kumyk, Kyrgyz,
Lak, Lezgian, Mansi, Meadow Mari, Moksha, Moldavian (old orthography),
Nanai, Nenets, Nogai, Ossetian, Romani, Russian, Rutul, Serbian,
Siberian Yupik, Slavic Macedonian, Tabasaran, Tajik, Tatar, Tati,
Türkmen, Tuva, Udmurt, Uighur, Ukrainian, Uzbek, Yakut


