PWG-ANNOUNCE> Character repertoires in printers

McDonald, Ira imcdonald at sharplabs.com
Mon Oct 21 18:10:20 EDT 2002


Hi,

Inline below...

Cheers,
- Ira McDonald
  High North Inc

PS - Note that the European standard CEN CWA 13873:2000 standardized
Multilingual European Subsets of ISO 10646/Unicode called MES-1, 
MES-2, MES-3A, and MES-3B (see very end of this note below).

-----Original Message-----
From: Michael Sweet [mailto:mike at easysw.com]
Sent: Monday, October 21, 2002 2:19 PM
To: ElliottBradshaw at oaktech.com
Cc: pwg-announce at pwg.org
Subject: Re: PWG-ANNOUNCE> Character repertoires in printers


ElliottBradshaw at oaktech.com wrote:
 > ...
 > 1.  Is this a problem worth solving?  (vs. vendor-specific solutions)

Yes.

 > 2.  Should it be treated as part of XHTML-Print, UPnP, or some other
 > group? (as opposed to a separate working group)

Probably as part of an existing group.

 > 3.  Who is interested in participating, as author or reviewer?

I'd be interested, at least in the reviewer/back-seat-driver role. :)

....

Some immediate thoughts based on my own experiences, and without
looking at the Bluetooth docos.

<ira> Bluetooth, rather cleverly, enumerated character repertoires
of Unicode (subsets) by REFERENCE to existing legacy character sets
(see the excerpt from BPP below).
</ira>


1. Aside from the Euro, all printers seem to provide the basic
    Latin characters needed for English and most Western European
    languages.  If you do a language/country-based scheme, it should
    address the presence/absence of the Euro symbol as a separate
    entity. [this doesn't quite sound right to me, but in the context
    of ISO Latin 1 the Euro is a major pain WRT support in printers;
    do with it what you will...]

<ira> Good point about the Euro - ambiguous when you say ISO-8859-1
(Latin-1), because you might well mean that the Euro IS defined at
the canonical location of 0xA4 assigned in ISO-8859-16, rather
than the original (non-specific) CURRENCY SIGN at 0xA4 in Latin-1.
</ira>
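
The 0xA4 ambiguity can be seen by decoding that one byte under different
ISO 8859 parts. This is only a small sketch using Python's standard
codecs; the helper name describe_0xa4 is made up for illustration.

```python
import unicodedata

def describe_0xa4(codec):
    """Decode the single byte 0xA4 with the given codec and name the result."""
    ch = bytes([0xA4]).decode(codec)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin-1 yields the generic currency sign; Latin-9 (ISO-8859-15) and
# Latin-10 (ISO-8859-16) both place the Euro at the same byte value.
for codec in ("iso8859_1", "iso8859_15", "iso8859_16"):
    print(codec, describe_0xa4(codec))
```

A printer that only says "Latin-1" therefore tells the client nothing
about whether a Euro glyph is actually available.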


2. Providing a list of Unicode ranges may be the simplest way
    of reporting what the device supports, and the client can use
    this to choose embedding/exclusion/error display when the
    user prints something.  This needs to be a per-font resource.

<ira> Since Unicode ranges are assigned well-known names according
to the language/script repertoire, I'd rather we used those names
(and standardized that they reference exactly the ranges assigned
in perpetuity in Unicode 3.2 (current) and above).  For example,
the range U+1000 to U+109F is Myanmar (the script used to write
Burmese).
</ira>
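
As a rough illustration of reporting by name rather than by raw range,
a client could keep a table keyed by Unicode block name. The table
below is a tiny hand-picked subset (not the full Unicode block list),
and the names NAMED_RANGES, block_name and blocks_needed are
hypothetical.

```python
# A few Unicode block names with their fixed code point ranges.
NAMED_RANGES = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Thai": (0x0E00, 0x0E7F),
    "Myanmar": (0x1000, 0x109F),
}

def block_name(ch):
    """Return the block name covering ch, or None if it is not in the table."""
    cp = ord(ch)
    for name, (lo, hi) in NAMED_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None

def blocks_needed(text):
    """Return the sorted set of known block names that a job's text touches."""
    return sorted({name for name in map(block_name, text) if name is not None})
```

A client could compare blocks_needed(job_text) against the names a
printer advertises before deciding whether to embed fonts.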


3. In addition to or instead of #2, you could define a CSS
    attribute that determines what the device does for characters
    it does not have: exclude (blanks or squares), substitute (from
    another font with the required characters), or error out.
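
The three behaviors in item 3, combined with the per-font range lists
from item 2, might be modeled roughly as follows. No such CSS attribute
is defined anywhere that I know of; the policy names, the plan_text
function and the range-list format are all assumptions for the sake of
the sketch.

```python
from enum import Enum

class MissingGlyphPolicy(Enum):
    EXCLUDE = "exclude"        # print a blank or square
    SUBSTITUTE = "substitute"  # try another font first
    ERROR = "error"            # refuse the job

def covers(ranges, ch):
    """True if ch falls inside any (low, high) code point range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def plan_text(text, font_ranges, fallback_ranges, policy):
    """Return (char, source) pairs; source is 'font', 'fallback' or 'blank'."""
    plan = []
    for ch in text:
        if covers(font_ranges, ch):
            plan.append((ch, "font"))
        elif policy is MissingGlyphPolicy.SUBSTITUTE and covers(fallback_ranges, ch):
            plan.append((ch, "fallback"))
        elif policy is MissingGlyphPolicy.ERROR:
            raise ValueError(f"no glyph for U+{ord(ch):04X}")
        else:
            # EXCLUDE, or SUBSTITUTE with no fallback coverage either.
            plan.append((ch, "blank"))
    return plan
```

For example, with a font covering only printable ASCII and a fallback
covering U+20A0 to U+20CF, a Euro sign is drawn from the fallback under
SUBSTITUTE, blanked under EXCLUDE, and rejected under ERROR.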

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products                  mike at easysw.com
Printing Software for UNIX                       http://www.easysw.com


----------------------------------------------------------------------
[excerpt from draft Bluetooth Basic Printing Profile v0.95a (5 Oct 2001)]

BLUETOOTH SPECIFICATION

Basic Printing Profile Page 118 of 131

The most current version of the bit assignments for the Character
Repertoires Supported field may be found in the Host Operating
Environment Identifiers section of the Bluetooth Assigned Numbers
Document [16]. Unassigned bits will be assigned by the maintainer of
[16] according to procedures described by the Bluetooth SIG. The
general guideline is that each bit should indicate a subset of the
4-byte Unicode space of use to providers of Senders, Printers, fonts,
and Internet content, with appropriate support from national standards
groups. It is strongly recommended that new character repertoires also
be filed with IANA (see [36]).

The capability to print 7-bit US-ASCII characters is not listed as part
of the following table; however, that capability is mandatory for all
Printers supporting any part of this Profile.

Bit Number Character Repertoire Description

Bit0       ISO-8859-1 Latin alphabet No. 1
Bit1       ISO-8859-2 Latin alphabet No. 2
Bit2       ISO-8859-3 Latin alphabet No. 3
Bit3       ISO-8859-4 Latin alphabet No. 4
Bit4       ISO-8859-5 Latin/Cyrillic alphabet
Bit5       ISO-8859-6 Latin/Arabic alphabet
Bit6       ISO-8859-7 Latin/Greek alphabet
Bit7       ISO-8859-8 Latin/Hebrew alphabet
Bit8       ISO-8859-9 Latin alphabet No. 5
Bit9       ISO-8859-10 Latin alphabet No. 6
Bit10      ISO-8859-13 Latin alphabet No. 7
Bit11      ISO-8859-14 Latin alphabet No. 8
Bit12      ISO-8859-15 Latin alphabet No. 9
Bit13      GB_2312-80 Chinese (People's Republic of China)
Bit14      Shift_JIS Japanese
Bit15      KS_C_5601-1987 Korean
Bit16      Big5 Chinese (Taiwan)
Bit17      TIS-620 Thai
Bits18-127 Reserved (These bits will be allocated by the
           Bluetooth SIG. The Printer should set
           them to zero if not yet allocated, or if
           relevant character repertoire is not
           supported.)

Table 43: Character Repertoires Supported
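
A client might decode such a field along these lines. This is a sketch
only: the excerpt does not say how the 128-bit field is serialized, so
the mask is assumed here to be an integer already, with Bit0 as the
least significant bit. The function names are hypothetical.

```python
# Index in this list = bit number from Table 43.
REPERTOIRE_BITS = [
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8",
    "ISO-8859-9", "ISO-8859-10", "ISO-8859-13", "ISO-8859-14",
    "ISO-8859-15", "GB_2312-80", "Shift_JIS", "KS_C_5601-1987",
    "Big5", "TIS-620",
]

def decode_repertoires(mask):
    """List the repertoire names whose bits are set; reserved bits are ignored."""
    return [name for bit, name in enumerate(REPERTOIRE_BITS) if (mask >> bit) & 1]

def encode_repertoires(names):
    """Build the integer mask for the given repertoire names."""
    mask = 0
    for name in names:
        mask |= 1 << REPERTOIRE_BITS.index(name)
    return mask
```

US-ASCII has no bit of its own because the profile makes it mandatory,
so a mask of zero still means "ASCII only", not "nothing".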

-------------------------------------------------------
[excerpt from CEN CWA 13873:2000]

Annex B. List of languages covered by MES-1 (Informative)

The Multilingual European Subset No 1 is believed to cover at least 
the languages listed here:

Afrikaans
Albanian
Basque
Breton
Catalan
Croatian
Czech
Danish
Dutch
English
Esperanto
Estonian
Faroese
Finnish
French
Frisian
Galician
German
Greenlandic
Hungarian
Icelandic
Irish Gaelic (new orthography)
Italian
Latvian
Lithuanian
Luxemburgish
Maltese
Manx Gaelic
Moldavian (new orthography, with restrictions; has Ş ş Ţ ţ
though Ș ș Ț ț are preferred)
Northern Sámi
Norwegian
Occitan
Polish
Portuguese
Rhaeto-Romanic
Romanian (with restrictions; has Ş ş Ţ ţ though Ș ș Ț ț are
preferred)
Scottish Gaelic
Slovak
Slovenian
Lower Sorbian
Upper Sorbian
Spanish
Swedish
Turkish
Welsh (with restrictions; only Ŵ ŵ Ý ý Ŷ ŷ Ÿ and ÿ)

Annex C. List of languages covered by MES-2 (Informative)

In addition to the languages listed in annex B, the Multilingual 
European Subset No. 2 is believed to cover at least the languages 
listed in C.1-C.3.

C.1 Latin script
Arumanian
Asturian
Azerbaijani (new orthography)
Cornish
Friulian
Inari Sámi
Irish Gaelic (old and new orthographies)
Istro-Romanian
Karelian
Kashubian
Ladin
Latin
Lule Sámi
Megleno-Romanian
Northern Sámi
Romani
Romanian
Skolt Sámi
Southern Sámi
Vepsian
Votic
Welsh

C.2 Greek script
Greek

C.3 Cyrillic script
Abaza
Abkhaz
Adyge
Altai
Avar
Azerbaijani (old orthography)
Balkar
Bashkir
Belarussian
Bulgarian
Buryat
Chechen
Chukchi
Chuvash
Crimean Tatar
Dargwa
Dungan
Even
Evenki
Gagauz
Hill Mari
Ingush
Kabardian
Kalmuk
Kalmyk
Karaim
Karakalpak
Kazakh
Khakas
Khanty
Komi
Komi-Permyak
Koryak
Kumyk
Kyrgyz
Lak
Lezgian
Mansi
Meadow Mari
Moksha
Moldavian (old orthography)
Nanai
Nenets
Nogai
Ossetian
Romani
Russian
Rutul
Serbian
Siberian Yupik
Slavic Macedonian
Tabasaran
Tajik
Tatar
Tati
Türkmen
Tuva
Udmurt
Uighur
Ukrainian
Uzbek
Yakut



