PMP Mail Archive: PMP> Explanation of Alt 4: how ONE settable char set works for

Tom Hastings (hastings@cp10.es.xerox.com)
Thu, 24 Jul 1997 15:56:08 PDT

This mail note is to explain how a Printer implementation can supply
vendor information and allow the System Administrator to supply
site-dependent information in a single coded character set that is
selected at install time. I am trying to help us all understand how
alternative 4 (the SYNTHESIS proposal) would be used.

The SYNTHESIS proposal allows using UTF-8 (and encourages it as the default
by using DEFVAL), but does NOT require it. Thus the proposal follows the
recommendation of the IAB in RFC 2130:

"This report recommends the use of ISO 10646 as the default Coded
Character Set, and UTF-8 as the default Character Encoding Scheme in
the creation of new protocols or new version of old protocols which
transmit text. These defaults do not deprecate the use of other
character sets when and where they are needed; they are simply
intended to provide guidance and a specification for
interoperability."

The scenario for using the new object prtGeneralStaticCodeSet (as a
read-only object) is that the vendor ships a floppy with the printer
(most printer vendors already do this). The System Administrator runs a
(local) install application that lets him select which representation of
the vendor-supplied information to include, and the install application
puts that information into the flash memory of the printer. That vendor
information includes such objects as: prtGeneralSerialNumber,
prtInputName, prtInputVendorName, prtInputModel, prtInputVersion,
prtInputSerialNumber, etc. Any information that is always in English
could be burned into the ROM.

At the same time, the System Administrator also decides which
site-settable objects to fill in, such as prtGeneralPrinterName,
prtGeneralCurrentOperator, prtGeneralServicePerson, etc., and stores
that information in the flash memory of the printer as well.

All these objects can be implemented as READ-ONLY in the MIB, and all are
stored using the SAME coded character set, which is identified by the
value of the new prtGeneralStaticCodeSet object.

The enum that is stored to indicate the coded character set (and encoding
scheme) is from the IANA registry. See:

ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

Only if there is some sort of security mechanism in place should an
implementor (or the system administrator) consider making these objects
READ-WRITE. In such a case, the application MUST first read the value of
prtGeneralStaticCodeSet and then write the text data in that coded
character set. If the SA has picked a ubiquitous set, that is likely to
be the set that the application is already using.
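
The read-then-write rule above can be sketched in Python as follows. This
is only an illustration under stated assumptions: the SNMP get and set
steps are not shown, static_code_set stands in for the value already read
from prtGeneralStaticCodeSet, and the function name, the table name, and
the mapping from IANA enum values to Python codec names are hypothetical
and not part of the proposal.

```python
# Hypothetical mapping from IANA enum values to Python codec names.
# Only a few entries are shown; a real application would list every
# coded character set it claims to support.
MIBENUM_TO_CODEC = {
    3: "ascii",          # US-ASCII
    4: "latin-1",        # ISO 8859-1
    106: "utf-8",        # UTF-8
    2004: "hp_roman8",   # HP Roman8
    2009: "cp850",       # Code page 850
}


def encode_for_printer(text, static_code_set):
    """Encode text in the printer's coded character set before a write.

    static_code_set is the enum value previously read from
    prtGeneralStaticCodeSet. Raises ValueError if this sketch does not
    know the set or if the text cannot be represented in it.
    """
    try:
        codec = MIBENUM_TO_CODEC[static_code_set]
    except KeyError:
        raise ValueError("unsupported coded character set: %r" % (static_code_set,))
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        raise ValueError("text not representable in %s" % codec) from exc
```

Refusing to write when the text cannot be represented is one possible
design choice; the point is that the application never writes bytes in a
coded character set other than the one the printer advertises.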

For the SYNTHESIS proposal, the SA chooses one char set for all the
information, whether it comes from the vendor or is site-dependent.
A printer implementation could support some or all of the following
character sets, depending on its market:

Market                 Coded Character Set (IANA enum value)

US                     US-ASCII (3)

Western Hemisphere/    ISO 8859-1 (4), HP Roman8 (2004),
Western Europe         Code page 850 (2009)

World                  UTF-8 (106), US-ASCII/JIS X0208 (63),
                       US-ASCII/GB2312 (2025)

Also, the vendor might choose to put only English on his floppy, or could
have different versions for each language on the floppy. But once in the
MIB, there is only one coded character set, as selected by the System
Administrator (hopefully in some user-friendly way, such as the SA
choosing his environment, rather than choosing an actual coded character
set).
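
One way an install application might turn the SA's choice of environment
into a single coded character set is sketched below in Python. The market
names and enum values come from the table in this note; everything else
is an assumption made for illustration, in particular the function and
table names, the idea that the printer reports a set of supported enum
values, and the rule of taking the first supported candidate in the order
listed.

```python
# Market -> candidate coded character sets as (name, IANA enum value),
# copied from the table in this note. Treating the listed order as a
# preference order is an assumption of this sketch.
MARKET_CHARSETS = {
    "US": [("US-ASCII", 3)],
    "Western Hemisphere/Western Europe": [
        ("ISO 8859-1", 4),
        ("HP Roman8", 2004),
        ("Code page 850", 2009),
    ],
    "World": [
        ("UTF-8", 106),
        ("US-ASCII/JIS X0208", 63),
        ("US-ASCII/GB2312", 2025),
    ],
}


def choose_static_code_set(environment, printer_supported):
    """Return the enum value to store in prtGeneralStaticCodeSet.

    environment is the market the SA picked; printer_supported is the
    (hypothetical) set of enum values this printer can store.
    """
    for _name, enum_value in MARKET_CHARSETS[environment]:
        if enum_value in printer_supported:
            return enum_value
    raise ValueError("printer supports no coded character set for %r" % (environment,))
```

For example, choose_static_code_set("World", {3, 106}) returns 106, so
the SA never has to name a coded character set directly.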

The point is that any one of the above character sets covers multiple
languages for a significant region of the world, so it is possible
for a System Administrator to choose ONE of them when the printer is
installed.

Applications that are "localized" are encouraged to be character set
independent. This is a fundamental strategy of the POSIX and OSF
localization efforts. The application passes the data to the platform
to display, and the platform should use the same character set as the
one the SA set for the printer. Similarly, on input from a user, the application
accepts the data in whatever coded character set the platform host
supplies it in and passes the text data on.

If the application actually has to process the information that it receives
from the MIB, such as looking for a match, the application needs to be
designed to have a message or token catalog, which contains the tokens
in each of the coded character sets that the application is designed to
handle. An alternative strategy is for the application to call a
platform code conversion library function (POSIX has such a function),
which converts from one coded character set to another. For example, the
application could convert data from the MIB that it has to process, such
as when matching a value against a list of values, from the coded
character set of the MIB to, say, Unicode (or UTF-8) and then perform the
comparison. Then the application only really has to process a single
coded character set (after doing the conversion).
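
The convert-then-compare strategy can be sketched in Python, whose
built-in codecs play the role of the platform conversion function here.
The function names, the table name, and the mapping from IANA enum values
to Python codec names are assumptions made for illustration; raw stands
for the bytes read from a MIB object and static_code_set for the value
read from prtGeneralStaticCodeSet.

```python
# Hypothetical mapping from IANA enum values to Python codec names;
# only a few of the coded character sets from this note are listed.
MIBENUM_TO_CODEC = {
    3: "ascii",      # US-ASCII
    4: "latin-1",    # ISO 8859-1
    106: "utf-8",    # UTF-8
    2009: "cp850",   # Code page 850
}


def to_unicode(raw, static_code_set):
    """Convert bytes from the MIB into a Unicode string."""
    return raw.decode(MIBENUM_TO_CODEC[static_code_set])


def matches_any(raw, static_code_set, candidates):
    """Return True if the MIB value equals one of the candidate strings.

    The comparison is done entirely in Unicode, so the candidate list
    needs to exist in only one coded character set.
    """
    return to_unicode(raw, static_code_set) in candidates
```

The same byte string can match or fail to match depending on the coded
character set it is read in, which is why the application has to consult
prtGeneralStaticCodeSet before converting.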

Tom