PMP Mail Archive: PMP> PROs and CONs of 5 alternatives to removing the code set

PMP> PROs and CONs of 5 alternatives to removing the code set

Tom Hastings (hastings@cp10.es.xerox.com)
Thu, 24 Jul 1997 15:01:52 PDT

Subj: PROS and CONS for the 5 alternatives to removing the code set
ambiguity in the Printer MIB
From: Tom Hastings
Date: 7/24/97
File: pmpcodes.doc

We are getting close to an agreement on removing the ambiguity
of the coded character set and the encoding scheme for the 25 Printer
MIB objects (see list at end) that are of syntax OCTET STRING and
whose DESCRIPTION does not specify them as being subject to localization
using prtGeneralCurrentLocalization or prtConsoleCurrentLocalization.

We are NOT attempting to solve the much harder problem of localization
that includes language and country for these 25 objects.

Several of the objects SHALL always be in ASCII and English. They have been
proposed to be changed from OCTET STRING to DisplayString (which is NVT
ASCII). They are prtLocalizationLanguage and prtLocalizationCountry and
should be fixed length two octets: DisplayString(SIZE(2)).
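
A minimal sketch of what the proposed DisplayString(SIZE(2)) constraint
implies for a management application, assuming the values are the
two-letter ISO 639 and ISO 3166 codes. The function name is hypothetical
and is not part of the MIB or of any proposal:

```python
def is_two_letter_code(value: bytes) -> bool:
    """Check that value is exactly two octets, each a US-ASCII letter."""
    return len(value) == 2 and all(
        0x41 <= octet <= 0x5A or 0x61 <= octet <= 0x7A for octet in value
    )


print(is_two_letter_code(b"en"))      # two ASCII letters
print(is_two_letter_code(b"US"))      # two ASCII letters
print(is_two_letter_code(b"fr\xe9"))  # three octets, one outside US-ASCII
```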

NOTE: In the following PRO and CON that have more than one statement each,
I have attempted to use the same letter for both sides of the argument.
Hence, the gap in the letters in some PRO and CON.

We have five alternatives proposed for the remaining 23 OCTET STRING objects
by clarifying and/or adding to the description on page 14 that currently
specifies that the OCTET STRING objects are "ASCII":

1. Leave "ASCII" undefined

Leave the document as it is and leave "ASCII" as ambiguous.

PRO: Easiest for us to do.

CON:

a. Doesn't follow the IETF procedures for going from proposed to draft
where the document is to be clarified based on implementation experience.

b. Possible that our Area Director won't forward the document, since it is
ambiguous about coded character set and so it would not become a draft
standard.

c. The Area Director might fix the problem for us (but we might not like
the results, if we don't participate).

2. Define "ASCII" to be 7-bit US-ASCII (ANSI X3.4-1968).

Leave the document as it is, but clarify that "ASCII"
means US-ASCII in 32-126 and that 128 to 255 SHALL NOT be used.
Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
the DESCRIPTION clause specifies otherwise.
Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII.

PRO: Clarifies what the interpretation of these objects shall be with
respect to coded character set and encoding scheme.

CON:
a. Would cause current significant implementations, the HP 5si among
others, to become non-conformant.

b. Therefore, it would not follow the IETF procedure of clarifying the
document based on implementation experience.

c. Would not meet the market objectives of many of the implementations of
the Printer MIB to support other parts of the world where US-ASCII is not
sufficient to represent vendor-supplied and/or system administrator
supplied information.
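
As an illustration of what alternative 2 would require of a value, here is
a minimal sketch of a conformance check. The function name is hypothetical.
It treats code 32 (space) as allowed because it falls in the 32-126 range,
and it ignores the case where a DESCRIPTION clause permits control codes:

```python
def is_strict_us_ascii(value: bytes) -> bool:
    """Return True if every octet is in the US-ASCII range 32-126."""
    return all(32 <= octet <= 126 for octet in value)


print(is_strict_us_ascii(b"Tray 1"))         # only codes in 32-126
print(is_strict_us_ascii(b"Beh\xe4lter 1"))  # contains an octet in 128-255
print(is_strict_us_ascii(b"Tray\t1"))        # contains a control code
```

A value such as the second one is what would make an existing
implementation non-conformant under this alternative.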

3. US-ASCII in 32-126, other unspecified sets in 128 to 255.
(My Tuesday proposal)

Allow any graphic characters in 128 to 255, but 32-126 SHALL be US-ASCII
but provide no way for an application to determine which character set
128 to 255 is representing.
Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
the DESCRIPTION clause specifies otherwise.
Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII.

PRO:

a. Easy change to make.

b. Conforms to current practice.

CON:

c. Does NOT permit an application to determine the coded character set
so that the data coded in 128 to 255 becomes ambiguous to applications.

d. Our Area Director has warned us that the protocol must specify the
code set (either in the spec as a single set or in the protocol if the set
varies).

e. The Area Director is likely not to forward the document to the IESG
so it couldn't become a draft standard (without more work).

f. The Area Director might fix the problem for us (but we might not like
the results, if we don't participate).
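
The ambiguity in CON c can be made concrete with a small sketch. It decodes
the same octets under three of the code sets that come up in this
discussion, using Python's built-in codec names for them. The sample value
and the list of candidate codecs are my own assumptions, chosen only for
illustration:

```python
# Code sets an application might guess at when none is specified.
CANDIDATE_CODECS = ("latin-1", "cp850", "hp_roman8")


def possible_readings(value: bytes) -> dict:
    """Decode the same octets under each candidate coded character set."""
    return {
        codec: value.decode(codec, errors="replace")
        for codec in CANDIDATE_CODECS
    }


# The octet 0xE4 is in 128 to 255, so its meaning depends on the code set.
for codec, text in possible_readings(b"Beh\xe4lter 1").items():
    print(f"{codec:10} {text}")
```

The octets in 32-126 read the same every time; only the octet in
128 to 255 changes, and the application has no way to know which reading
the printer intended.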

4. US-ASCII in 32-126, other specified sets in 128 to 255
(My Wednesday SYNTHESIS proposal)

Allow any graphic characters in 128 to 255, but 32-126 SHALL be US-ASCII
AND provide a new object, prtGeneralStaticCodeSet, to specify what code set
is being used in 128 to 255 using the enum values registered with IANA.
See: ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.
List conforming examples, which include UTF-8 (recommended default),
ISO 8859-1 (Latin1), HPRoman8, Code page 850, US-ASCII, US-ASCII/JIS X0208
(Japanese national two byte set in 128-255), US-ASCII/GB2312 (PRC Chinese
national two-byte set in 128-255).
Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
the DESCRIPTION clause specifies otherwise.
Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII,
plus all the examples.

PRO:
a. Conforms to current practice of implementations of the Printer MIB.

b. Permits UTF-8 to be used as recommended by the IAB for new protocols
in RFC 2130.

c. Recommends UTF-8 as the default as recommended by the IAB in RFC 2130.

d. Permits other coded character sets, as allowed by the IAB for existing
protocols in RFC 2130.

e. Allows Printer MIB implementations to use the coded character set that
the customer's environment uses (US-ASCII, Unicode, ISO 8859-1, JIS X0208,
GB 2312, ...).

f. Allows the vendor supplied and the system administrator supplied data to
be represented in a SINGLE coded character set established at install time.
(See separate e-mail on how this works in current vendor implementations).

CON:
g. Harder for applications that want to process the values returned from
the MIB (as opposed to simply displaying data which is usually handled by
the platform), if the data includes values in 128 to 255 and the
application needs to support more than one possible coded character set
that the system administrator could have specified at install time. For
example, if the application is supporting the Western Hemisphere and
Western Europe, the application might need to support ISO Latin1,
HP Roman8 and Code page 850, depending on the customer's environment.
Similar situation for Asia where the application might have to support
Unicode/UTF-8, JIS X0208, and GB2312.

e. If the coded character set specified for the MIB is different from
that supported by the host platform in which the application is running,
the application will have to perform code conversion in order to display
the coded character set data to the user.

f. Complicates the system administrator install procedures, since the
information on the install floppy needs to be represented in different
coded character sets.
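
A rough sketch of how an application might use the proposed
prtGeneralStaticCodeSet value under alternative 4. The function and table
names are hypothetical. The MIBenum numbers are the ones I recall from the
IANA character-sets registry and should be checked against it, and the
mapping to Python codec names is my own assumption:

```python
# IANA MIBenum -> Python codec name. Values recalled from the IANA
# character-sets registry; verify against the registry before relying on them.
MIBENUM_TO_CODEC = {
    3: "ascii",         # US-ASCII
    4: "latin-1",       # ISO 8859-1 (Latin1)
    106: "utf-8",       # UTF-8
    2004: "hp_roman8",  # HP Roman8
    2009: "cp850",      # Code page 850
    2025: "gb2312",     # US-ASCII in the low half, GB2312 two-byte in 128-255
}


def decode_mib_string(value: bytes, code_set: int) -> str:
    """Decode an OCTET STRING using the code set the printer reports."""
    try:
        codec = MIBENUM_TO_CODEC[code_set]
    except KeyError:
        raise ValueError(f"unsupported code set MIBenum {code_set}") from None
    return value.decode(codec)


print(decode_mib_string(b"Beh\xe4lter", 4))         # octets as ISO 8859-1
print(decode_mib_string(b"Beh\x84lter", 2009))      # octets as Code page 850
print(decode_mib_string(b"Beh\xc3\xa4lter", 106))   # octets as UTF-8
```

The size of the table is the cost described in CON g: every code set the
application wants to handle needs an entry and a working converter.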

5. Only UTF-8 in 32 to 126 and 128 to 255
(David Kellerman's proposal)

Allow only UTF-8 which is US-ASCII in 32-126 and a multi-byte character
encoding scheme in 128 to 255 that represents the characters of
ISO 10646 (Unicode).
Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
the DESCRIPTION clause specifies otherwise.
Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII
and a reference to UTF-8.

PRO:
a. It is the recommendation of the IAB in RFC 2130 for "new [Internet]
protocols or new versions of old protocols" to use UTF-8 as the "default".

c. Only a single coded character set is permitted, so that applications
only have to deal with a single fixed coded character set at design time,
namely UTF-8.

e. ISO 10646 and UTF-8 are winning support in many quarters for actual
implementation. NT has Unicode as its internal code set. IPP specifies
all text attribute values in UTF-8. Novell Netware 4.2 supports Unicode.

CON:
a. The Printer MIB is NOT a new protocol. It is at least two years old
so that the IAB recommendation for new protocols does not apply.
But maybe going from proposed to draft constitutes a new version of an
old protocol?
The same paragraph of RFC 2130 (page 3) goes on to say: "These defaults
do not deprecate the use of other character sets when and where they are
needed; they are simply intended to provide guidance and a specification
for interoperability."
In fact, RFC 2130 does not even mention SNMP as one of the Internet
Protocols. I wonder why? Because SNMP is more likely to be deployed on
a LAN, not the Internet?

b. Current implementations, such as the HP 5si, use a conflicting coded
character set and so would be rendered non-conformant with this
alternative. So we would not be following the IETF procedures of
clarifying the document with implementation experience when progressing
from proposed to draft status.

c. Forces applications to deal with UTF-8, when for some applications it
would be far simpler to just use the coded character set of the environment.

d. Many applications do NOT actually need to process the information from
the MIB; they merely pass it through to the host platform, which takes care
of displaying the information. Unless the platform supports UTF-8 (or
equivalently Unicode, such as NT or Novell 4.1), the application will have
to convert the coded character set data from UTF-8 to some other coded
character set that the host platform can display to the user.

e. Acceptance of ISO 10646 (Unicode) in Asian markets has not been
enthusiastic. Many customers have huge investments in data and
applications that use their national two byte sets (JIS X0208 Japanese and
GB2312 Chinese). So the Asian vendors have not jumped on the ISO 10646
bandwagon. Some have, some have not. I don't have good figures on
the size of each camp. I think it also depends on the application area.
Code conversion between UTF-8 and JIS X0208 and GB2312 is a huge 14-bit
table lookup.
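
The conversion step described in CON d can be sketched as follows, assuming
the MIB value is UTF-8 and the host platform displays some other coded
character set. The function name is hypothetical, and Python's "euc_jp"
codec stands in here for US-ASCII plus JIS X0208 in 128-255:

```python
def utf8_to_platform(value: bytes, platform_codec: str) -> bytes:
    """Convert a UTF-8 MIB value to the coded character set of the platform.

    Characters the platform set cannot represent are replaced with '?'.
    Malformed UTF-8 raises UnicodeDecodeError.
    """
    text = value.decode("utf-8")
    return text.encode(platform_codec, errors="replace")


name = "Beh\u00e4lter".encode("utf-8")
paper = "\u7528\u7d19".encode("utf-8")  # two kanji

print(utf8_to_platform(name, "latin-1"))   # fits in ISO 8859-1
print(utf8_to_platform(paper, "euc_jp"))   # fits in JIS X0208
print(utf8_to_platform(paper, "latin-1"))  # not representable, becomes '?'
```

The last call shows the loss that occurs when the platform's code set
cannot represent what the printer sends.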

NOTE: both proposals 4 and 5 are upgrades from US-ASCII, so that PRO is not
mentioned with either alternative.

I've extracted all objects of type OCTET STRING from the MIB draft 02.
I've put 'localized' in front of the ones whose DESCRIPTIONs say are
localized according to prtGeneralCurrentLocalization and 'console
localization' in front of the ones whose DESCRIPTIONs say are localized by
prtConsoleLocalization:

I've put RW for read-write objects and R for read-only objects.
NOTE that an implementation is NOT required to make any of the
RW objects writeable.

RW prtGeneralCurrentOperator OCTET STRING,
RW prtGeneralServicePerson OCTET STRING,
RW prtGeneralSerialNumber OCTET STRING,

R localized prtCoverDescription OCTET STRING,

The following two objects are proposed to be changed to DisplayString
since the ISO 639 and 3166 standards specify what the values shall be using
Latin letters only:
R prtLocalizationLanguage OCTET STRING,
R prtLocalizationCountry OCTET STRING,

RW prtInputMediaName OCTET STRING,
RW prtInputName OCTET STRING,
R prtInputVendorName OCTET STRING,
R prtInputModel OCTET STRING,
R prtInputVersion OCTET STRING,
R prtInputSerialNumber OCTET STRING,
R localized prtInputDescription OCTET STRING,
RW prtInputMediaType OCTET STRING,
RW prtInputMediaColor OCTET STRING,

RW prtOutputName OCTET STRING,
R prtOutputVendorName OCTET STRING,
R prtOutputModel OCTET STRING,
R prtOutputVersion OCTET STRING,
R prtOutputSerialNumber OCTET STRING,
R localized prtOutputDescription OCTET STRING,

R localized prtMarkerSuppliesDescription OCTET STRING,
R prtMarkerColorantValue OCTET STRING,

R localized prtMediaPathDescription OCTET STRING,

R prtChannelProtocolVersion OCTET STRING,

R prtInterpreterLangLevel OCTET STRING,
R prtInterpreterLangVersion OCTET STRING,
R localized prtInterpreterDescription OCTET STRING,
R prtInterpreterVersion OCTET STRING,

RW console localization prtConsoleDisplayBufferText OCTET STRING
R console localization prtConsoleDescription OCTET STRING

R localized prtAlertDescription OCTET STRING,

This proposal would add to the above list:

RW prtGeneralPrinterName OCTET STRING

David Kellerman's revised prtChannelInformation proposal also adds:

R prtChannelInformation OCTET STRING