PMP> PROs and CONs of 5 alternatives to removing the code set

Tom Hastings hastings at cp10.es.xerox.com
Thu Jul 24 18:01:52 EDT 1997


Subj:  PROS and CONS for the 5 alternatives to removing the code set
ambiguity in the Printer MIB
From: Tom Hastings
Date: 7/24/97
File:  pmpcodes.doc


We are getting close to an agreement on removing the ambiguity
of the coded character set and the encoding scheme for the 25 Printer
MIB objects (see list at end) that have syntax OCTET STRING and
whose DESCRIPTION does not specify that they are subject to localization
using prtGeneralCurrentLocalization or prtConsoleLocalization.


We are NOT attempting to solve the much harder problem of localization 
that includes language and country for these 25 objects.


Several of the objects SHALL always be in ASCII and English.  They have been
proposed to be changed from OCTET STRING to DisplayString (which is NVT
ASCII).  They are prtLocalizationLanguage and prtLocalizationCountry and
should be fixed length two octets: DisplayString(SIZE(2)).
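
As a rough illustration of the DisplayString(SIZE(2)) restriction, here is a
minimal Python sketch of a check that a value is exactly two ASCII letters.
The function name is hypothetical and is not part of the MIB. The sketch
accepts either letter case, which is an assumption.

```python
import string

# All upper- and lower-case ASCII letters as a bytes object.
ASCII_LETTERS = string.ascii_letters.encode("ascii")


def is_two_letter_code(value: bytes) -> bool:
    """Return True if value is exactly two octets, both ASCII letters.

    Hypothetical helper: models a two-octet language or country code
    made of Latin letters only. Letter case is not checked.
    """
    return len(value) == 2 and all(octet in ASCII_LETTERS for octet in value)
```

For example, is_two_letter_code(b"en") and is_two_letter_code(b"US") hold,
while a three-letter value or one containing an octet above 127 is rejected.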


NOTE:  In the following PRO and CON that have more than one statement each,
I have attempted to use the same letter for both sides of the argument.
Hence, the gap in the letters in some PRO and CON.


We have five alternatives proposed for the remaining 23 OCTET STRING objects.
All but the first clarify and/or add to the description on page 14, which
currently specifies that the OCTET STRING objects are "ASCII":






1. Leave "ASCII" undefined


   Leave the document as it is and leave "ASCII" as ambiguous.


   PRO:  Easiest for us to do.


   CON:  


   a. Doesn't follow the IETF procedures for going from proposed to draft
   where the document is to be clarified based on implementation experience.


   b. Possible that our Area Director won't forward the document, since it is
   ambiguous about coded character set and so it would not become a draft 
   standard.


   c. The Area Director might fix the problem for us (but we might not like
   the results, if we don't participate).






2. Define "ASCII" to be 7-bit US-ASCII (ANSI X3.4-1968).


   Leave the document as it is, but clarify that "ASCII"
   means US-ASCII in 32-126 and that 128 to 255 SHALL NOT be used.  
   Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
   the DESCRIPTION clause specifies otherwise.  
   Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII.


   PRO:  Clarifies what the interpretation of these objects shall be with 
   respect to coded character set and encoding scheme.


   CON:  
   a. Would cause current significant implementations to become 
   non-conformant.  The HP 5si among others.


   b. Therefore, it would not follow the IETF procedure of clarifying the
   document based on implementation experience.


   c. Would not meet the market objectives of many of the implementations of
   the Printer MIB to support other parts of the world where US-ASCII is not 
   sufficient to represent vendor-supplied and/or system administrator 
   supplied information.
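
To make the effect of alternative 2 concrete, here is a minimal Python sketch
of a conformance check. The function name is hypothetical. It assumes that
space (32) is permitted, since 32-126 is stated to be US-ASCII, and it ignores
the "unless the DESCRIPTION clause specifies otherwise" exception.

```python
def nonconforming_octets(value: bytes) -> list:
    """Return (position, octet) pairs that alternative 2 would forbid.

    Assumption: only octets 32-126 are allowed. Control codes, 127,
    and everything in 128-255 are reported as non-conforming.
    """
    return [
        (position, octet)
        for position, octet in enumerate(value)
        if not 32 <= octet <= 126
    ]
```

An empty result means the value conforms. A value such as b"Caf\xe9", which an
implementation using an 8-bit set might return today, would be reported at
position 3.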






3. US-ASCII in 32-126, other unspecified sets in 128 to 255.
   (My Tuesday proposal)


   Allow any graphic characters in 128 to 255, but 32-126 SHALL be US-ASCII
   but provide no way for an application to determine which character set
   128 to 255 is representing. 
   Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
   the DESCRIPTION clause specifies otherwise.  
   Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII.


   PRO:  


   a. Easy change to make.


   b. Conforms to current practice.


   CON:


   c. Does NOT permit an application to determine the coded character set
   so that the data coded in 128 to 255 becomes ambiguous to applications.


   d. Our Area Director has warned us that the protocol must specify the
   code set (either in the spec as a single set or in the protocol if the set
   varies).


   e. The Area Director is likely not to forward the document to the IESG
   so it couldn't become a draft standard (without more work).


   f. The Area Director might fix the problem for us (but we might not like
   the results, if we don't participate).
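
The ambiguity described in CON c can be illustrated with a minimal Python
sketch. It decodes the same octets under three of the 8-bit sets mentioned in
this note. The function name and the candidate list are hypothetical, and the
codec names are Python's, not the MIB's.

```python
# Python codec names for ISO 8859-1, Code page 850 and HP Roman8.
CANDIDATE_CODECS = ("latin-1", "cp850", "hp_roman8")


def possible_readings(value: bytes) -> dict:
    """Decode value under each candidate code set.

    With no object saying which set is in use, an application
    cannot tell which of these readings the agent intended.
    """
    return {
        codec: value.decode(codec, errors="replace")
        for codec in CANDIDATE_CODECS
    }
```

A value confined to 32-126 reads the same under every candidate. As soon as an
octet in 128 to 255 appears, for example in b"Caf\xe9", the readings diverge.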






4. US-ASCII in 32-126, other specified sets in 128 to 255
  (My Wednesday SYNTHESIS proposal)


   Allow any graphic characters in 128 to 255, but 32-126 SHALL be US-ASCII
   AND provide a new object, prtGeneralStaticCodeSet, to specify what code set 
   is being used in 128 to 255 using the enum values registered with IANA.  
   See:  ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets.
   List conforming examples, which include UTF-8 (recommended default), 
   ISO 8859-1 (Latin1), HPRoman8, Code page 850, US-ASCII, US-ASCII/JIS X0208
   (Japanese national two byte set in 128-255), US-ASCII/GB2312 (PRC Chinese 
   national two-byte set in 128-255).
   Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
   the DESCRIPTION clause specifies otherwise.  
   Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII,
   plus all the examples.


   PRO:
   a. Conforms to current practice of implementations of the Printer MIB.


   b. Permits UTF-8 to be used as recommended by the IAB for new protocols 
   in RFC 2130.


   c. Recommends UTF-8 as the default as recommended by the IAB in RFC 2130.


   d. Permits other coded character sets, as allowed by the IAB for existing
   protocols in RFC 2130.


   e. Allows Printer MIB implementations to use the coded character set that
   the customer's environment uses (US-ASCII, Unicode, ISO 8859-1, JIS X0208, 
   GB 2312, ...).


   f. Allows the vendor supplied and the system administrator supplied data to
   be represented in a SINGLE coded character set established at install time.
   (See separate e-mail on how this works in current vendor implementations).


   CON:
   g. Harder for applications that want to process the values returned from 
   the MIB (as opposed to simply displaying data which is usually handled by
   the platform), if the data includes values in 128 to 255 and the 
   application needs to support more than one possible coded character set
   that the system administrator could have specified at install time.  For 
   example, if the application is supporting the Western Hemisphere and 
   Western Europe, the application might need to support, ISO Latin1, 
   HP Roman8 and Code page 850, depending on the customer's environment.
   Similar situation for Asia where the application might have to support
   Unicode/UTF-8, JIS X0208, and GB2312.


   e. If the coded character set specified for the MIB is different from
   that supported by the host platform in which the application is running, 
   the application will have to perform code conversion in order to display
   the coded character set data to the user.


   f. Complicates the system administrator install procedures, since the
   information on the install floppy needs to be represented in different
   coded character sets.
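
A minimal Python sketch of how an application might use the proposed
prtGeneralStaticCodeSet value under alternative 4. The table and function
names are hypothetical. The enum values are the MIBenum numbers as I understand
them from the IANA character-sets registry and should be checked against it.
Treating "US-ASCII/JIS X0208" as EUC-JP and "US-ASCII/GB2312" as Python's
gb2312 codec is an assumption.

```python
# Assumed mapping from IANA MIBenum values to Python codec names.
MIBENUM_TO_CODEC = {
    3: "ascii",        # US-ASCII
    4: "latin-1",      # ISO 8859-1
    18: "euc_jp",      # US-ASCII plus JIS X0208 in 128-255 (assumed EUC-JP)
    106: "utf-8",      # UTF-8
    2004: "hp_roman8", # HP Roman8
    2009: "cp850",     # Code page 850
    2025: "gb2312",    # US-ASCII plus GB2312 in 128-255
}


def decode_value(value: bytes, static_code_set: int) -> str:
    """Decode an OCTET STRING value using the agent's declared code set.

    static_code_set stands for the value read from the proposed
    prtGeneralStaticCodeSet object. Unknown enums raise ValueError;
    octets invalid for the declared set raise UnicodeDecodeError.
    """
    try:
        codec = MIBENUM_TO_CODEC[static_code_set]
    except KeyError:
        raise ValueError(f"unsupported code set enum: {static_code_set}") from None
    return value.decode(codec)
```

The size of the table is the cost named in CON g: each code set the
application wants to process needs an entry, whereas an application that only
passes the octets through to the platform needs none.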






5. Only UTF-8 in 32 to 126 and 128 to 255
   (David Kellerman's proposal)


   Allow only UTF-8, which is US-ASCII in 32-126 and a multi-byte character
   encoding scheme in 128 to 255 that represents the characters of 
   ISO 10646 (Unicode).
   Indicate that code positions 0-31 and 127 SHALL NOT be used, unless
   the DESCRIPTION clause specifies otherwise.  
   Also add a proper reference to the ANSI X3.4:1968 that specifies ASCII
   and a reference to UTF-8.


   PRO:
   a. It is the recommendation of the IAB in RFC 2130 for "new [Internet] 
   protocols or new versions of old protocols" to use UTF-8 as the "default".


   c. Only a single coded character set is permitted, so that applications
   only have to deal with a single fixed coded character set at design time,
   namely UTF-8.


   e. ISO 10646 and UTF-8 are winning support in many quarters for actual 
   implementation.  NT has Unicode as its internal code set.  IPP specifies 
   all text attribute values in UTF-8.  Novell Netware 4.2 supports Unicode.




   CON:
   a. The Printer MIB is NOT a new protocol.  It is at least two years old
   so that the IAB recommendation for new protocols does not apply.
   But maybe going from proposed to draft constitutes a new version of an
   old protocol?
   The same paragraph of RFC 2130 (page 3) goes on to say: "These defaults
   do not deprecate the use of other character sets when and where they are
   needed; they are simply intended to provide guidance and a specification
   for interoperability."
   In fact, RFC 2130 does not even mention SNMP as one of the Internet
   Protocols.  I wonder why?  Because SNMP is more likely to be deployed on
   a LAN, not the Internet?


   b. Current implementations, such as the HP 5si, use a conflicting coded 
   character set and so would be rendered non-conformant with this 
   alternative.  So we would not be following the IETF procedures of 
   clarifying the document with implementation experience when progressing 
   from proposed to draft status.


   c. Forces applications to deal with UTF-8, when it would be far simpler
   for some applications to just use the coded character set of the
   environment.


   d. Many applications do NOT actually need to process the information from 
   the MIB; they merely pass it through to the host platform, which takes care 
   of displaying the information.  Unless the platform supports UTF-8 (or 
   equivalently Unicode, such as NT or Novell 4.1), the application will have
   to convert the coded character set data from UTF-8 to some other coded 
   character set that the host platform can display to the user.


   e. Acceptance of ISO 10646 (Unicode) in Asian markets has not been 
   enthusiastic.  Many customers have huge investments in data and 
   applications that use their national two byte sets (JIS X0208 Japanese and 
   GB2312 Chinese).  So the Asian vendors have not jumped on the ISO 10646
   bandwagon.  Some have, some have not.  I don't have good figures on
   the size of each camp.  I think it also depends on the application area.
   Code conversion between UTF-8 and JIS X0208 and GB2312 requires a huge
   14-bit table lookup.
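
The code conversion described in CON c through e can be sketched in Python as
follows. This is only an illustration under assumptions: the function name is
hypothetical, the platform code set is given as a Python codec name, and
replacing unrepresentable characters with "?" is one policy among several.

```python
def utf8_to_platform(value: bytes, platform_codec: str) -> bytes:
    """Convert a UTF-8 value from the MIB to the platform's code set.

    Raises UnicodeDecodeError if value is not valid UTF-8, which would
    be the case for agents that still return some other 8-bit set.
    Characters the platform code set cannot represent become "?".
    """
    text = value.decode("utf-8")
    return text.encode(platform_codec, errors="replace")
```

On a platform that already uses UTF-8 the conversion is the identity. On a
Latin1 platform a Japanese description degrades to question marks, which is
the display problem an application would have to handle under this
alternative.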




NOTE: both proposals 4 and 5 are upgrades from US-ASCII, so that PRO is not
mentioned with either alternative.




I've extracted all objects of type OCTET STRING from the MIB draft 02.
I've put 'localized' in front of the ones whose DESCRIPTIONs say are
localized according to prtGeneralCurrentLocalization and 'console
localization' in front of the ones whose DESCRIPTIONs say are localized by
prtConsoleLocalization:


I've put RW for read-write objects and R for read-only objects.
NOTE that an implementation is NOT required to make any of the
RW objects writeable. 


RW                prtGeneralCurrentOperator       OCTET STRING,
RW                prtGeneralServicePerson         OCTET STRING,
RW                prtGeneralSerialNumber          OCTET STRING, 


R   localized     prtCoverDescription      OCTET STRING,


The following two objects are proposed to be changed to DisplayString
since the ISO 639 and 3166 standards specify what the values shall be using
Latin letters only:
R                 prtLocalizationLanguage       OCTET STRING,
R                 prtLocalizationCountry        OCTET STRING,


RW                prtInputMediaName                 OCTET STRING,
RW                prtInputName                      OCTET STRING,
R                 prtInputVendorName                OCTET STRING,
R                 prtInputModel                     OCTET STRING,
R                 prtInputVersion                   OCTET STRING,
R                 prtInputSerialNumber              OCTET STRING, 
R  localized      prtInputDescription               OCTET STRING,
RW                prtInputMediaType                 OCTET STRING,
RW                prtInputMediaColor                OCTET STRING,


RW                prtOutputName                     OCTET STRING,
R                 prtOutputVendorName               OCTET STRING,
R                 prtOutputModel                    OCTET STRING,
R                 prtOutputVersion                  OCTET STRING,
R                 prtOutputSerialNumber             OCTET STRING, 
R  localized      prtOutputDescription              OCTET STRING, 


R  localized      prtMarkerSuppliesDescription    OCTET STRING,
R                 prtMarkerColorantValue          OCTET STRING, 


R  localized      prtMediaPathDescription         OCTET STRING,


R                 prtChannelProtocolVersion           OCTET STRING,


R                 prtInterpreterLangLevel             OCTET STRING,
R                 prtInterpreterLangVersion           OCTET STRING, 
R   localized     prtInterpreterDescription           OCTET STRING,
R                 prtInterpreterVersion               OCTET STRING, 


RW  console localization   prtConsoleDisplayBufferText     OCTET STRING 
R   console localization   prtConsoleDescription           OCTET STRING 


R   localized              prtAlertDescription         OCTET STRING,




This proposal would add to the above list:


RW                         prtGeneralPrinterName           OCTET STRING


David Kellerman's revised prtChannelInformation also adds:


R                 prtChannelInformation               OCTET STRING


