PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to

Thu Jul 24 18:01:05 EDT 1997

Tom, I think it takes real guts to say "the SYNTHESIS proposal is simple."

It's a hack that attempts to sanction existing non-conforming
implementations. (Come on, it's a convenient fig leaf to say that
implementors didn't know what ASCII meant -- they knew; it was just
expedient to allow the extended local character sets.)  It imposes a
continuing burden of multiple code sets on applications.  And the
introduction of an open-ended choice of code sets can only complicate
interoperability. 

Your proposal goes on for pages and pages of dense text.  And every time
you attempt to explain it to people, you end up with pages of
explanation.  This should be a clue that it's not simple. 

The SYNTHESIS proposal is tricky.  And I only started to appreciate some
of the implications yesterday as I was putting together the Utf8String
proposal.  To give just one example, all the objects affected by the
existing prtGeneralCurrentLocalization and prtConsoleLocalization are
(with one carefully documented exception) read-only, as is the
localization table (so the agent completely controls the localization). 
The SYNTHESIS proposal, in contrast, affects a mix of read-only and
writable objects, and the character set selection may be writable.  This
breaks new ground for the Printer MIB.  What are the implications for
agent and application, and how many pages of explanation are required to
cover them? 

Now I'm not asking you for an explanation of the issue above.  In fact,
my point is really that an explanation isn't too useful.  I didn't start
to see how the machinery fit together until I started working with it,
started trying to see the implications for an application
implementation.  In effect, getting my hands dirty. 

I think the issue needs this sort of hands-on consideration from others,
particularly applications implementors concerned with interoperability,
in order to build confidence that we understand the implications.  The
floods of "urgent, reply by yesterday" e-mail, by contrast, quickly
start to blur into a muddle. 

::  David Kellerman         Northlake Software      503-228-3383
::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662

------------------------------------------------------------------------
Date: Thu, 24 Jul 1997 11:34:50 PDT
To: David_Kellerman at nls.com
From: Tom Hastings <hastings at cp10.es.xerox.com>
Subject: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to
         allow superset of ASCII
CC: pmp at pwg.org

David,

If this were the fall of 1994 when the PWG finished the Printer MIB
and forwarded it to the IESG (and it got published in March 1995 as
RFC 1759), I would be in favor of your proposal to use UTF-8 only.  
It is unambiguous and doesn't require a new object and covers the world.

The Printer MIB was a "new protocol" at that time.  Two and a half years
later, with lots of vendors' products in the market, the Printer MIB
is no longer a "new protocol".

However, even if the Printer MIB were a "new" protocol, the Asian vendors
are split on using ISO 10646/Unicode/UTF-8 versus their long established 
national sets (JIS X0208:1990 for Japanese and GB2312:1980 for Chinese).  
So if there was real Asian representation in this discussion, it is not 
clear that they would favor UTF-8.  (The SYNTHESIS proposal works with
these Asian national sets, because code positions 32 to 127 are US-ASCII).

Also RFC 2130 does address the case of existing protocols, such as HTTP,
which use ISO 8859 (Latin1).  So our MIB is NOT being required to use
UTF-8, since the Printer MIB is not a NEW protocol.

My SYNTHESIS proposal allows using UTF-8 (and encourages it as the default),
but does NOT require it.  The simple scenario of how the new object
prtGeneralStaticCodeSet is used (as a read-only object) is that the vendor
ships a floppy with his printer.  The System Administrator runs an install
application that allows him to select which representation for the
vendor supplied information to include and the install application puts that
information into the flash memory of the printer.  The System Administrator
also decides at the same time which site-settable objects to populate, such as
prtGeneralPrinterName, prtGeneralCurrentOperator, prtGeneralServicePerson,
etc., and sets that information into the flash memory of the printer as well.
All these objects can be implemented as READ-ONLY in the MIB.

Only if there is some sort of security mechanism in place should an implementor
(or the system administrator) consider making these objects READ-WRITE.

The SYNTHESIS proposal is simple.  The SA chooses one char set for all the 
information, whether it comes from the vendor or is site-dependent.  
Different printer implementations could support some or all of the following 
character sets:

  Market                 Coded Character Set

  US                     US-ASCII

  Western Hemisphere/    ISO 8859-1 (Latin1), HP Roman8, Code page 850
  Western Europe

  World                  UTF-8, US-ASCII/JIS X0208, US-ASCII/GB2312

Also the vendor might choose to only put English on his floppy, or could
have different versions for each language on the floppy.  But once in the
MIB, there is only one coded character set as selected by the System
Administrator (hopefully in some user-friendly way, such as the SA
choosing his environment, rather than choosing an actual coded character
set).

The point is that any one of the above character sets covers multiple 
languages for a significant region of the world, so it is possible
for a System Administrator to choose one of them at install time of the
printer.

Applications that are "localized" are encouraged to be character set
independent.  The application passes the data to the platform to display
and the platform should have the same character set as the SA set for
the printer.

Tom

At 16:46 07/23/97 PDT, David_Kellerman at nls.com wrote:
>If there really is a broad interest in "fixing" the localization
>problem, I would suggest an alternative to Tom's proposal -- switch from
>ASCII to UTF-8 for OCTET STRING objects where representation of
>multilingual text is appropriate. 
>
>Summary of arguments in favor: no new objects, consistent with existing
>conforming implementations (ASCII is subset of UTF-8), doesn't introduce
>the complexity of multiple character sets for affected objects, doesn't
>introduce the complexity of changeable character sets for affected
>objects, seems to be consistent with direction of IETF generally and
>SNMP in particular. 
>
>Problems I see are, briefly: forces implementations to deal with UTF-8,
>and it conflicts with existing implementations that allow non-ASCII
>characters in the strings.  How serious these are depends, in part, on
>whether you believe other MIB work is going to force UTF-8 anyway, and
>how much weight you want to give to existing practice that deviates from
>the existing standard. 
>
>Supporting material: 
> 1. See the note from Randy Presuhn that Chris forwarded to the mailing
>    list.  He suggests this approach, has obviously given the topic a
>    lot of thought, and discusses it in some detail.  He also asserts
>    that the SNMPv3 effort is headed toward use of UTF-8 for all
>    human-readable strings. 
> 2. I read Harald Alvestrand's message differently than Tom.  I think it
>    says to specify the character set (a single one) and recommends
>    UTF-8; not to allow multiple character sets, chosen at the
>    discretion of the agent or application. 
> 3. I also read RFC 2130 (The Character Set Workshop Report) differently
>    than Tom.  It covers a lot of ground, trying to address migration of
>    existing protocols as well as new work.  For new protocols in
>    particular, it says in part: 
>        New protocols do not suffer from the need to be compatible with
>        old 7-bit pipes.  New protocol specifications SHOULD use ISO
>        10646 as the base charset unless there is an overriding need to
>        use a different base character set. 
>
>Here are the details of the changes to the document:
>
> 1. Copy the Utf8String TC from the sysAppl draft:
>
>    Utf8String ::= TEXTUAL-CONVENTION
>         DISPLAY-HINT "255a"
>         STATUS  current
>         DESCRIPTION
>                 "To facilitate internationalization, this TC
>                  represents information taken from the ISO/IEC IS
>                  10646-1 character set, encoded as an octet string
>                  using the UTF-8 character encoding scheme described
>                  in RFC 2044 [**].  For strings in 7-bit US-ASCII,
>                  there is no impact since the UTF-8 representation
>                  is identical to the US-ASCII encoding."
>         SYNTAX  OCTET STRING (SIZE (0..255))
>
>    Stylistically, you might want to introduce a ShortUtf8String with
>    SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
>    below). 
>
> 2. Change the SYNTAX for the following objects from OCTET STRING:
>
>    prtGeneralCurrentOperator   Utf8String (SIZE(0..127))
>    prtGeneralServicePerson     Utf8String (SIZE(0..127))
>    prtGeneralSerialNumber      Utf8String
>    prtGeneralPrinterName       Utf8String
>
>    prtInputMediaName           Utf8String (SIZE(0..63))
>    prtInputName                Utf8String (SIZE(0..63))
>    prtInputVendorName          Utf8String (SIZE(0..63))
>    prtInputModel               Utf8String (SIZE(0..63))
>    prtInputVersion             Utf8String (SIZE(0..63))
>    prtInputSerialNumber        Utf8String (SIZE(0..32))
>
>    prtInputMediaType           Utf8String (SIZE(0..63))
>    prtInputMediaColor          Utf8String (SIZE(0..63))
>
>    prtOutputName               Utf8String (SIZE(0..63))
>    prtOutputVendorName         Utf8String (SIZE(0..63))
>    prtOutputModel              Utf8String (SIZE(0..63))
>    prtOutputVersion            Utf8String (SIZE(0..63))
>    prtOutputSerialNumber       Utf8String (SIZE(0..63))
>
>    prtMarkerColorantValue      Utf8String
>
>    prtChannelProtocolVersion   Utf8String (SIZE(0..63))
>
>    prtInterpreterLangLevel     Utf8String (SIZE(0..31))
>    prtInterpreterLangVersion   Utf8String (SIZE(0..31))
>    prtInterpreterVersion       Utf8String (SIZE(0..31))
>
> 3. Add the reference to RFC 2044 to the bibliography: 
>
>    [**] F. Yergeau, "UTF-8, a transformation format of Unicode
>         and ISO 10646", RFC 2044, October 1996.
>
>That's it. 
>
>::  David Kellerman         Northlake Software      503-228-3383
>::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662
>
>
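
The compatibility points argued in the quoted proposal (ASCII is a subset
of UTF-8, while strings in an extended local character set conflict with
it) can be checked with a small sketch.  The sample printer names and the
helper is_valid_utf8 are made up for illustration and do not come from the
Printer MIB or any implementation.  The last check assumes that a SIZE
constraint on an OCTET STRING counts octets rather than characters.

```python
# Illustrative sketch only: the sample strings and the helper name are
# hypothetical, not taken from the Printer MIB or any real agent.

def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode cleanly as UTF-8."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A pure 7-bit ASCII value has the same octets under ASCII and UTF-8,
# so an existing conforming (ASCII-only) value needs no change.
ascii_name = "LaserPrinter 4".encode("ascii")
same_octets = ascii_name == "LaserPrinter 4".encode("utf-8")

# A value stored in ISO 8859-1 (Latin1) with a non-ASCII character is
# not valid UTF-8, which is the conflict with existing practice.
latin1_name = "Büro-Drucker".encode("latin-1")
utf8_name = "Büro-Drucker".encode("utf-8")
latin1_is_utf8 = is_valid_utf8(latin1_name)
utf8_is_utf8 = is_valid_utf8(utf8_name)

# Assumption: SIZE (0..63) limits octets, so a string of 63 non-ASCII
# characters can exceed the limit once encoded as UTF-8.
octets_for_63_chars = len(("ü" * 63).encode("utf-8"))

print(same_octets, latin1_is_utf8, utf8_is_utf8, octets_for_63_chars)
```

An application could use a check like is_valid_utf8 to detect agents that
still return strings in a local character set.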