PMP> URGENT: SYNTHESIS proposal on definition of OCTET

Thu Jul 24 19:30:41 EDT 1997

At 15:01 07/24/97 PDT, David_Kellerman at nls.com wrote:
>Tom, I think it takes real guts to say "the SYNTHESIS proposal is simple."

You are right.  I overstated this.  (It's simpler than what we
had been considering up to Nashua.)  You are correct that the
SYNTHESIS proposal is more complicated than the UTF-8-only proposal
for applications that want to support the world.  On the other hand, the
SYNTHESIS proposal is simpler for simpler applications that have a more
limited scope, such as working only with Latin1 or only with JIS X0208.

>
>It's a hack that attempts to sanction existing non-conforming
>implementations. (Come on, it's a convenient fig leaf to say that
>implementors didn't know what ASCII meant -- they knew; it was just
>expedient to allow the extended local character sets.)  It imposes a
>continuing burden of multiple code sets on applications.  And the
>introduction of an open-ended choice of code sets can only complicate
>interoperability. 

True.  But maybe we should let the marketplace decide, rather
than legislating that current practice is non-conformant.

>
>Your proposal goes on for pages and pages of dense text.  And every time
>you attempt to explain it to people, you end up with pages of
>explanation.  This should be a clue that it's not simple. 

I agree it's not as simple as just forcing UTF-8.

However, it is simpler for an application that wants to do simple
things like work with the coded character set of its environment.  The
application isn't forced to do UTF-8.

Forcing an application to do UTF-8 when it's running on a Windows platform
that uses ISO 8859, with a printer that may also be set to ISO 8859, is not simple.
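As a rough illustration of the extra step such an application would take, here is a minimal Python sketch, assuming the platform hands the application ISO 8859-1 (Latin1) octets. The helper names `latin1_to_utf8` and `fits_size` are hypothetical, not part of any MIB or API. Every non-ASCII Latin1 character grows from one octet to two, so a string that fits a SIZE(0..63) object in Latin1 may no longer fit after conversion.

```python
def latin1_to_utf8(octets: bytes) -> bytes:
    """Transcode an ISO 8859-1 octet string to UTF-8 (hypothetical helper)."""
    # Every byte value 0x00-0xFF is a defined Latin1 character, so this decode
    # cannot fail; the UTF-8 encode then expands 0x80-0xFF to two octets each.
    return octets.decode("latin-1").encode("utf-8")


def fits_size(octets: bytes, max_octets: int) -> bool:
    """Check an octet string against a SIZE(0..max_octets) constraint."""
    return len(octets) <= max_octets


if __name__ == "__main__":
    name = b"Caf\xe9"                    # "Café" as stored by a Latin1 platform
    converted = latin1_to_utf8(name)
    print(converted)                     # b'Caf\xc3\xa9'
    print(len(name), len(converted))     # 4 5
    long_name = b"\xe9" * 63             # 63 octets in Latin1
    print(fits_size(long_name, 63), fits_size(latin1_to_utf8(long_name), 63))  # True False
```

Pure US-ASCII strings pass through unchanged, so the cost falls only on data that actually uses the upper half of Latin1.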

By the way, ISO 10646 is 120 pages of non-Kanji characters.  The Kanji
adds another 600 pages.

>
>The SYNTHESIS proposal is tricky.  And I only started to appreciate some
>of the implications yesterday as I was putting together the Utf8String
>proposal.  To give just one example, all the objects affected by the
>existing prtGeneralCurrentLocalization and prtConsoleLocalization are
>(with one carefully documented exception) read-only, as is the
>localization table (so the agent completely controls the localization). 
>The SYNTHESIS proposal, in contrast, affects a mix of read-only and
>writable objects, and the character set selection may be writable.  This
>breaks new ground for the Printer MIB.  What are the implications for
>agent and application, and how many pages of explanation are required to
>cover them? 

I agree that the read-write objects are tricky.  However, implementors
do NOT need to implement the read-write objects as read-write.  In fact,
we might want to warn them not to, unless there is some sort of security
mechanism in place to prevent unauthorized users from writing
the writable objects.
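To make the read-only option concrete, here is a minimal Python sketch of how an agent might handle a SET on an object that the MIB declares read-write (such as prtGeneralPrinterName) but that the implementor treats as read-only unless writes are explicitly enabled and an authorization check passes. The class `WritableTextObject` and its parameters are hypothetical; the error-status values are the SNMPv2 ones from RFC 1905.

```python
from typing import Callable, Optional

# SNMPv2 error-status values (RFC 1905).
NO_ERROR = 0
NO_ACCESS = 6
NOT_WRITABLE = 17


class WritableTextObject:
    """Hypothetical agent-side handler for a read-write display string."""

    def __init__(self, value: bytes, allow_writes: bool = False,
                 is_authorized: Optional[Callable[[str], bool]] = None) -> None:
        self.value = value
        # Writes are disabled by default: the object behaves as read-only.
        self.allow_writes = allow_writes
        # With no security mechanism supplied, nobody is authorized to write.
        self.is_authorized = is_authorized or (lambda requester: False)

    def get(self) -> bytes:
        return self.value

    def set(self, requester: str, new_value: bytes) -> int:
        """Return an SNMPv2 error-status; only NO_ERROR changes the value."""
        if not self.allow_writes:
            return NOT_WRITABLE
        if not self.is_authorized(requester):
            return NO_ACCESS
        self.value = new_value
        return NO_ERROR
```

With the defaults, `WritableTextObject(b"Lab printer").set("anyone", b"x")` returns 17 and leaves the value unchanged; writes succeed only when an implementor explicitly enables them and supplies a real authorization check.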

>
>Now I'm not asking you for an explanation of the issue above.  In fact,
>my point is really that an explanation isn't too useful.  I didn't start
>to see how the machinery fit together until I started working with it,
>started trying to see the implications for an application
>implementation.  In effect, getting my hands dirty. 

Great.  More of us need to do that.

>
>I think the issue needs this sort of hands-on consideration from others,
>particularly applications implementors concerned with interoperability,
>in order to build confidence that we understand the implications.  The
>floods of "urgent, reply by yesterday" e-mail, by contrast, quickly
>start to blur into a muddle. 

I agree we need to take the time with application implementors.

>
>::  David Kellerman         Northlake Software      503-228-3383
>::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662
>
>------------------------------------------------------------------------
>Date: Thu, 24 Jul 1997 11:34:50 PDT
>To: David_Kellerman at nls.com
>From: Tom Hastings <hastings at cp10.es.xerox.com>
>Subject: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to
>         allow superset of ASCII
>CC: pmp at pwg.org
>
>David,
>
>If this were the fall of 1994, when the PWG finished the Printer MIB
>and forwarded it to the IESG (and it got published in March 1995 as
>RFC 1759), I would be in favor of your proposal to use UTF-8 only.  
>It is unambiguous and doesn't require a new object and covers the world.
>
>The Printer MIB was a "new protocol" at that time.  Two and a half years
>later, with lots of vendors' products in the market, the Printer MIB
>is no longer a "new protocol".
>
>However, even if the Printer MIB were a "new" protocol, the Asian vendors
>are split on using ISO 10646/Unicode/UTF-8 versus their long-established
>national sets (JIS X0208:1990 for Japanese and GB2312:1980 for Chinese).
>So if there were real Asian representation in this discussion, it is not
>clear that they would favor UTF-8.  (The SYNTHESIS proposal works with
>these Asian national sets, because code positions 32 to 127 are US-ASCII.)
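One way to check the claim about code positions 32 to 127 is a small Python sketch, assuming that "US-ASCII/JIS X0208" and "US-ASCII/GB2312" mean the EUC-style encodings (Python's `euc_jp` and `gb2312` codecs), in which every double-byte character uses only octets of 0x80 and above. The helper `ascii_preserved` is hypothetical. Shift_JIS is included for contrast: its second octet can fall in the ASCII range (for example, 表 encodes as 0x95 0x5C), so the property depends on which encoding of JIS X0208 is meant.

```python
def ascii_preserved(text: str, codec: str) -> bool:
    """True if the octets below 0x80 in the encoded text are exactly its ASCII characters."""
    encoded = text.encode(codec)
    low_octets = bytes(b for b in encoded if b < 0x80)
    ascii_chars = "".join(c for c in text if ord(c) < 0x80).encode("ascii")
    return low_octets == ascii_chars


print(ascii_preserved("Printer 東京", "euc_jp"))   # True
print(ascii_preserved("Printer 北京", "gb2312"))   # True
print(ascii_preserved("Printer 表", "shift_jis"))  # False: 表 is 0x95 0x5C
```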
>
>Also, RFC 2130 does address the case of existing protocols, such as HTTP,
>which use ISO 8859 (Latin1).  So our MIB is NOT required to use
>UTF-8, since the Printer MIB is not a NEW protocol.
>
>My SYNTHESIS proposal allows using UTF-8 (and encourages it as the default),
>but does NOT require it.  In the simple scenario for how the new object
>prtGeneralStaticCodeSet is used (as a read-only object), the vendor
>ships a floppy with the printer.  The System Administrator runs an install
>application that lets him select which representation of the
>vendor-supplied information to include, and the install application puts that
>information into the flash memory of the printer.  At the same time, the
>System Administrator also decides the values of the site-settable objects,
>such as prtGeneralPrinterName, prtGeneralCurrentOperator,
>prtGeneralServicePerson, etc., and stores that information in the printer's
>flash memory as well.  All these objects can be implemented as READ-ONLY
>in the MIB.
>
>Only if there is some sort of security mechanism in place should an implementor
>(or the System Administrator) consider making these objects READ-WRITE.
>
>
>The SYNTHESIS proposal is simple.  The SA chooses one char set for all the 
>information, whether it comes from the vendor or is site-dependent.  
>Different printer implementations could support some or all of the following 
>character sets:
>
>  Market                 Coded Character Set
> 
>  US                     US-ASCII
>
>  Western Hemisphere/    ISO 8859-1 (Latin1), HP Roman8, Code page 850
>  Western Europe
>
>  World                  UTF-8, US-ASCII/JIS X0208, US-ASCII/GB2312
>
>Also, the vendor might choose to put only English on his floppy, or could
>have different versions for each language on the floppy.  But once in the
>MIB, there is only one coded character set, as selected by the System
>Administrator (hopefully in some user-friendly way, such as the SA
>choosing his environment, rather than choosing an actual coded character
>set).
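The install-time step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the mapping from the table's coded character sets to Python codec names is an assumption (in particular, EUC-JP and EUC-CN for the two Asian sets), and `CODE_SETS` and `build_flash_image` are hypothetical names, not part of the proposal. It shows every string, vendor-supplied or site-dependent, being encoded with the one code set the SA selected, with a string that the set cannot represent rejected at install time rather than stored garbled.

```python
from typing import Dict

# Assumed mapping from the coded character sets in the table to Python codecs.
CODE_SETS: Dict[str, str] = {
    "US-ASCII": "ascii",
    "ISO 8859-1": "latin-1",
    "HP Roman8": "hp_roman8",
    "Code page 850": "cp850",
    "UTF-8": "utf-8",
    "US-ASCII/JIS X0208": "euc_jp",  # assumption: EUC-JP form
    "US-ASCII/GB2312": "gb2312",     # Python's gb2312 codec is the EUC-CN form
}


def build_flash_image(code_set: str, strings: Dict[str, str]) -> Dict[str, bytes]:
    """Encode every MIB string with the single code set chosen by the SA (hypothetical)."""
    codec = CODE_SETS[code_set]
    image: Dict[str, bytes] = {}
    for obj_name, text in strings.items():
        try:
            image[obj_name] = text.encode(codec)
        except UnicodeEncodeError as exc:
            raise ValueError(f"{obj_name}: {text!r} cannot be represented in {code_set}") from exc
    return image


site_strings = {
    "prtGeneralPrinterName": "Café printer",
    "prtGeneralServicePerson": "José",
}
print(build_flash_image("ISO 8859-1", site_strings)["prtGeneralPrinterName"])     # b'Caf\xe9 printer'
print(build_flash_image("Code page 850", site_strings)["prtGeneralPrinterName"])  # b'Caf\x82 printer'
```

Choosing "US-ASCII" for the same strings would raise a ValueError, since é is outside US-ASCII.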
>
>The point is that each of the above character sets covers multiple
>languages for a significant region of the world, so it is possible
>for a System Administrator to choose one of them at install time of the
>printer.
>
>Applications that are "localized" are encouraged to be character-set
>independent.  The application passes the data to the platform to display,
>and the platform should use the same character set that the SA set for
>the printer.
>
>
>Tom
>
>
>
>At 16:46 07/23/97 PDT, David_Kellerman at nls.com wrote:
>>If there really is a broad interest in "fixing" the localization
>>problem, I would suggest an alternative to Tom's proposal -- switch from
>>ASCII to UTF-8 for OCTET STRING objects where representation of
>>multilingual text is appropriate. 
>>
>>Summary of arguments in favor: no new objects, consistent with existing
>>conforming implementations (ASCII is subset of UTF-8), doesn't introduce
>>the complexity of multiple character sets for affected objects, doesn't
>>introduce the complexity of changeable character sets for affected
>>objects, seems to be consistent with direction of IETF generally and
>>SNMP in particular. 
>>
>>Problems I see are, briefly: forces implementations to deal with UTF-8,
>>and it conflicts with existing implementations that allow non-ASCII
>>characters in the strings.  How serious these are depends, in part, on
>>whether you believe other MIB work is going to force UTF-8 anyway, and
>>how much weight you want to give to existing practice that deviates from
>>the existing standard. 
>>
>>Supporting material: 
>> 1. See the note from Randy Presuhn that Chris forwarded to the mailing
>>    list.  He suggests this approach, has obviously given the topic a
>>    lot of thought, and discusses it in some detail.  He also asserts
>>    that the SNMPv3 effort is headed toward use of UTF-8 for all
>>    human-readable strings. 
>> 2. I read Harald Alvestrand's message differently than Tom.  I think it
>>    says to specify the character set (a single one) and recommends
>>    UTF-8; not to allow multiple character sets, chosen at the
>>    discretion of the agent or application. 
>> 3. I also read RFC 2130 (The Character Set Workshop Report) differently
>>    than Tom.  It covers a lot of ground, trying to address migration of
>>    existing protocols as well as new work.  For new protocols in
>>    particular, it says in part: 
>>        New protocols do not suffer from the need to be compatible with
>>        old 7-bit pipes.  New protocol specifications SHOULD use ISO
>>        10646 as the base charset unless there is an overriding need to
>>        use a different base character set. 
>>
>>Here are the details of the changes to the document:
>>
>> 1. Copy the Utf8String TC from the sysAppl draft:
>>
>>    Utf8String ::= TEXTUAL-CONVENTION
>>         DISPLAY-HINT "255a"
>>         STATUS  current
>>         DESCRIPTION
>>                 "To facilitate internationalization, this TC
>>                  represents information taken from the ISO/IEC IS
>>                  10646-1 character set, encoded as an octet string
>>                  using the UTF-8 character encoding scheme described
>>                  in RFC 2044 [**].  For strings in 7-bit US-ASCII,
>>                  there is no impact since the UTF-8 representation
>>                  is identical to the US-ASCII encoding."
>>         SYNTAX  OCTET STRING (SIZE (0..255))
>>
>>    Stylistically, you might want to introduce a ShortUtf8String with
>>    SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
>>    below). 
>>
>> 2. Change the SYNTAX for the following objects from OCTET STRING:
>>
>>    prtGeneralCurrentOperator   Utf8String (SIZE(0..127))
>>    prtGeneralServicePerson     Utf8String (SIZE(0..127))
>>    prtGeneralSerialNumber      Utf8String
>>    prtGeneralPrinterName       Utf8String
>>
>>    prtInputMediaName           Utf8String (SIZE(0..63))
>>    prtInputName                Utf8String (SIZE(0..63))
>>    prtInputVendorName          Utf8String (SIZE(0..63))
>>    prtInputModel               Utf8String (SIZE(0..63))
>>    prtInputVersion             Utf8String (SIZE(0..63))
>>    prtInputSerialNumber        Utf8String (SIZE(0..32))
>>
>>    prtInputMediaType           Utf8String (SIZE(0..63))
>>    prtInputMediaColor          Utf8String (SIZE(0..63))
>>
>>    prtOutputName               Utf8String (SIZE(0..63))
>>    prtOutputVendorName         Utf8String (SIZE(0..63))
>>    prtOutputModel              Utf8String (SIZE(0..63))
>>    prtOutputVersion            Utf8String (SIZE(0..63))
>>    prtOutputSerialNumber       Utf8String (SIZE(0..63))
>>
>>    prtMarkerColorantValue      Utf8String
>>
>>    prtChannelProtocolVersion   Utf8String (SIZE(0..63))
>>
>>    prtInterpreterLangLevel     Utf8String (SIZE(0..31))
>>    prtInterpreterLangVersion   Utf8String (SIZE(0..31))
>>    prtInterpreterVersion       Utf8String (SIZE(0..31))
>>
>> 3. Add the reference to RFC 2044 to the bibliography: 
>>
>>    [**] F. Yergeau, "UTF-8, a transformation format of Unicode
>>         and ISO 10646", RFC 2044, October 1996.
>>
>>That's it. 
>>
>>::  David Kellerman         Northlake Software      503-228-3383
>>::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662
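To illustrate what the Utf8String TC proposed above asks of an agent or application, here is a minimal Python sketch; the function name `check_utf8_string` is hypothetical. It shows that US-ASCII strings are already valid UTF-8 octet for octet, that the SIZE limit counts octets rather than characters, and that a Latin1 octet string from an existing non-ASCII implementation, such as 0x43 0x61 0x66 0xE9 for "Café", is rejected as malformed UTF-8.

```python
def check_utf8_string(octets: bytes, max_octets: int = 255) -> str:
    """Validate an octet string against a Utf8String (SIZE(0..max_octets)) syntax.

    Returns the decoded text; raises ValueError if the string is too long or is
    not well-formed UTF-8 (UnicodeDecodeError is a subclass of ValueError).
    """
    if len(octets) > max_octets:
        raise ValueError(f"{len(octets)} octets exceeds SIZE(0..{max_octets})")
    return octets.decode("utf-8")  # strict decoding rejects malformed sequences


# US-ASCII text encodes identically in UTF-8, so existing conforming data is unaffected.
assert "Printer 1".encode("ascii") == "Printer 1".encode("utf-8")

print(check_utf8_string("Café".encode("utf-8")))  # Café
for octets, limit in ((b"Caf\xe9", 255), (("é" * 32).encode("utf-8"), 63)):
    try:
        check_utf8_string(octets, limit)
    except ValueError as exc:
        # prints "rejected: UnicodeDecodeError", then "rejected: ValueError"
        print("rejected:", type(exc).__name__)
```

The second case is 32 characters but 64 octets, which is why a SIZE (0..63) object such as prtInputName could hold fewer accented characters than its size suggests.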
>>
>>