PMP Mail Archive: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET

Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET

Tom Hastings (hastings@cp10.es.xerox.com)
Thu, 24 Jul 1997 16:30:41 PDT

At 15:01 07/24/97 PDT, David_Kellerman@nls.com wrote:
>Tom, I think it takes real guts to say "the SYNTHESIS proposal is simple."

You are right. I overstated this. (It's simpler than what we
had been considering up to Nashua.) You are correct that the
SYNTHESIS proposal is more complicated than the UTF-8 only proposal
for applications that want to support the world. On the other hand, the
SYNTHESIS proposal is simpler for simpler applications that have a more
limited scope, such as working only with Latin1 or only with JIS X0208.

>
>It's a hack that attempts to sanction existing non-conforming
>implementations. (Come on, it's a convenient fig leaf to say that
>implementors didn't know what ASCII meant -- they knew; it was just
>expedient to allow the extended local character sets.) It imposes a
>continuing burden of multiple code sets on applications. And the
>introduction of an open-ended choice of code sets can only complicate
>interoperability.

True. But maybe we should let the marketplace decide, rather
than legislating current practice as non-conformant.

>
>Your proposal goes on for pages and pages of dense text. And every time
>you attempt to explain it to people, you end up with pages of
>explanation. This should be a clue that it's not simple.

I agree it's not as simple as just forcing UTF-8.

However, it is simpler for an application that wants to do simple
things like work with the coded character set of its environment. The
application isn't forced to do UTF-8.

Forcing an application to do UTF-8 when it's running on a Windows platform
with ISO 8859 and the Printer is (or could be) set to ISO 8859 is not simple.
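
A minimal sketch of that difference, in Python purely for illustration.
Nothing here comes from the proposal text: the function name to_platform
and the use of Python codec names to stand for the printer's and the
platform's coded character sets are assumptions. When both sides use the
same set the octets pass through untouched; when the MIB is forced to
UTF-8, the same ISO 8859-1 platform has to transcode every string.

```python
# Illustrative only: codec names stand in for whatever value an agent
# might report as its code set; they are not taken from the Printer MIB.

def to_platform(octets: bytes, printer_codec: str, platform_codec: str) -> bytes:
    """Return octets ready for display in the platform's character set."""
    if printer_codec == platform_codec:
        # Same coded character set on both sides: no conversion needed.
        return octets
    # Otherwise decode from the printer's set and re-encode for the platform.
    # Raises UnicodeEncodeError if the platform set lacks a character.
    return octets.decode(printer_codec).encode(platform_codec)


name = "Caf\u00e9 printer"                  # 12 characters, one non-ASCII
latin1_octets = name.encode("iso-8859-1")   # 12 octets, e-acute is 0xE9
utf8_octets = name.encode("utf-8")          # 13 octets, e-acute is 0xC3 0xA9

# Printer and platform both ISO 8859-1: pass-through.
assert to_platform(latin1_octets, "iso-8859-1", "iso-8859-1") is latin1_octets
# Printer forced to UTF-8: the same platform must transcode.
assert to_platform(utf8_octets, "utf-8", "iso-8859-1") == latin1_octets
```

The conversion itself is short, but the application has to carry the
tables for it and decide what to do when a character has no equivalent.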

By the way, ISO 10646 is 120 pages of non-Kanji characters. The Kanji
adds another 600 pages.

>
>The SYNTHESIS proposal is tricky. And I only started to appreciate some
>of the implications yesterday as I was putting together the Utf8String
>proposal. To give just one example, all the objects affected by the
>existing prtGeneralCurrentLocalization and prtConsoleLocalization are
>(with one carefully documented exception) read-only, as is the
>localization table (so the agent completely controls the localization).
>The SYNTHESIS proposal, in contrast, affects a mix of read-only and
>writable objects, and the character set selection may be writable. This
>breaks new ground for the Printer MIB. What are the implications for
>agent and application, and how many pages of explanation are required to
>cover them?

I agree that the read-write objects are tricky. However, implementors
do NOT need to implement the read-write objects as read-write. In fact,
we might want to warn them not to, unless there is some sort of security
mechanism in place to make sure that unauthorized users don't write
writable objects.

>
>Now I'm not asking you for an explanation of the issue above. In fact,
>my point is really that an explanation isn't too useful. I didn't start
>to see how the machinery fit together until I started working with it,
>started trying to see the implications for an application
>implementation. In effect, getting my hands dirty.

Great. More of us need to do that.

>
>I think the issue needs this sort of hands-on consideration from others,
>particularly applications implementors concerned with interoperability,
>in order to build confidence that we understand the implications. The
>floods of "urgent, reply by yesterday" e-mail, by contrast, quickly
>start to blur into a muddle.

I agree we need to take the time with application implementors.

>
>:: David Kellerman Northlake Software 503-228-3383
>:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662
>
>------------------------------------------------------------------------
>Date: Thu, 24 Jul 1997 11:34:50 PDT
>To: David_Kellerman@nls.com
>From: Tom Hastings <hastings@cp10.es.xerox.com>
>Subject: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to
> allow superset of ASCII
>CC: pmp@pwg.org
>
>David,
>
>If this were the fall of 1994 when the PWG finished the Printer MIB
>and forwarded it to the IESG (and it got published in March 1995 as
>RFC 1759), I would be in favor of your proposal to use UTF-8 only.
>It is unambiguous and doesn't require a new object and covers the world.
>
>The Printer MIB was a "new protocol" at that time. Two and a half years
>later and with lots of vendors' products in the market, the Printer MIB
>is no longer a "new protocol".
>
>However, even if the Printer MIB were a "new" protocol, the Asian vendors
>are split on using ISO 10646/Unicode/UTF-8 versus their long established
>national sets (JIS X0208:1990 for Japanese and GB2312:1980 for Chinese).
>So if there was real Asian representation in this discussion, it is not
>clear that they would favor UTF-8. (The SYNTHESIS proposal works with
>these Asian national sets, because code positions 32 to 127 are US-ASCII).
>
>Also RFC 2130 does state the case of existing protocols, such as HTTP
>which use ISO 8859 (Latin1). So our MIB is NOT being required to use
>UTF-8, since the Printer MIB is not a NEW protocol.
>
>My SYNTHESIS proposal allows using UTF-8 (and encourages it as the default),
>but does NOT require it. The simple scenario of how the new object
>prtGeneralStaticCodeSet is used (as a read-only object) is that the vendor
>ships a floppy with his printer. The System Administrator runs an install
>application that allows him to select which representation for the
>vendor supplied information to include and the install application puts that
>information into the flash memory of the printer. The System Administrator
>also decides at the same time which site-settable objects, such as
>prtGeneralPrinterName, prtGeneralCurrentOperator, prtGeneralServicePerson,
>etc., to fill in, and sets that information also into the flash memory of the printer.
>All these objects can be implemented as READ-ONLY in the MIB.
>
>Only if there is some sort of security mechanism in place should an implementor
>(or the system administrator) consider making these objects READ-WRITE.
>
>
>The SYNTHESIS proposal is simple. The SA chooses one char set for all the
>information, whether it comes from the vendor or is site-dependent.
>Different printer implementations could support some or all of the following
>character sets:
>
>    Market                 Coded Character Set
>
>    US                     US-ASCII
>
>    Western Hemisphere/    ISO 8859-1 (Latin1), HP Roman8, Code page 850
>    Western Europe
>
>    World                  UTF-8, US-ASCII/JIS X0208, US-ASCII/GB2312
>
>Also the vendor might choose to only put English on his floppy, or could
>have different versions for each language on the floppy. But once in the
>MIB, there is only one coded character set as selected by the System
>Administrator (hopefully in some user-friendly way, such as the SA
>choosing his environment, rather than choosing an actual coded character
>set).
>
>The point is that any one of the above character sets covers multiple
>languages for a significant region of the world, so it is possible
>for a System Administrator to choose one of them at install time of the
>printer.
>
>Applications that are "localized" are encouraged to be character set
>independent. The application passes the data to the platform to display
>and the platform should have the same character set as the SA set for
>the printer.
>
>
>Tom
>
>
>
>At 16:46 07/23/97 PDT, David_Kellerman@nls.com wrote:
>>If there really is a broad interest in "fixing" the localization
>>problem, I would suggest an alternative to Tom's proposal -- switch from
>>ASCII to UTF-8 for OCTET STRING objects where representation of
>>multilingual text is appropriate.
>>
>>Summary of arguments in favor: no new objects, consistent with existing
>>conforming implementations (ASCII is subset of UTF-8), doesn't introduce
>>the complexity of multiple character sets for affected objects, doesn't
>>introduce the complexity of changeable character sets for affected
>>objects, seems to be consistent with direction of IETF generally and
>>SNMP in particular.
>>
>>Problems I see are, briefly: forces implementations to deal with UTF-8,
>>and it conflicts with existing implementations that allow non-ASCII
>>characters in the strings. How serious these are depends, in part, on
>>whether you believe other MIB work is going to force UTF-8 anyway, and
>>how much weight you want to give to existing practice that deviates from
>>the existing standard.
>>
>>Supporting material:
>> 1. See the note from Randy Presuhn that Chris forwarded to the mailing
>> list. He suggests this approach, has obviously given the topic a
>> lot of thought, and discusses it in some detail. He also asserts
>> that the SNMPv3 effort is headed toward use of UTF-8 for all
>> human-readable strings.
>> 2. I read Harald Alvestrand's message differently than Tom. I think it
>> says to specify the character set (a single one) and recommends
>> UTF-8; not to allow multiple character sets, chosen at the
>> discretion of the agent or application.
>> 3. I also read RFC 2130 (The Character Set Workshop Report) differently
>> than Tom. It covers a lot of ground, trying to address migration of
>> existing protocols as well as new work. For new protocols in
>> particular, it says in part:
>>        New protocols do not suffer from the need to be compatible with
>>        old 7-bit pipes. New protocol specifications SHOULD use ISO
>>        10646 as the base charset unless there is an overriding need to
>>        use a different base character set.
>>
>>Here are the details of the changes to the document:
>>
>> 1. Copy the Utf8String TC from the sysAppl draft:
>>
>>      Utf8String ::= TEXTUAL-CONVENTION
>>          DISPLAY-HINT "255a"
>>          STATUS       current
>>          DESCRIPTION
>>              "To facilitate internationalization, this TC
>>              represents information taken from the ISO/IEC IS
>>              10646-1 character set, encoded as an octet string
>>              using the UTF-8 character encoding scheme described
>>              in RFC 2044 [**]. For strings in 7-bit US-ASCII,
>>              there is no impact since the UTF-8 representation
>>              is identical to the US-ASCII encoding."
>>          SYNTAX       OCTET STRING (SIZE (0..255))
>>
>> Stylistically, you might want to introduce a ShortUtf8String with
>> SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
>> below).
>>
>> 2. Change the SYNTAX for the following objects from OCTET STRING:
>>
>>      prtGeneralCurrentOperator   Utf8String (SIZE(0..127))
>>      prtGeneralServicePerson     Utf8String (SIZE(0..127))
>>      prtGeneralSerialNumber      Utf8String
>>      prtGeneralPrinterName       Utf8String
>>
>>      prtInputMediaName           Utf8String (SIZE(0..63))
>>      prtInputName                Utf8String (SIZE(0..63))
>>      prtInputVendorName          Utf8String (SIZE(0..63))
>>      prtInputModel               Utf8String (SIZE(0..63))
>>      prtInputVersion             Utf8String (SIZE(0..63))
>>      prtInputSerialNumber        Utf8String (SIZE(0..32))
>>
>>      prtInputMediaType           Utf8String (SIZE(0..63))
>>      prtInputMediaColor          Utf8String (SIZE(0..63))
>>
>>      prtOutputName               Utf8String (SIZE(0..63))
>>      prtOutputVendorName         Utf8String (SIZE(0..63))
>>      prtOutputModel              Utf8String (SIZE(0..63))
>>      prtOutputVersion            Utf8String (SIZE(0..63))
>>      prtOutputSerialNumber       Utf8String (SIZE(0..63))
>>
>>      prtMarkerColorantValue      Utf8String
>>
>>      prtChannelProtocolVersion   Utf8String (SIZE(0..63))
>>
>>      prtInterpreterLangLevel     Utf8String (SIZE(0..31))
>>      prtInterpreterLangVersion   Utf8String (SIZE(0..31))
>>      prtInterpreterVersion       Utf8String (SIZE(0..31))
>>
>> 3. Add the reference to RFC 2044 to the bibliography:
>>
>> [**] F. Yergeau, "UTF-8, a transformation format of Unicode
>> and ISO 10646", RFC 2044, October 1996.
>>
>>That's it.
>>
>>:: David Kellerman Northlake Software 503-228-3383
>>:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662
>>
>>
>
>
>
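One implication of the Utf8String proposal quoted above is that the SIZE
bounds on an OCTET STRING count octets, not characters, so a value with
non-ASCII characters reaches the limit sooner than its character count
suggests. A minimal sketch in Python, for illustration only: the helper
name fit_utf8 is hypothetical, and truncating is just one possible
policy, not something either proposal specifies.

```python
def fit_utf8(text: str, max_octets: int) -> bytes:
    """Encode text as UTF-8, dropping trailing characters that do not fit."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_octets:
        return encoded
    # Cut at the octet limit, then discard any partial trailing character
    # so the result is still well-formed UTF-8.
    return encoded[:max_octets].decode("utf-8", errors="ignore").encode("utf-8")


# ASCII values are unchanged: the UTF-8 octets equal the ASCII octets.
assert "Tray 1".encode("utf-8") == "Tray 1".encode("ascii")

# 40 two-octet characters need 80 octets; only 31 whole characters
# (62 octets) fit in a SIZE(0..63) object.
value = fit_utf8("\u00e9" * 40, 63)
assert len(value) == 62
```

An agent or application would need some rule like this for every
affected object, whichever size bound applies.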
