PMP Mail Archive: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET

Tom Hastings (hastings@cp10.es.xerox.com)
Thu, 24 Jul 1997 11:34:50 PDT

David,

If this were the fall of 1994 when the PWG finished the Printer MIB
and forwarded it to the IESG (and it got published in March 1995 as
RFC 1759), I would be in favor of your proposal to use UTF-8 only.
It is unambiguous and doesn't require a new object and covers the world.

The Printer MIB was a "new protocol" at that time. Two and a half years
later and with lots of vendors' products in the market, the Printer MIB
is no longer a "new protocol".

However, even if the Printer MIB were a "new" protocol, the Asian vendors
are split on using ISO 10646/Unicode/UTF-8 versus their long-established
national sets (JIS X0208:1990 for Japanese and GB2312:1980 for Chinese).
So if there were real Asian representation in this discussion, it is not
clear that they would favor UTF-8. (The SYNTHESIS proposal works with
these Asian national sets, because code positions 32 to 127 are US-ASCII).

Also, RFC 2130 does address the case of existing protocols, such as HTTP,
which uses ISO 8859-1 (Latin1). So our MIB is NOT being required to use
UTF-8, since the Printer MIB is not a NEW protocol.

My SYNTHESIS proposal allows using UTF-8 (and encourages it as the default),
but does NOT require it. The simple scenario of how the new object
prtGeneralStaticCodeSet is used (as a read-only object) is that the vendor
ships a floppy with his printer. The System Administrator runs an install
application that allows him to select which representation for the
vendor supplied information to include and the install application puts that
information into the flash memory of the printer. The System Administrator
also decides at the same time which site-settable objects to fill in, such as
prtGeneralPrinterName, prtGeneralCurrentOperator, prtGeneralServicePerson,
etc., and puts that information into the flash memory of the printer as well.
All these objects can be implemented as READ-ONLY in the MIB.

Only if there is some sort of security mechanism in place should an implementor
(or the system administrator) consider making these objects READ-WRITE.

The SYNTHESIS proposal is simple. The SA chooses one char set for all the
information, whether it comes from the vendor or is site-dependent.
Different printer implementations could support some or all of the following
character sets:

Market                  Coded Character Set

US                      US-ASCII

Western Hemisphere/     ISO 8859-1 (Latin1), HP Roman8, Code page 850
Western Europe

World                   UTF-8, US-ASCII/JIS X0208, US-ASCII/GB2312

Also, the vendor might choose to put only English on his floppy, or could
have different versions for each language on the floppy. But once in the
MIB, there is only one coded character set as selected by the System
Administrator (hopefully in some user-friendly way, such as the SA
choosing his environment, rather than choosing an actual coded character
set).

The point is that any one of the above character sets covers multiple
languages for a significant region of the world, so it is possible
for a System Administrator to choose one of them at install time of the
printer.

Applications that are "localized" are encouraged to be character set
independent. The application passes the data to the platform to display
and the platform should have the same character set as the SA set for
the printer.
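
As a rough illustration of the application side, here is a minimal Python
sketch. It assumes the management application has already read the printer's
code set and reduced it to one of the labels below. The labels, the CODECS
table, and the function name are hypothetical; they are not values defined by
the Printer MIB or by this proposal.

```python
# Hypothetical labels for the single code set the SA selected at install
# time, mapped to Python codec names. Illustrative only; the actual values
# of prtGeneralStaticCodeSet are not defined in this message.
CODECS = {
    "us-ascii": "ascii",
    "iso-8859-1": "latin-1",
    "cp850": "cp850",
    "utf-8": "utf-8",
}


def decode_printer_string(octets: bytes, code_set: str) -> str:
    """Decode one OCTET STRING value using the printer-wide code set."""
    codec = CODECS[code_set]  # raises KeyError for an unsupported code set
    # Undecodable octets become U+FFFD instead of raising an exception.
    return octets.decode(codec, errors="replace")
```

Because one code set applies to every localized object in the printer, the
application looks it up once and reuses it for each string it reads.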

Tom

At 16:46 07/23/97 PDT, David_Kellerman@nls.com wrote:
>If there really is a broad interest in "fixing" the localization
>problem, I would suggest an alternative to Tom's proposal -- switch from
>ASCII to UTF-8 for OCTET STRING objects where representation of
>multilingual text is appropriate.
>
>Summary of arguments in favor: no new objects, consistent with existing
>conforming implementations (ASCII is subset of UTF-8), doesn't introduce
>the complexity of multiple character sets for affected objects, doesn't
>introduce the complexity of changeable character sets for affected
>objects, seems to be consistent with direction of IETF generally and
>SNMP in particular.
>
>Problems I see are, briefly: forces implementations to deal with UTF-8,
>and it conflicts with existing implementations that allow non-ASCII
>characters in the strings. How serious these are depends, in part, on
>whether you believe other MIB work is going to force UTF-8 anyway, and
>how much weight you want to give to existing practice that deviates from
>the existing standard.
>
>Supporting material:
>  1. See the note from Randy Presuhn that Chris forwarded to the mailing
>     list. He suggests this approach, has obviously given the topic a
>     lot of thought, and discusses it in some detail. He also asserts
>     that the SNMPv3 effort is headed toward use of UTF-8 for all
>     human-readable strings.
>  2. I read Harald Alvestrand's message differently than Tom. I think it
>     says to specify the character set (a single one) and recommends
>     UTF-8; not to allow multiple character sets, chosen at the
>     discretion of the agent or application.
>  3. I also read RFC 2130 (The Character Set Workshop Report) differently
>     than Tom. It covers a lot of ground, trying to address migration of
>     existing protocols as well as new work. For new protocols in
>     particular, it says in part:
>         New protocols do not suffer from the need to be compatible with
>         old 7-bit pipes. New protocol specifications SHOULD use ISO
>         10646 as the base charset unless there is an overriding need to
>         use a different base character set.
>
>Here are the details of the changes to the document:
>
>  1. Copy the Utf8String TC from the sysAppl draft:
>
>         Utf8String ::= TEXTUAL-CONVENTION
>             DISPLAY-HINT "255a"
>             STATUS       current
>             DESCRIPTION
>                 "To facilitate internationalization, this TC
>                 represents information taken from the ISO/IEC IS
>                 10646-1 character set, encoded as an octet string
>                 using the UTF-8 character encoding scheme described
>                 in RFC 2044 [**]. For strings in 7-bit US-ASCII,
>                 there is no impact since the UTF-8 representation
>                 is identical to the US-ASCII encoding."
>             SYNTAX       OCTET STRING (SIZE (0..255))
>
>     Stylistically, you might want to introduce a ShortUtf8String with
>     SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
>     below).
>
>  2. Change the SYNTAX for the following objects from OCTET STRING:
>
>         prtGeneralCurrentOperator   Utf8String (SIZE(0..127))
>         prtGeneralServicePerson     Utf8String (SIZE(0..127))
>         prtGeneralSerialNumber      Utf8String
>         prtGeneralPrinterName       Utf8String
>
>         prtInputMediaName           Utf8String (SIZE(0..63))
>         prtInputName                Utf8String (SIZE(0..63))
>         prtInputVendorName          Utf8String (SIZE(0..63))
>         prtInputModel               Utf8String (SIZE(0..63))
>         prtInputVersion             Utf8String (SIZE(0..63))
>         prtInputSerialNumber        Utf8String (SIZE(0..32))
>
>         prtInputMediaType           Utf8String (SIZE(0..63))
>         prtInputMediaColor          Utf8String (SIZE(0..63))
>
>         prtOutputName               Utf8String (SIZE(0..63))
>         prtOutputVendorName         Utf8String (SIZE(0..63))
>         prtOutputModel              Utf8String (SIZE(0..63))
>         prtOutputVersion            Utf8String (SIZE(0..63))
>         prtOutputSerialNumber       Utf8String (SIZE(0..63))
>
>         prtMarkerColorantValue      Utf8String
>
>         prtChannelProtocolVersion   Utf8String (SIZE(0..63))
>
>         prtInterpreterLangLevel     Utf8String (SIZE(0..31))
>         prtInterpreterLangVersion   Utf8String (SIZE(0..31))
>         prtInterpreterVersion       Utf8String (SIZE(0..31))
>
>  3. Add the reference to RFC 2044 to the bibliography:
>
>         [**] F. Yergeau, "UTF-8, a transformation format of Unicode
>              and ISO 10646", RFC 2044, October 1996.
>
>That's it.
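
One practical consequence of the quoted Utf8String TC is that its SIZE limit
counts octets, not characters. The Python sketch below is only an
illustration of that point: the function name and the default limit constant
are hypothetical, and the 63-octet case simply mirrors the SIZE(0..63)
clauses in the list above.

```python
MAX_OCTETS = 255  # SIZE (0..255) from the quoted Utf8String TC


def to_utf8_string(text: str, max_octets: int = MAX_OCTETS) -> bytes:
    """Encode text as UTF-8 and enforce an octet-count SIZE limit."""
    octets = text.encode("utf-8")
    if len(octets) > max_octets:
        raise ValueError(
            f"{len(octets)} octets exceeds SIZE (0..{max_octets})"
        )
    return octets


# Pure US-ASCII text is unchanged by UTF-8 encoding.
ascii_name = to_utf8_string("LaserPrinter-7")

# Each of these two kanji takes three octets in UTF-8, so a 63-octet
# object holds at most 21 such characters.
kanji_name = to_utf8_string("\u65e5\u672c", max_octets=63)
```

An implementation that previously assumed one octet per character would need
to check the encoded length rather than the character count.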
>
>:: David Kellerman Northlake Software 503-228-3383
>:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662
>
>