PMP Mail Archive: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to

JK Martin (jkm@underscore.com)
Wed, 23 Jul 1997 19:36:28 -0400 (EDT)

I think Dave Kellerman's new proposal may very well be the best
way to resolve this last-minute crisis.

As I understand Dave's proposal, instead of having a fixed charset
of ASCII, we simply move it to UTF-8, and thereby gain a reasonable
amount of localization with absolutely minimal impact to the MIB.

Also in support of Dave's proposal--and this means a lot to us as
mgmt app developers--we stand a much better chance of having decent
interoperability if we constrain the charset to UTF-8.

I hope the PMP group accepts this new proposal for UTF-8, assuming
of course no one pokes a big fat hole in it. ;-)

...jay

----- Begin Included Message -----

Date: Wed, 23 Jul 1997 15:46:51 PST
From: David_Kellerman@nls.com
To: pmp@pwg.org
Subject: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to
allow superset of ASCII

If there really is a broad interest in "fixing" the localization
problem, I would suggest an alternative to Tom's proposal -- switch from
ASCII to UTF-8 for OCTET STRING objects where representation of
multilingual text is appropriate.

Summary of arguments in favor: no new objects, consistent with existing
conforming implementations (ASCII is subset of UTF-8), doesn't introduce
the complexity of multiple character sets for affected objects, doesn't
introduce the complexity of changeable character sets for affected
objects, seems to be consistent with direction of IETF generally and
SNMP in particular.
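
To illustrate the "ASCII is subset of UTF-8" argument, here is a minimal
Python sketch. It is not part of the proposal; the helper name
is_ascii_compatible and the sample strings are made up for illustration.
It checks whether a string produces the same octets under both encodings,
which is the property that would let agents emitting pure ASCII stay
conformant without change.

```python
def is_ascii_compatible(text: str) -> bool:
    """Return True if text encodes to identical octets in ASCII and UTF-8.

    Hypothetical helper, for illustration only.
    """
    try:
        ascii_octets = text.encode("ascii")
    except UnicodeEncodeError:
        # The text contains characters outside 7-bit US-ASCII.
        return False
    return ascii_octets == text.encode("utf-8")


# Made-up sample values, not taken from any real agent.
print(is_ascii_compatible("Example Printer 1"))
print(is_ascii_compatible("Drucker B\u00fcro"))
```

Any string that passes this check would be carried identically whether the
object is read as ASCII or as UTF-8.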

Problems I see are, briefly: forces implementations to deal with UTF-8,
and it conflicts with existing implementations that allow non-ASCII
characters in the strings. How serious these are depends, in part, on
whether you believe other MIB work is going to force UTF-8 anyway, and
how much weight you want to give to existing practice that deviates from
the existing standard.
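
The conflict with existing non-ASCII practice can be sketched as follows.
This assumes, purely for illustration, an agent that stores strings in
ISO 8859-1 (Latin-1); the message does not say which character sets
existing implementations actually use. The helper name is_valid_utf8 is
hypothetical.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence.

    Hypothetical helper, for illustration only.
    """
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "caf\u00e9"  # made-up value containing one non-ASCII character

# Assumption: the legacy agent encodes the value as Latin-1.
legacy_octets = name.encode("latin-1")
utf8_octets = name.encode("utf-8")

print(is_valid_utf8(legacy_octets))
print(is_valid_utf8(utf8_octets))
```

Under that assumption, the legacy octets are rejected by a strict UTF-8
decoder, so a management application could at least detect such values
rather than display them incorrectly.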

Supporting material:
 1.  See the note from Randy Presuhn that Chris forwarded to the mailing
     list.  He suggests this approach, has obviously given the topic a
     lot of thought, and discusses it in some detail.  He also asserts
     that the SNMPv3 effort is headed toward use of UTF-8 for all
     human-readable strings.
 2.  I read Harald Alvestrand's message differently than Tom.  I think it
     says to specify the character set (a single one) and recommends
     UTF-8; not to allow multiple character sets, chosen at the
     discretion of the agent or application.
 3.  I also read RFC 2130 (The Character Set Workshop Report) differently
     than Tom.  It covers a lot of ground, trying to address migration of
     existing protocols as well as new work.  For new protocols in
     particular, it says in part:

         New protocols do not suffer from the need to be compatible with
         old 7-bit pipes.  New protocol specifications SHOULD use ISO
         10646 as the base charset unless there is an overriding need to
         use a different base character set.

Here are the details of the changes to the document:

1. Copy the Utf8String TC from the sysAppl draft:

    Utf8String ::= TEXTUAL-CONVENTION
        DISPLAY-HINT "255a"
        STATUS       current
        DESCRIPTION
            "To facilitate internationalization, this TC
            represents information taken from the ISO/IEC IS
            10646-1 character set, encoded as an octet string
            using the UTF-8 character encoding scheme described
            in RFC 2044 [**].  For strings in 7-bit US-ASCII,
            there is no impact since the UTF-8 representation
            is identical to the US-ASCII encoding."
        SYNTAX       OCTET STRING (SIZE (0..255))

Stylistically, you might want to introduce a ShortUtf8String with
SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
below).
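
One practical consequence of the SIZE clauses is that they count octets,
not characters, so a non-ASCII string can hit the limit sooner than its
character count suggests. The following Python sketch shows one way an
implementation might fit a value into such a limit without cutting a
multi-octet character in half. The function name fit_utf8 and the
truncation policy are assumptions for illustration; the proposal does not
specify how over-long values should be handled.

```python
def fit_utf8(text: str, max_octets: int) -> bytes:
    """Encode text as UTF-8, truncated to at most max_octets octets.

    Hypothetical helper: truncation never splits a multi-octet character,
    although it may still separate a base character from a following
    combining mark.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_octets:
        return encoded
    # Cut at the octet limit, then drop any incomplete trailing sequence
    # by decoding with errors="ignore" and re-encoding.
    clipped = encoded[:max_octets]
    return clipped.decode("utf-8", errors="ignore").encode("utf-8")


# Made-up value: 40 two-octet characters, 80 octets in UTF-8.
value = "\u00e9" * 40
print(len(fit_utf8(value, 63)))
```

With a limit of 63 octets, only 31 of the 40 two-octet characters fit,
which is the kind of difference an agent would need to account for when
moving from ASCII to UTF-8.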

2. Change the SYNTAX for the following objects from OCTET STRING:

    prtGeneralCurrentOperator    Utf8String (SIZE(0..127))
    prtGeneralServicePerson      Utf8String (SIZE(0..127))
    prtGeneralSerialNumber       Utf8String
    prtGeneralPrinterName        Utf8String

    prtInputMediaName            Utf8String (SIZE(0..63))
    prtInputName                 Utf8String (SIZE(0..63))
    prtInputVendorName           Utf8String (SIZE(0..63))
    prtInputModel                Utf8String (SIZE(0..63))
    prtInputVersion              Utf8String (SIZE(0..63))
    prtInputSerialNumber         Utf8String (SIZE(0..32))

    prtInputMediaType            Utf8String (SIZE(0..63))
    prtInputMediaColor           Utf8String (SIZE(0..63))

    prtOutputName                Utf8String (SIZE(0..63))
    prtOutputVendorName          Utf8String (SIZE(0..63))
    prtOutputModel               Utf8String (SIZE(0..63))
    prtOutputVersion             Utf8String (SIZE(0..63))
    prtOutputSerialNumber        Utf8String (SIZE(0..63))

    prtMarkerColorantValue       Utf8String

    prtChannelProtocolVersion    Utf8String (SIZE(0..63))

    prtInterpreterLangLevel      Utf8String (SIZE(0..31))
    prtInterpreterLangVersion    Utf8String (SIZE(0..31))
    prtInterpreterVersion        Utf8String (SIZE(0..31))

3. Add the reference to RFC 2044 to the bibliography:

[**] F. Yergeau, "UTF-8, a transformation format of Unicode
and ISO 10646", RFC 2044, October 1996.

That's it.

:: David Kellerman Northlake Software 503-228-3383
:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662

----- End Included Message -----