PMP Mail Archive: PMP> Re: Clarification on earlier mail

PMP> Re: Clarification on earlier mail

Ira Mcdonald x10962 (imcdonal@eso.mc.xerox.com)
Sat, 26 Jul 1997 06:54:46 PDT

Hi Keith,

The cogent point is that the Printer MIB is NOT a new protocol.
There are significant numbers of deployed printers from a number
of manufacturers with RFC 1759 implemented, in non-English
environments, already defaulting (for the so-called 'current'
and 'console' locales) to a non-ASCII, non-UTF-8 character
set.

Changing to UTF-8 now would retroactively label a number of
implementations non-conformant. Using ONLY UTF-8 in the future
is culturally unacceptable in the Rank Xerox (European) and
Fuji Xerox (Asia/Pacific) marketplaces. Acceptance of Unicode
and UTF-8 (as the Character Encoding Scheme) in Japan
specifically is limited, and the sunk investment in non-Unicode
systems and system administration tools is large and growing
daily.

The issue of character sets in SNMP MIBs is almost entirely
ignored, with the exception of moderately frequent use of
the useless 'DisplayString' (based on Telnet's NVT-ASCII,
complete with ambiguities about text line terminators)
for so-called 'human-readable' string objects.

The cost of translation from UTF-8 to JIS X0208 is exorbitant
for embedded systems (which in Japan will probably have to
co-exist in a local network environment where most host systems
use the JIS X0208 or ISO 2022-JP character sets).
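
To give a rough sense of what that translation involves, here is a
minimal Python sketch. Python, its standard codecs, the sample string,
and the transcode() helper are all assumptions for illustration only;
none of this reflects actual printer firmware. Each character goes
through a table lookup rather than a simple bit transformation, and
those mapping tables are what an embedded system would have to carry.

```python
# Sketch only: convert UTF-8 octets into two encodings that carry
# JIS X 0208 characters, using Python's standard codecs.
sample = "\u65e5\u672c\u8a9e"  # hypothetical sample: three kanji

utf8_bytes = sample.encode("utf-8")


def transcode(data: bytes, target: str) -> bytes:
    """Decode UTF-8 octets and re-encode them in the target charset."""
    return data.decode("utf-8").encode(target)


euc_bytes = transcode(utf8_bytes, "euc_jp")          # EUC-JP
iso2022_bytes = transcode(utf8_bytes, "iso2022_jp")  # ISO 2022-JP

# Report how many octets each encoding needs for the same text.
for name, data in (("utf-8", utf8_bytes),
                   ("euc_jp", euc_bytes),
                   ("iso2022_jp", iso2022_bytes)):
    print(name, len(data), data.hex())
```

The ISO 2022-JP form also includes escape sequences that switch
between character sets, so its length depends on how often the text
changes between ASCII and kanji.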

As Tom Hastings pointed out this past week on the PWG's
'PMP' mailing list, the Printer MIB is an EXISTING protocol,
with a good deal of implementation experience to learn from.
What we have learned in the last few weeks is that various
vendors chose DIFFERENT base 8-bit character sets (HP Roman 8,
PC 850, etc.) for the so-called 'ASCII' strings currently
marked 'OCTET STRING' in RFC 1759. Obviously, more
coherence is required, because those systems currently
do NOT interwork (the local consoles will display different
glyphs for the same 8-bit code value).
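
The glyph mismatch can be reproduced with a short Python sketch. The
code value 0xC8 is an arbitrary example chosen for illustration, and
"hp_roman8" and "cp850" are the Python standard-library codec names
for HP Roman 8 and PC 850; no particular printer is being modelled.

```python
# Sketch only: decode one 8-bit code value under two vendor charsets.
octet = bytes([0xC8])  # arbitrary example value above 0x7F

glyphs = {name: octet.decode(name) for name in ("hp_roman8", "cp850")}

# Print the Unicode code point each charset assigns to the same octet.
for name, glyph in glyphs.items():
    print(f"{name:10s} 0x{octet[0]:02X} -> U+{ord(glyph):04X}")
```

Without a label saying which charset produced the octet, a management
station has no way to know which of the two characters was intended.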

Cheers,
- Ira McDonald (outside consultant at Xerox)
High North Inc
PO Box 221
Grand Marais, MI 49839
906-494-2434
(imcdonal@eso.mc.xerox.com)

----------------------- Keith's note ----------------------
Date: Fri, 25 Jul 1997 13:44:42 PDT
From: Chris Wellens <chrisw@iwl.com>
Reply-To: Chris Wellens <chrisw@iwl.com>
To: pmp@pwg.org
Subject: PMP> Opinion from our Area Director
Message-Id: <Pine.SUN.3.93.970725134236.1868B-100000@iwl.iwl.com>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: pmp-owner@pwg.org
Status: R

On the subject of using multiple code sets or using UTF-8, here
is the response:

---------- Forwarded message ----------
Date: Thu, 24 Jul 1997 21:24:48 -0400
From: Keith Moore <moore@cs.utk.edu>
To: Chris Wellens <chrisw@iwl.com>
Cc: Harald Alvestrand <Harald.T.Alvestrand@uninett.no>,
Keith Moore <moore@cs.utk.edu>, rpresuhn@peer.com,
Lloyd Young <lpyoung@lexmark.com>
Subject: Re: Clarification on earlier mail

Harald's on holiday, so may not see your message for a while.
Here's my take on this:

1. All protocols that use text should label which charset they're
using. Even if there's no immediate replacement for UTF-8, it will
not last forever, and we don't want the next transition (from UTF-8 to
whatever) to be as bad as the current transition (from ASCII to other
charsets) for many internet protocols.

2. New protocols should use some form of ISO 10646 for text, probably
UTF-8. UTF-8 discriminates against East Asian countries because it
uses very long codes for codepoints associated with ideographs, but
it's much more compatible with ASCII, and not limited to the BMP.

3. Use of non-universal charsets (like ISO-8859-*) with new protocols
should perhaps be possible (since we want to have labelling at any
rate), but should not be encouraged, unless there are significant
backward compatibility issues with using UTF-8.
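
As a rough illustration of points 1 and 2 above, the Python sketch
below pairs each string with an explicit charset label and compares
encoded lengths. The LabelledText type and the label() helper are
hypothetical names invented for this example; they are not part of
the Printer MIB or of any SNMP library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledText:
    """Octets plus the name of the charset used to encode them."""
    charset: str
    octets: bytes

    def text(self) -> str:
        # The label tells the receiver how to interpret the octets.
        return self.octets.decode(self.charset)


def label(text: str, charset: str = "utf-8") -> LabelledText:
    """Encode text in the given charset and attach the label."""
    return LabelledText(charset, text.encode(charset))


# Point 1: the same text, different octets, still recoverable via label.
utf8_form = label("caf\u00e9")
latin1_form = label("caf\u00e9", "iso-8859-1")
print(utf8_form.octets.hex(), latin1_form.octets.hex())

# Point 2: an ideograph costs more octets in UTF-8 than in EUC-JP.
ideograph = "\u65e5"
print(len(label(ideograph).octets), len(label(ideograph, "euc_jp").octets))
```

With the label carried alongside the octets, a later move to a
different charset changes only the label value, not the way receivers
decide how to decode.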

Keith

-----------------------------------------------------------------------------
--==--==--==- Chris Wellens
==--==--==--= Email: chrisw@iwl.com Web: http://www.iwl.com/
--==--==--==- InterWorking Labs, Inc. 244 Santa Cruz Ave, Aptos, CA 95003
==--==--==--= Tel: +1 408 685 3190 Fax: +1 408 662 9065
-----------------------------------------------------------------------------