PMP> Re: Clarification on earlier mail

Sat Jul 26 09:54:46 EDT 1997

Hi Keith,

The cogent point is that the Printer MIB is NOT a new protocol.
There are significant numbers of deployed printers from a number
of manufacturers with RFC 1759 implemented, in non-English
environments, already defaulting (for the so-called 'current'
and 'console' locales) to a non-ASCII, non-UTF-8 character
set.

Changing to UTF-8 now would retroactively label a number of
implementations non-conformant.  Using ONLY UTF-8 in the future
is culturally unacceptable to the Rank Xerox (European) and
Fuji Xerox (Asia/Pacific) marketplaces.  Acceptance of
Unicode and UTF-8 (as the Character Encoding Scheme) in Japan
(specifically) is limited, and the sunk investment in non-Unicode
systems and system administration tools is large and growing
daily.

The issue of character sets in SNMP MIBs is almost entirely
ignored, with the exception of moderately frequent use of
the useless 'DisplayString' (based on Telnet's NVT-ASCII,
complete with ambiguities about text line terminators)
for so-called 'human-readable' string objects.

The cost of translation from UTF-8 to JIS X0208 is exorbitant
for embedded systems (which in Japan will probably have to
co-exist in a local network environment where most host systems
use the JIS X0208 or ISO 2022-JP character sets).
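
To make the translation step concrete, here is a rough sketch in
Python.  It is only an illustration: the sample string and the
function name are hypothetical, and a hosted Python runtime already
ships the Unicode-to-JIS X0208 mapping table, whereas an embedded
device would presumably have to carry its own copy, since the
mapping is table-driven rather than computable.

```python
# Sketch only: transcode UTF-8 octets to ISO-2022-JP, which carries
# JIS X0208 characters inside escape sequences.
SAMPLE = "プリンタ"  # hypothetical sample value ("printer" in katakana)


def utf8_to_iso2022jp(octets: bytes) -> bytes:
    """Decode UTF-8, then re-encode through the JIS X0208 table."""
    return octets.decode("utf-8").encode("iso2022_jp")


utf8_octets = SAMPLE.encode("utf-8")
jis_octets = utf8_to_iso2022jp(utf8_octets)

# Print octet counts and hex dumps so the two encodings can be compared.
print(len(utf8_octets), utf8_octets.hex())
print(len(jis_octets), jis_octets.hex())
```

Every character in the sample needs a table lookup; none of it can
be passed through unchanged, unlike plain ASCII text.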

As Tom Hastings pointed out this last week on the PWG's
'PMP' mailing list, the Printer MIB is an EXISTING protocol,
with a good deal of implementation experience to learn from.
What we have learned in the last few weeks is that various
vendors chose DIFFERENT base 8-bit character sets (HP Roman 8,
PC 850, etc.) for the so-called 'ASCII' strings currently
marked 'OCTET STRING' in RFC 1759.  Obviously, more
coherence is required, because those systems currently
do NOT interwork (the local consoles will display different
glyphs for the same 8-bit code value).
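
A small Python sketch of the interworking problem follows.  The
octet value is an arbitrary sample, and the codec names are the
ones Python happens to use for HP Roman 8 and PC 850; nothing here
is taken from RFC 1759 itself.

```python
# Sketch only: decode one 8-bit value under two vendor character sets.
OCTET = b"\xc5"                    # arbitrary sample value above 0x7F
CHARSETS = ["hp_roman8", "cp850"]  # Python codec names (assumption)


def glyphs_for(octet: bytes, charsets: list[str]) -> dict[str, str]:
    """Return the character each charset assigns to the same octet."""
    return {name: octet.decode(name) for name in charsets}


result = glyphs_for(OCTET, CHARSETS)
for name, glyph in result.items():
    # Show the Unicode code point so the difference is unambiguous.
    print(f"{name:10} U+{ord(glyph):04X} {glyph}")
```

A console that assumes one of these sets will show a different
glyph from a console that assumes the other, even though the
octets on the wire are identical.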

Cheers,
- Ira McDonald (outside consultant at Xerox)
  High North Inc 
  PO Box 221
  Grand Marais, MI  49839
  906-494-2434
  (imcdonal at eso.mc.xerox.com)

----------------------- Keith's note ----------------------
Date: Fri, 25 Jul 1997 13:44:42 PDT
From: Chris Wellens <chrisw at iwl.com>
Reply-To: Chris Wellens <chrisw at iwl.com>
To: pmp at pwg.org
Subject: PMP> Opinion from our Area Director
Message-Id: <Pine.SUN.3.93.970725134236.1868B-100000 at iwl.iwl.com>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: pmp-owner at pwg.org
Status: R

On the subject of using multiple code sets or using UTF-8, here
is the response:

---------- Forwarded message ----------
Date: Thu, 24 Jul 1997 21:24:48 -0400
From: Keith Moore <moore at cs.utk.edu>
To: Chris Wellens <chrisw at iwl.com>
Cc: Harald Alvestrand <Harald.T.Alvestrand at uninett.no>,
    Keith Moore <moore at cs.utk.edu>, rpresuhn at peer.com,
    Lloyd Young <lpyoung at lexmark.com>
Subject: Re: Clarification on earlier mail 

Harald's on holiday, so he may not see your message for a while.
Here's my take on this:

1. All protocols that use text should label which charset they're
using.  Even if there's no immediate replacement for UTF-8, it will
not last forever, and we don't want the next transition (from UTF-8 to
whatever) to be as bad as the current transition (from ASCII to other
charsets) for many internet protocols.

2. New protocols should use some form of ISO 10646 for text, probably
UTF-8.  UTF-8 discriminates against East Asian countries because it
uses very long codes for codepoints associated with ideographs, but
it's much more compatible with ASCII, and not limited to the BMP.

3. Use of non-universal charsets (like ISO-8859-*) with new protocols
should perhaps be possible (since we want to have labelling at any
rate), but should not be encouraged, unless there are significant
backward compatibility issues with using UTF-8.

Keith
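
One way to picture the labelling recommended in point 1 is a value
that always travels with its charset name.  The sketch below is
only an illustration in Python: the class name, the relabel()
helper, and the sample octets are hypothetical and are not part of
any MIB or protocol discussed here.

```python
# Sketch only: text octets that always carry their charset label.
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledText:
    charset: str   # label, e.g. "utf-8" or "hp_roman8"
    octets: bytes  # the text exactly as stored or sent

    def __post_init__(self):
        # Reject labels the local codec registry does not know.
        codecs.lookup(self.charset)

    def to_text(self) -> str:
        """Decode the octets according to their own label."""
        return self.octets.decode(self.charset)


def relabel(value: LabelledText, charset: str = "utf-8") -> LabelledText:
    """Re-encode a labelled value into another charset."""
    return LabelledText(charset, value.to_text().encode(charset))


legacy = LabelledText("hp_roman8", b"caf\xc5")  # hypothetical sample
modern = relabel(legacy)
print(legacy.to_text() == modern.to_text(), modern.octets.hex())
```

Because the label stays with the octets, a receiver never has to
guess the charset, and a later move to another encoding becomes a
mechanical re-encoding rather than a compatibility break.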

-----------------------------------------------------------------------------
--==--==--==-  Chris Wellens             
==--==--==--=  Email: chrisw at iwl.com     Web: http://www.iwl.com/
--==--==--==-  InterWorking Labs, Inc.   244 Santa Cruz Ave, Aptos, CA 95003
==--==--==--=  Tel:  +1 408 685 3190     Fax:  +1 408 662 9065
-----------------------------------------------------------------------------