PMP> what happens if we use UTF-8 in place of ASCII

Ira Mcdonald x10962 imcdonal at eso.mc.xerox.com
Fri Jul 25 17:04:33 EDT 1997


Hi Harry,


There are no 'levels' of UTF-8.  It is an encoding of the FULL
UCS-2 Unicode set of 2-byte code points that is compatible with
US-ASCII: the 7-bit 'left-half' of the octet range encodes as
itself, one byte per character.  For Western European languages,
it averages only slightly more than 1 byte per character (accented
letters take 2 bytes).  For other languages, it takes up to 3 bytes
per UCS-2 character.


As Tom already mentioned, to map from the UTF-8 set to some
of the other Asian national sets is a 14-bit table lookup
(BIG!!), so saying the client will just cope (if the whole
rest of the local network environment is running JIS X0208)
is very naive - code conversions are not cheap, for either
the managed device or the management stations (i.e., clients).
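
To put a rough number on "BIG", here is a minimal sketch.  It assumes a
flat table indexed by the 14-bit key (two 7-bit bytes of a JIS X0208
code) with one 2-byte UCS-2 entry per slot.  The function name and the
flat layout are assumptions for illustration only; a real converter
could use a sparser structure, and the reverse direction needs its own
table.

```python
def flat_table_bytes(index_bits: int, entry_bytes: int) -> int:
    """Size in bytes of a flat lookup table with 2**index_bits entries."""
    return (1 << index_bits) * entry_bytes


# Assumption: 14-bit index, 2-byte (UCS-2) entries, one direction only.
one_way = flat_table_bytes(14, 2)
print(one_way)  # 32768 bytes, i.e. 32 KiB
```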


Cheers,
- Ira McDonald (outside consultant at Xerox)
  High North Inc
  PO Box 221
  Grand Marais, MI  49839
  906-494-2434


--------------------------- Harry's note -------------------------
From: Harry Lewis <harryl at us.ibm.com>
To: <pmp at pwg.org>
Subject: PMP> what happens if we use UTF-8 in place of ASCII
Message-Id: <5030100005326866000002L062*@MHS>
Date: Fri, 25 Jul 1997 11:36:34 PDT


David, thanks for changing the name and focus of this topic. I have a
question... does using UTF-8 put any greater burden on the agent in terms
of size of character set and storage required? I've been trying to catch up
with all those URGENT mail messages, I've pulled RFC 2044 and I think I
know what UTF-8 is and why it exists... but I'm not sure what the consensus
is regarding how much of Unicode or ISO 10646 is required behind the UTF-8.


Harry Lewis - IBM Printing Systems




-------- Forwarded by Harry Lewis/Boulder/IBM on 07/25/97 12:27 PM -------


        pmp-owner at pwg.org
        07/24/97 06:50 PM
Please respond to pmp-owner at pwg.org @ internet


To: pmp at pwg.org @ internet
cc:
Subject: PMP> what happens if we use UTF-8 in place of ASCII


New thread; got tired of looking at that "URGENT" line.  Also, I figured
that if this note had a new title, your curiosity would get the better
of you, and you might read the note instead of automatically filing it.


Thought it would be useful to summarize the implications of switching
the ASCII OCTET STRINGs to Utf8String syntax.


Agents:
 1. Need to make sure any read-only strings that aren't ASCII use UTF-8
    encoding.  I'd guess that 99% are ASCII.
 2. Remove any 7-bit checking code.  Sounds like most agents already
    accept 8-bit codes without checking (sounds like Osicom/DPI actually
    checks -- someone knew what ASCII meant).
So most agents work unchanged, the rest with slight change.  It's worth
noting that, within the SNMP domain, the agent should be able to treat
the affected strings simply as octets.  It doesn't do anything, such as
displaying the strings, that would require it to "know" UTF-8.
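
The two agent behaviours above can be sketched roughly as follows.
This is only an illustration: the class names, the in-memory store, and
the object name "exampleString" are hypothetical and do not come from
any real agent or MIB.

```python
def is_7bit_ascii(value: bytes) -> bool:
    """True if no octet has the high bit set."""
    return all(octet < 0x80 for octet in value)


class StrictAgent:
    """Hypothetical agent that still enforces 7-bit ASCII on SET."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def set(self, name: str, value: bytes) -> None:
        if not is_7bit_ascii(value):
            # This is the check that would have to be removed.
            raise ValueError("value is not 7-bit ASCII")
        self.store[name] = value

    def get(self, name: str) -> bytes:
        return self.store[name]


class OpaqueAgent(StrictAgent):
    """Hypothetical agent that treats the string simply as octets."""

    def set(self, name: str, value: bytes) -> None:
        # No inspection at all: the agent never needs to "know" UTF-8.
        self.store[name] = bytes(value)


utf8_value = "B\u00fcro".encode("utf-8")  # contains octets >= 0x80
agent = OpaqueAgent()
agent.set("exampleString", utf8_value)
assert agent.get("exampleString") == utf8_value
```

An agent of the second kind stores and returns the same octets whether
they are ASCII, UTF-8, or something else entirely.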


Agent environment:
 1. If, outside the SNMP domain, the agent device displays the affected
    strings, there may be a need for character set conversion.


Application environment:
 1. In environments which do not use UTF-8 as the native encoding,
    convert codes to/from UTF-8.
It's worth noting that existing applications that operate in an ASCII
environment aren't affected.
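
As a rough sketch of that conversion step on the application side, the
two helpers below transcode between a native encoding and UTF-8.  The
helper names are made up for illustration, and the choice of Latin-1
and Shift_JIS as "native" encodings is an assumption; the codecs
themselves are the standard Python ones.

```python
def to_utf8(native: bytes, native_encoding: str) -> bytes:
    """Convert a string from the local encoding to UTF-8 before a SET."""
    return native.decode(native_encoding).encode("utf-8")


def from_utf8(wire: bytes, native_encoding: str) -> bytes:
    """Convert a UTF-8 string from a GET back to the local encoding."""
    return wire.decode("utf-8").encode(native_encoding)


# ASCII environment: the octets are unchanged, so nothing is affected.
assert to_utf8(b"Tray 1", "ascii") == b"Tray 1"

# Latin-1 environment: one accented letter grows from 1 octet to 2.
assert to_utf8(b"B\xfcro", "latin-1") == b"B\xc3\xbcro"

# Shift_JIS environment: the string survives a round trip via UTF-8.
native = "トナー".encode("shift_jis")
assert from_utf8(to_utf8(native, "shift_jis"), "shift_jis") == native
```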


As you think about "what does this mean for my management application,"
here's another thing to consider.  Existing applications that use
non-UTF-8 codes will also "work" as long as they don't run up against
another application that expects UTF-8 (or another encoding) -- as noted
above, the agent doesn't know whether the code is really UTF-8 or not.  And
we ought to keep in mind that the installed base of applications
consists mostly of one vendor's management application talking to the
same vendor's printers.


In practice, this is no worse than the situation today with applications
that have used non-ASCII character encodings in these strings.  Two
applications that have opted for different codes don't interoperate.  But
a non-standard application operating by itself can get away with it.
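
A small illustration of that failure mode, assuming one application
that writes Latin-1 and another that expects UTF-8.  Both applications
are hypothetical, and the "agent" is just a variable holding the octets
it was given.

```python
# Application A (non-standard) writes a Latin-1 string to the agent.
stored_octets = "B\u00fcro".encode("latin-1")  # agent keeps them as-is

# Application A reading its own value back: works fine by itself.
assert stored_octets.decode("latin-1") == "B\u00fcro"

# Application B expects UTF-8 and cannot interpret the same octets.
try:
    stored_octets.decode("utf-8")
    interoperates = True
except UnicodeDecodeError:
    interoperates = False
assert not interoperates

# The reverse case fails quietly: UTF-8 octets read as Latin-1 decode
# without error but yield the wrong characters.
garbled = "B\u00fcro".encode("utf-8").decode("latin-1")
assert garbled != "B\u00fcro"
```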


Curiously, it also is no worse than a multiple-character set solution
where two applications have chosen different character sets.  With the
multiple-character set solution, we would be in effect "standardizing"
this lack of interoperability.


::  David Kellerman         Northlake Software      503-228-3383
::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662


