PMP> what happens if we use UTF-8 in place of ASCII

Fri Jul 25 14:36:34 EDT 1997

David, thanks for changing the name and focus of this topic. I have a
question... does using UTF-8 put any greater burden on the agent in terms
of size of character set and storage required? I've been trying to catch up
with all those URGENT mail messages.  I've pulled RFC 2044 and I think I
know what UTF-8 is and why it exists... but I'm not sure what the consensus
is regarding how much of Unicode or ISO 10646 is required behind the UTF-8.

Harry Lewis - IBM Printing Systems

-------- Forwarded by Harry Lewis/Boulder/IBM on 07/25/97 12:27 PM -------

        pmp-owner at pwg.org
        07/24/97 06:50 PM
Please respond to pmp-owner at pwg.org @ internet

To: pmp at pwg.org @ internet
cc:
Subject: PMP> what happens if we use UTF-8 in place of ASCII

New thread; got tired of looking at that "URGENT" line.  Also, I figured
that if this note had a new title, your curiosity would get the better
of you, and you might read the note instead of automatically filing it.

Thought it would be useful to summarize the implications of switching
the ASCII OCTET STRINGs to Utf8String syntax.

Agents:
 1. Need to make sure any read-only strings that aren't ASCII use UTF-8
    encoding.  I'd guess that 99% are ASCII.
 2. Remove any 7-bit checking code.  Sounds like most agents already
    accept 8-bit codes without checking (sounds like Osicom/DPI actually
    checks -- someone knew what ASCII meant).
So most agents work unchanged, the rest with slight change.  It's worth
noting that, within the SNMP domain, the agent should be able to treat
the affected strings simply as octets.  It doesn't do anything, such as
displaying the strings, that would require it to "know" UTF-8.
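
A rough sketch of the agent-side change, in Python for brevity.  The
function names are hypothetical and not taken from any real agent; the
point is only that dropping the 7-bit check leaves the agent storing
octets it never interprets.

```python
def store_ascii_only(value: bytes) -> bytes:
    """Hypothetical old behavior: reject any octet with the high bit set."""
    if any(octet > 0x7F for octet in value):
        raise ValueError("octet outside 7-bit ASCII range")
    return bytes(value)


def store_octets(value: bytes) -> bytes:
    """Hypothetical new behavior: keep the octets exactly as received.

    The agent does not decode or validate; whether the octets are
    UTF-8 is a matter for the applications on either end.
    """
    return bytes(value)
```

With the check removed, an ASCII string and a UTF-8 string take the
same path through the agent.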

Agent environment:
 1. If, outside the SNMP domain, the agent device displays the affected
    strings, there may be a need for character set conversion.

Application environment:
 1. In environments which do not use UTF-8 as the native encoding,
    convert codes to/from UTF-8.
It's worth noting that existing applications that operate in an ASCII
environment aren't affected.
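
To make the conversion step concrete, here is a minimal sketch in
Python.  It assumes an application whose native encoding is ISO 8859-1
(Latin-1); that choice and the helper names are only for illustration.

```python
def to_utf8(native: bytes, native_encoding: str) -> bytes:
    """Convert a native-encoded string to UTF-8 before sending it."""
    return native.decode(native_encoding).encode("utf-8")


def from_utf8(wire: bytes, native_encoding: str) -> bytes:
    """Convert a UTF-8 string read from the agent to the native encoding."""
    return wire.decode("utf-8").encode(native_encoding)


# Pure ASCII is byte-for-byte identical in UTF-8, so it passes through.
ascii_name = to_utf8(b"Tray 1", "latin-1")

# A Latin-1 u-umlaut (0xFC) becomes the two-octet UTF-8 sequence C3 BC.
umlaut_name = to_utf8(b"B\xfcro", "latin-1")
```

The ASCII case comes back unchanged, which is why applications that
live entirely in an ASCII environment need no conversion code at all.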

As you think about "what does this mean for my management application,"
here's another thing to consider.  Existing applications that use
non-UTF-8 codes will also "work" as long as they don't run up against
another application that expects UTF-8 (or another encoding) -- as noted
above, the agent doesn't know whether the code is really UTF-8 or not.  And
we ought to keep in mind that the installed base of applications
consists mostly of one vendor's management application talking to the
same vendor's printers.

In practice, this is no worse than the situation today with applications
that have used non-ASCII character encodings in these strings.  Two
applications that have opted for different codes don't interoperate.  But
a non-standard application operating by itself can get away with it.

Curiously, it also is no worse than a multiple-character-set solution
where two applications have chosen different character sets.  With the
multiple-character-set solution, we would be in effect "standardizing"
this lack of interoperability.
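
The interoperability point above can be illustrated with a small
Python sketch.  The two "applications" are hypothetical: one writes
Latin-1 octets through the agent, the other expects UTF-8 when it
reads them back.

```python
def read_as_utf8(octets: bytes):
    """Decode octets as UTF-8; return None if they are not valid UTF-8."""
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The same text as written by two hypothetical applications.
written_as_latin1 = "B\u00fcro".encode("latin-1")  # octets 42 FC 72 6F
written_as_utf8 = "B\u00fcro".encode("utf-8")      # octets 42 C3 BC 72 6F

# The agent stores either value unchanged; only the reader notices.
latin1_result = read_as_utf8(written_as_latin1)
utf8_result = read_as_utf8(written_as_utf8)
```

Here the Latin-1 octets are not valid UTF-8, so the UTF-8 reader fails
on them, while an application that only reads back what it wrote
itself never sees a problem.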

::  David Kellerman         Northlake Software      503-228-3383
::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662