PMP Mail Archive: Re: PMP> what happens if we use UTF-8 in place of ASCII

David_Kellerman@nls.com
Fri, 25 Jul 1997 13:57:06 PST

> David, thanks for changing the name and focus of this topic. I have a
> question... does using UTF-8 put any greater burden on the agent in terms
> of size of character set and storage required? I've been trying to catch up
> with all those URGENT mail messages, I've pulled RFC 2044 and I think I
> know what UTF-8 is and why it exists... but I'm not sure what the consensus
> is regarding how much of Unicode or ISO 10646 is required behind the UTF-8.

Well, code-wise, all of Unicode is behind UTF-8. But I think your
question really divides in two:

1. What's the impact if the agent is just passing UTF-8 strings back
and forth with the management application? I think here the effect
on storage requirements is "not much." ASCII codes remain the same.
There are other encodings that represent particular alphabets more
compactly; the difference for Latin text looks relatively modest,
but there is a greater difference for some of the Asian character
sets. This is why you saw some sentiment toward increasing maximum
sizes on some objects in the latest flurry of e-mail. (I'm sure
there are code set experts who can give you real numbers.)

2. What's the impact if the agent displays these strings somewhere?
The cheap answer is "not my problem, man" -- the MIB doesn't require
the agent to display any of the strings in question. Clearly, most
agents aren't going to be able to display all of a "universal"
character set like ISO 10646, so I think as a practical matter you
accept that the encoding is an interchange format, and the selection
of displayable characters is a separate implementation decision.

If the agent does intend to display the characters, the main impact
is translation (tables or code), and it's going to be proportional
to the size of your set of displayable characters. (If you display
only ASCII, translation's real cheap!) Once you get into some of the
Asian character sets, the translation tables get big (as Tom has
pointed out); but to put this in perspective, your glyph storage is
enormous, too.
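
To make both points a little more concrete, here is a rough Python sketch.
It is only an illustration under stated assumptions: the sample strings, the
legacy encodings chosen for comparison (Latin-1 and Shift-JIS), the helper
names encoded_sizes and to_display, and the idea of an ASCII-only front
panel are all hypothetical and come from neither the MIB nor this thread.

```python
# Hypothetical sample strings paired with a legacy encoding to compare against.
SAMPLES = {
    "ASCII": ("Toner low", "ascii"),
    "Latin": ("Papier coinc\u00e9", "latin-1"),              # ends in e-acute
    "Japanese": ("\u7528\u7d19\u5207\u308c", "shift_jis"),  # four characters
}


def encoded_sizes(text, legacy_codec):
    """Return (utf8_byte_count, legacy_byte_count) for one string."""
    return len(text.encode("utf-8")), len(text.encode(legacy_codec))


# Assumption: this agent's panel can only show printable ASCII.
DISPLAYABLE = {chr(code) for code in range(0x20, 0x7F)}


def to_display(utf8_bytes, fallback="?"):
    """Decode a UTF-8 string and substitute anything the panel cannot show."""
    text = utf8_bytes.decode("utf-8", errors="replace")
    return "".join(ch if ch in DISPLAYABLE else fallback for ch in text)


if __name__ == "__main__":
    for label, (text, codec) in SAMPLES.items():
        utf8_len, legacy_len = encoded_sizes(text, codec)
        print(f"{label}: {len(text)} chars, "
              f"UTF-8 {utf8_len} bytes, {codec} {legacy_len} bytes")
    print(to_display("Papier coinc\u00e9".encode("utf-8")))
```

With these samples, the ASCII string is the same size either way, the Latin
string costs one extra byte for its single accented letter, and the Japanese
string takes three bytes per character in UTF-8 against two in Shift-JIS.
The display helper shows the "interchange format" view: the agent stores and
returns the full UTF-8 string but renders only what it can, so a larger
displayable set means a larger lookup in place of the simple ASCII check.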

In all of this, it looks to me as though the localization issues are a
bigger problem for the management application.

:: David Kellerman Northlake Software 503-228-3383
:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662