PMP> what happens if we use UTF-8 in place of ASCII

David_Kellerman at nls.com
Fri Jul 25 17:57:06 EDT 1997


> David, thanks for changing the name and focus of this topic. I have a
> question... does using UTF-8 put any greater burden on the agent in terms
> of size of character set and storage required? I've been trying to catch up
> with all those URGENT mail messages, I've pulled RFC 2044 and I think I
> know what UTF-8 is and why it exists... but I'm not sure what the consensus
> is regarding how much of Unicode or ISO 10646 is required behind the UTF-8.


Well, code-wise, all of Unicode is behind UTF-8.  But I think your
question really divides in two: 


 1. What's the impact if the agent is just passing UTF-8 strings back
    and forth with the management application?  I think here the effect
    on storage requirements is "not much."  ASCII codes remain the same,
    one octet each.  There are other encodings that represent particular
    alphabets more compactly.  The difference for Latin texts looks
    relatively modest, but there is a greater difference for some of the
    Asian character sets.  This is why you saw some sentiment toward
    increasing maximum
    sizes on some objects in the latest flurry of e-mail.  (I'm sure
    there are code set experts who can give you real numbers.) 


 2. What's the impact if the agent displays these strings somewhere? 
    The cheap answer is "not my problem, man" -- the MIB doesn't require
    the agent to display any of the strings in question.  Clearly, most
    agents aren't going to be able to display all of a "universal"
    character set like ISO 10646, so I think as a practical matter you
    accept that the encoding is an interchange format, and the selection
    of displayable characters is a separate implementation decision. 


    If the agent does intend to display the characters, the main impact
    is translation (tables or code).  And it's going to be proportional
    to the size of your set of displayable characters.  (If you display
    only ASCII, translation's real cheap!)  Once you get into some of
    the Asian character sets, the translation tables get big (as Tom
    has pointed out); but to put this in perspective, your glyph
    storage is enormous, too. 
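
To make both points a little more concrete, here is a rough Python
sketch.  It is illustrative only, not anything the MIB specifies: the
function names (utf8_size, to_displayable), the sample strings, and the
choice of "?" as a substitute character are all assumptions of mine.
The first part counts the octets UTF-8 needs for a few sample strings;
the second shows the cheap end of the display problem, an agent that
can only show printable ASCII and substitutes everything else.

```python
# Illustrative sketch only; names and sample strings are assumptions.

def utf8_size(text: str) -> int:
    """Return the number of octets needed to carry `text` as UTF-8."""
    return len(text.encode("utf-8"))


def to_displayable(data: bytes, displayable: set, substitute: str = "?") -> str:
    """Decode UTF-8 octets, replacing anything the display cannot show.

    Malformed octet sequences decode to U+FFFD, which is then replaced
    like any other character outside the displayable set.
    """
    text = data.decode("utf-8", errors="replace")
    return "".join(ch if ch in displayable else substitute for ch in text)


# Hypothetical display that can only render printable ASCII.
ASCII_PRINTABLE = {chr(c) for c in range(0x20, 0x7F)}

samples = {
    "ascii": "printer",        # plain ASCII: one octet per character
    "latin": "caf\u00e9",      # one accented Latin letter: two octets for it
    "cjk": "\u5370\u5237",     # two CJK ideographs: three octets each
}

for label, text in samples.items():
    print(label, len(text), "chars ->", utf8_size(text), "octets")

# Everything outside the displayable set becomes the substitute character.
print(to_displayable("caf\u00e9 \u5370\u5237".encode("utf-8"), ASCII_PRINTABLE))
```

An agent that displays more than ASCII would replace the simple set
membership test with real translation tables, and that is where the
size grows with the number of displayable characters.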


In all of this, it looks to me as though the localization issues are a
bigger problem for the management application. 


::  David Kellerman         Northlake Software      503-228-3383
::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662


