PMP Mail Archive: Re: PMP> Revised proposal on definition of OCTET STRING to allow

David_Kellerman@nls.com
Tue, 22 Jul 1997 15:06:57 PST

> Here is my third message to help everyone make an informed
> decision.
>
> Since our Area Directors have a lot of experience
> with character sets and applications, we asked them if they had
> any suggestions and advice. Harald A. responded, and he did
> strongly encourage us to define the character set. (See below.)
>
> When we reviewed these suggestions and advice against the
> original localization proposal (late June), there did not seem
> to be any way to make it all work. Now it seems possible that
> Tom Hastings' latest proposal would be a satisfactory compromise
> that incorporates this advice and has minimal impact to the MIB.
>
> So we have to ask everyone to think through the technical issues
> and see if this is the case.

This is going to be a bit repetitive (a no-no in e-mail, I know), but
this issue seems to create a lot of confusion.

To my mind, what Tom is proposing is very different from:
1. Randy Presuhn's e-mail and the SYSAPPL MIB approach to character sets
2. Harald Alvestrand's e-mail
3. The RFC 2130 (Character Set Workshop Report) recommendations
The way I read all these sources, they essentially say to use ISO 10646
(roughly Unicode worked over by ISO, for those of you still getting your
bearings) as the base character set and UTF-8 as the character encoding
scheme (again, roughly speaking, it encodes ISO 10646 codes as sequences
of one or more bytes, and the single-byte codes match seven-bit ASCII).
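
To make the UTF-8 point concrete, here is a minimal sketch in Python.
It is only an illustration, not part of any proposal; the helper name
utf8_bytes and the sample characters are my own choices. It shows that
an ASCII character stays a single byte with its ASCII value, while
other ISO 10646 codes become multi-byte sequences.

```python
# Illustrative sketch only: utf8_bytes and the sample characters are
# hypothetical choices, not anything defined by the MIB or the RFCs.

def utf8_bytes(ch: str) -> list[int]:
    """Return the UTF-8 byte values for a single character."""
    return list(ch.encode("utf-8"))

# "A" is ASCII; U+00E9 and U+20AC lie outside the seven-bit range.
samples = ["A", "\u00e9", "\u20ac"]

for ch in samples:
    encoded = " ".join(f"{b:02X}" for b in utf8_bytes(ch))
    print(f"U+{ord(ch):04X} -> {encoded}")

# A pure-ASCII string has the same bytes in ASCII and in UTF-8.
text = "plain ASCII"
print(text.encode("utf-8") == text.encode("ascii"))
```

The consequence for existing ASCII-only strings is that they would
already be valid UTF-8 without any change.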

Tom's approach, and similarly the approach taken with the
prtLocalizationCharacterSet MIB object, allows multiple character sets
and encodings. You need to know the encoding to interpret the codes;
one code can represent different characters in different encodings. In
Tom's proposal, the determination of encoding takes place outside the
MIB.
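
As a rough illustration of why the encoding has to be known, the
following Python sketch decodes the same octet under two different
character sets. The two encodings (ISO 8859-1 and IBM code page 437)
and the helper name interpret are assumptions picked for the example;
they are not taken from Tom's proposal or from the MIB.

```python
# Illustrative sketch only: the encodings and the helper name are
# hypothetical choices made for this example.

def interpret(octets: bytes, encoding: str) -> str:
    """Decode an OCTET STRING value under an assumed encoding."""
    return octets.decode(encoding)

value = bytes([0xE9])  # one octet, as it might appear in a string object

for encoding in ("latin_1", "cp437"):
    ch = interpret(value, encoding)
    print(f"{encoding}: U+{ord(ch):04X}")
```

The octet is identical in both cases; only the out-of-band knowledge of
the encoding tells the reader which character was meant.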

Now these two approaches are not the same, by a long shot. And it's my
understanding that in other places, proponents of opposing sides line up
with armor and broadsword to debate the issue. Being an applications
software person, I happen to prefer the UTF-8 approach. Now I'm not a
licensed character set professional, I've misplaced my broadsword, and
my armor doesn't fit anymore, so I'm feeling a little handicapped in the
debate.

So, Chris, I know you'd like to find a "satisfactory compromise" here,
but I don't see where you've got convergence of positions. (Between
your advisors and Tom's proposal, in particular.) Perhaps Tom
would like to propose that all the strings now constrained as ASCII be
allowed to contain UTF-8 codes?

:: David Kellerman Northlake Software 503-228-3383
:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662