IPP Mail Archive: Re: IPP> Re: Unsigned integer count for attribute name

Robert Herriot (Robert.Herriot@Eng.Sun.COM)
Wed, 21 May 1997 12:50:21 -0700

> From hastings@cp10.es.xerox.com Tue May 20 22:35:24 1997
>
> At 12:37 05/20/97 PDT, Robert Herriot wrote:
> >
> >> From hastings@cp10.es.xerox.com Tue May 20 11:48:48 1997
> >>
>
> To be more clear, we should outlaw ASCII NUL character in ASCII coded strings
> and outlaw the UTF-8 NUL *character* in UTF-8 strings. Otherwise, some client
> may be putting in such padding (within the count) in order to align the
> two-octet attribute name or attribute value that follow, so that the
> particular server that the client was built with could pick up the integers
> as (aligned) integers, because they were aligned in memory as the data was
> unmarshalled into memory, rather than picking up the two-octet integers as
> individual octets.
>
> Another reason to outlaw these NUL characters, is to prevent servers from
> having to filter them out before doing string compares. If one client starts
> to embed NULs and some servers filter them out, then we get interoperability
> problems with the servers that don't.
>
> In short, this is a small problem with a simple fix: Just add a sentence
> that forbids the counted ASCII and UTF-8 strings to contain the NUL character
> when representing attribute names and attribute values in IPP.

I don't agree with most of your reasons for outlawing NUL.

Padding is not an issue because we specify precisely the byte position
of each item in the protocol. Alignment is a notion that each platform
deals with as it parses the protocol; it is not a protocol notion.
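As a rough illustration of why alignment need not leak into the protocol,
here is a minimal sketch of picking up a two-octet integer octet by octet.
The name read_u16 is hypothetical, and the sketch assumes the two-octet
fields are sent most significant octet first (network byte order).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a two-octet unsigned integer that starts at
 * any byte offset in the buffer. Assumes big-endian (network) order.
 * Each octet is fetched individually, so the field may sit on an odd
 * address without any padding in front of it. */
uint16_t read_u16(const unsigned char *buf, size_t offset)
{
    return (uint16_t)(((unsigned)buf[offset] << 8) | (unsigned)buf[offset + 1]);
}
```

A parser built this way works the same on platforms that fault on
unaligned loads and on those that do not.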

Because the server has a length, it can easily compare the strings using
a length-based compare (e.g. memcmp in Unix 95). In addition, many
servers will convert the UTF-8 to some other encoding.
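A length-based compare might look something like the sketch below. The
function name counted_equal is hypothetical; the point is only that memcmp
looks at every octet within the count, so an embedded NUL does not end
the comparison early the way it would with strcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two counted octet strings for equality.
 * The lengths come from the protocol's count fields, not from a
 * terminating NUL, so NUL octets inside the strings are compared like
 * any other octet. Returns 1 if equal, 0 otherwise. */
int counted_equal(const unsigned char *a, size_t a_len,
                  const unsigned char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

With this, the four-octet strings "ab\0c" and "ab\0d" compare as
different, whereas a NUL-terminated compare would stop at the third octet
and call them equal.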

The real question is what are allowable characters (not encodings) for
names and values. We have talked about key-words being limited to
letters, digits, hyphen and underscore. That eliminates far more than
NUL. For values, we have the same question except the possibilities
are more numerous. We have said that values are text. Do we want to
say that there are no exceptions? Because of the format of the
protocol, an attribute could be an octet string for certain well-known
attributes. For attributes with text values, I think that it is
reasonable to eliminate certain control characters once the value is
decoded, but decoding is the key. If we later allow Unicode encoded
values in addition to UTF-8, then some octets will be NULs, even if
we eliminate the Unicode character equivalent to NUL.
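To make the character-versus-octet distinction concrete, here is a small
sketch. It assumes "Unicode encoded" means two octets per character, most
significant octet first; the function names are hypothetical. The text
"AB" contains no NUL character, yet its two-octet encoding contains zero
octets, while its UTF-8 encoding does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n 16-bit character codes as two octets
 * each, high octet first. The caller supplies an output buffer of at
 * least 2*n octets. Returns the number of octets written. */
size_t encode_two_octet_be(const uint16_t *chars, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(chars[i] >> 8);
        out[2 * i + 1] = (unsigned char)(chars[i] & 0xFF);
    }
    return 2 * n;
}

/* Hypothetical helper: count how many octets in the buffer are zero. */
size_t count_zero_octets(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            zeros++;
        }
    }
    return zeros;
}
```

Encoding the codes 0x0041 and 0x0042 ("AB") this way gives the octets
00 41 00 42, so a rule stated in terms of octets would reject perfectly
ordinary text. A rule stated in terms of decoded characters would not.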

Bob Herriot