IPP> Re: Unsigned integer count for attribute name keywords and

Tue May 20 15:37:21 EDT 1997

> From hastings at cp10.es.xerox.com Tue May 20 11:48:48 1997
> 
> Paul,
> 
> Bob and I were talking about the two-octet binary integer count that
> we agreed to for IPP encoding.  The following details need to be 
> carefully specified in the IPP encoding document, in order to get proper
> interoperability:
> 
> 1. The maximum value for attribute name keywords is 255 octets, since
> attribute name keywords are US ASCII.  The minimum is 1.
> From the Model document, page 43.

Did we agree to one or two bytes for the length of attribute names?  I
thought two, just in case we ever use Unicode rather than UTF-8 to
encode attribute names.  We wouldn't want the attributes to then be
limited to 127 characters.
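
As a rough check of that arithmetic, here is a small sketch. It assumes a
fixed two-octets-per-character encoding; UTF-16BE is only a stand-in for
"Unicode" here, and the names are made up for illustration.

```python
# Sketch only: UTF-16BE stands in for a 2-octet-per-character "Unicode" encoding.
ONE_OCTET_MAX = 0xFF      # largest count a 1-octet length field can hold
TWO_OCTET_MAX = 0xFFFF    # largest count a 2-octet unsigned length field can hold


def fits(name: str, max_octets: int) -> bool:
    """Return True if the encoded name fits within the given length limit."""
    return len(name.encode("utf-16-be")) <= max_octets


# With a 1-octet length, at most 255 // 2 two-octet characters fit.
max_chars_one_octet = ONE_OCTET_MAX // 2
```

A 128-character name would already overflow a one-octet length under that
assumption, while a two-octet length leaves plenty of room.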

> 
> 2. The maximum value for attribute values is 4095 characters times the 
> worst case explosion factor for UTF-8, which is 3 or 4.  In any case,
> the high order sign bit shall be 0.

The maximum length of a 2 byte Unicode character is 3 bytes in UTF-8.
A 4 byte "Unicode" character can be as much as 6 bytes. There
are currently no such characters, as far as I know.
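
The 3-byte figure can be checked with a short sketch. It covers only
characters that fit in 2 bytes of Unicode; the sample code points are my own
choice, picked as the last character of each UTF-8 length range.

```python
# Octets needed by UTF-8 for the last character of each length range.
samples = {
    "U+007F": "\u007f",   # last 1-octet character (US-ASCII range)
    "U+07FF": "\u07ff",   # last 2-octet character
    "U+FFFF": "\uffff",   # last character that fits in 2 bytes of Unicode
}
utf8_octets = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

# Worst case for a 4095-character value made only of 2-byte Unicode characters.
worst_case = 4095 * 3
```

Under that restriction the worst case is 12285 octets, which fits in a
two-octet length whether or not the high-order bit is usable.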

I don't understand why we should make the statement that the high
order sign bit needs to be 0. I would expect that it is a 16 bit
unsigned integer. In fact, I would expect that all of the lengths
are unsigned integers, be they 2 bytes or 4 bytes, but we need to
make that clear.
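
To show why the signedness has to be spelled out, here is a minimal sketch.
It assumes the most-significant-octet-first order from point 4, and the two
octets are an arbitrary example with the high-order bit set.

```python
import struct

raw = bytes([0x80, 0x01])  # example length field with the high-order bit set

# Same two octets, read as unsigned and as signed 16-bit big-endian integers.
unsigned_len = struct.unpack(">H", raw)[0]
signed_len = struct.unpack(">h", raw)[0]
```

A reader that treats the field as signed sees a negative length, so the two
interpretations only agree while the high-order bit is 0.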

> 
> 3. The two-octet integer is not padded, so RISC machines that require
> integers to be aligned on 2, 4, or 8 byte boundaries, must pick up the
> two-octets wherever they are.  We don't want some clients padding the
> data to meet their server's alignment requirements and then such servers 
> not being able to accept unaligned data from other clients.  We should
> probably outlaw the ASCII NUL (decimal 0) in attribute names and values
> as well, just to make sure.  The current Model document lists the 
> abstract characters that are allowed in keywords, but the encoding document
> could specify that ASCII NUL could be introduced.  I think we need to
> specifically outlaw ASCII NUL in the encoding document.

Padding is a host issue and not a protocol issue. Since our protocol is
based on precise locations of bytes in the stream, the protocol
definition implies that there can be no padding.  Because each name and
value is specified by a length, it shouldn't matter if an ASCII NUL is
present.  In fact, I would NOT want to have two rules for termination
of a name, such as length or NUL, whichever comes first. This gets us
back into the HTTP question of Content-length versus boundary-string
and which one wins. If a name or value contains a NUL, then it must be
part of the name or value.  This rule is especially important if we
later allow other encodings such as Unicode, which has numerous NUL
bytes, e.g. in every ASCII character.
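
A minimal sketch of what I mean, not a proposed implementation: each field is
a two-octet unsigned big-endian length followed by exactly that many octets,
with no padding. The function names and the sample fields are hypothetical.

```python
import struct


def encode_field(data: bytes) -> bytes:
    """Prefix data with its length as a 2-octet unsigned big-endian integer."""
    return struct.pack(">H", len(data)) + data


def decode_fields(stream: bytes) -> list:
    """Split a stream of length-prefixed fields; only the length ends a field."""
    fields = []
    offset = 0
    while offset < len(stream):
        # The length is read at whatever offset it falls on; no alignment assumed.
        (length,) = struct.unpack_from(">H", stream, offset)
        offset += 2
        if offset + length > len(stream):
            raise ValueError("field runs past the end of the stream")
        fields.append(stream[offset:offset + length])  # a NUL is just another octet
        offset += length
    return fields


# Illustrative stream: a 5-octet name, then a value containing a NUL octet.
stream = encode_field(b"sides") + encode_field(b"a\x00b")
```

Because the name is 5 octets long, the second length starts at an odd offset,
and the NUL in the value comes back as part of the value.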

> 
> 4. The first of the two octets is the most significant, as in all 
> protocols.  So-called Big Endian.
> 
> Ok?
> 
> Thanks,
> Tom
> 
>