Randy Turner's very good document on an application/ipp format over HTTP
1.1 triggered an important concern on my part. This concern is extensibility.
I understand the simplicity of having everything be attribute value
pairs with a character attribute name of a given length and a (string)
value of a given length. But, I think that this is an over-
simplification of the future of IPP.
The Problem Statement
To avoid vague generalizations, let's consider an example that I believe
is likely to arise in the near future. One attribute one might want to
add is an "address". For example, one might have an address to which the
final output is to be mailed or shipped. One might also have an address
to which the bill for reproduction is to be sent. This brings our first
problem because, quite often, these two addresses are not the same.
That means that we need two different address attributes.
The second problem with addresses is that an address is a structured
entity. It has things like a mailstop, a street number, a street name, a city
sub-region identifier, a city identifier, a state and/or country
identifier and, typically, some kind of ZIP code. These are assembled
into an address, but they are assembled in different ways in different
cultures and countries. This means that treating the address as a single
character string makes it very difficult to accurately recover the
information in the address. It makes more sense to store the address as
a structured object with attributes and values for each of the
component parts. But, the address (as a whole) is the value of the
shipping address or billing address attribute identified above. The
current proposal for IPP does not allow a value to be structured.
Well, that is not quite true: there is one simple kind of structuring
for attributes whose value is a list of values. The elements of the
value list are identified by being introduced with a zero length
attribute name. This, however, will not help in the address example
unless one specifies a fixed order for the component parts. History has
shown that positional syntaxes are much more prone to breakage than
keyword syntaxes, especially where many of the components are optional
(as is the case with addresses).
A Proposed Solution
The proposed solution to the extensibility problem is to add a type byte
(or half word) to the value portion of the
attribute-value pair. This would change the syntax in Randy's recent
draft as follows:
    attribute    = name-length name value-type value-length value
    value-type   = one-byte integer    ; a registered type value
    value-length = three-byte integer  ; number of octets in value
    value        = octet-string
Note that the length was increased to three bytes to allow for larger
structured values and was (arbitrarily) made three bytes so that the
combination of value-type and value-length takes four bytes.
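As a rough illustration only, the proposed layout could be packed and unpacked as in the Python sketch below. Several things here are assumptions rather than part of the proposal: a two-byte name-length, big-endian (network) byte order for both lengths, and the helper names encode_attribute and decode_attribute, which are hypothetical.

```python
import struct

TYPE_UTF8 = 1  # registry entry 1 in the proposal: UTF-8 string


def encode_attribute(name: str, value_type: int, value: bytes) -> bytes:
    """Pack name-length, name, value-type, value-length, value."""
    name_bytes = name.encode("utf-8")
    if not 1 <= value_type <= 0xFF:
        raise ValueError("value-type must fit in one byte; zero is reserved")
    if len(value) > 0xFFFFFF:
        raise ValueError("value does not fit a three-byte value-length")
    # Assumption: name-length is a two-byte big-endian integer.
    header = struct.pack(">H", len(name_bytes)) + name_bytes
    # value-type (1 byte) plus value-length (3 bytes) take four bytes in total.
    return header + bytes([value_type]) + len(value).to_bytes(3, "big") + value


def decode_attribute(buf: bytes, offset: int = 0):
    """Return (name, value_type, value, next_offset) for one attribute."""
    (name_len,) = struct.unpack_from(">H", buf, offset)
    offset += 2
    name = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    value_type = buf[offset]
    # The value-length counts only the octets of the value, not the type byte.
    value_len = int.from_bytes(buf[offset + 1:offset + 4], "big")
    offset += 4
    value = buf[offset:offset + value_len]
    if len(value) != value_len:
        raise ValueError("buffer ends before the declared value-length")
    return name, value_type, value, offset + value_len
```

A receiver that does not recognize a value-type can still skip the value, because the value-length tells it how many octets to step over.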
It is proposed that there be a registry of value types. The first two
entries in that registry would be as follows (zero is reserved):
1: Unicode string in UTF8 encoding (as specified in the draft)
2: list of values (here the length of the value field determines how
many values are present. The length, however, is not the number of
values, but the number of bytes consumed by the values).
This covers the cases defined in Randy's draft. Possible additions in
the future might be to add a type for binary numbers (integers or
possibly float in IEEE format) and for "dictionaries" which would be
nested sets of attribute-value pairs such as required for the address
example.
It is also the case that this scheme immediately provides hierarchy in
the value space. For example, in the list of values case, any of the
values in the list could be another list, and so forth. I would agree
that we may wish to restrict the hierarchy to one level in the first
specification, but I believe that we should provide the capability to
allow hierarchy in future versions of the IPP protocol specification.
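To make the hierarchy point concrete, here is a minimal Python sketch of how a list value might contain further lists. It assumes that each element of a list is itself encoded as value-type, value-length, value (with no name), and that the length is big-endian; the proposal does not spell out either detail, and the function names are hypothetical.

```python
TYPE_UTF8 = 1  # registry entry 1: UTF-8 string
TYPE_LIST = 2  # registry entry 2: list of values


def encode_value(value) -> bytes:
    """Encode a str or a (possibly nested) list of str as a typed value."""
    if isinstance(value, str):
        vtype, body = TYPE_UTF8, value.encode("utf-8")
    elif isinstance(value, list):
        # The list body is simply the concatenation of its encoded elements.
        vtype, body = TYPE_LIST, b"".join(encode_value(v) for v in value)
    else:
        raise TypeError("only str and list are covered by this sketch")
    return bytes([vtype]) + len(body).to_bytes(3, "big") + body


def decode_value(buf: bytes, offset: int = 0):
    """Return (decoded_value, next_offset) for the value starting at offset."""
    if offset + 4 > len(buf):
        raise ValueError("truncated value header")
    vtype = buf[offset]
    length = int.from_bytes(buf[offset + 1:offset + 4], "big")
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("value-length runs past the end of the buffer")
    if vtype == TYPE_UTF8:
        return buf[start:end].decode("utf-8"), end
    if vtype == TYPE_LIST:
        items = []
        pos = start
        # The byte length, not an element count, says where the list stops.
        while pos < end:
            item, pos = decode_value(buf, pos)
            if pos > end:
                raise ValueError("list element overruns its enclosing list")
            items.append(item)
        return items, end
    raise ValueError("unregistered value-type %d" % vtype)
```

A first specification that restricts hierarchy to one level could keep the same decoder and simply reject a list found inside another list.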
I will agree that it is not necessary to introduce a value-type to solve
the extensibility problem that I posed above. I believe, however, that
it is the simplest, most robust way to provide extensibility.
Some other solutions are:
Claim that it never makes sense to structurally subdivide a
value. This would mean that addresses would just be strings and it
would be up to the recipient to figure out how to parse the
string. One could register rules for parsing strings and associate
these rules with attribute names. As noted, the rules for parsing an
arbitrary address may not be definable, so it might be necessary to
structure the attribute name so that it has the rule for how to parse
the value; this might be done sort of like the way URL-encoded form
data is handled, but with substring ranges to be used on the value
string in place of the value data.
Allow values like addresses to be structured but require the name to
have the structuring information. In the example above there would be
separate attribute names for billingAddress.mailstop and
shippingAddress.mailstop and so forth for all the component parts of an
address. With this solution one asks whether all the parts of a
structured attribute must occur in sequence or may they occur in any
order in the attribute list.
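The dotted-name alternative can be sketched in Python as follows. This is only an illustration of what a recipient would have to do if the components were allowed to arrive in any order; the function name group_dotted and the sample values are made up, and the sketch assumes a single level of structure separated by the first ".".

```python
def group_dotted(pairs):
    """Regroup flat (name, value) pairs into one dict per structured attribute.

    Assumption: a name such as "shippingAddress.mailstop" means component
    "mailstop" of attribute "shippingAddress". A plain name that collides
    with a dotted prefix is not handled here.
    """
    result = {}
    for name, value in pairs:
        prefix, sep, component = name.partition(".")
        if sep:
            # Components may be interleaved with unrelated attributes.
            result.setdefault(prefix, {})[component] = value
        else:
            result[name] = value
    return result


# Hypothetical attribute list with the address parts out of sequence.
flat = [
    ("shippingAddress.mailstop", "W14"),
    ("copies", "2"),
    ("billingAddress.mailstop", "B2"),
    ("shippingAddress.city", "San Jose"),
]
grouped = group_dotted(flat)
```

Note that the recipient cannot know an address is complete until the whole attribute list has been read, which is one cost of this approach compared with a single structured value.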
Work with interchange formats such as TIFF, PostScript and PDF has
shown the value of having a type identifier for values. It provides
extensibility and allows more efficient representations of the data. I,
therefore, think that a value-type byte should be built into the
application/ipp encoding format.
By the way, it does not matter whether the value-type precedes or
follows the value-length (as long as it is clear that the value-type is
not counted in the value-length). Nor am I sure that a three-byte
value-length is the correct size. What is important is having a
value-type field at all.