IPP>PRO: sorry, but binary is better

Thu Jun 19 21:41:08 EDT 1997

I'm sorry for the length of this missive (about 150 lines plus
headers...) but it seems necessary.

Last night I revised Paul's document to indicate what we had concluded
on the 17th, and this morning I woke up way too early...  Or perhaps it
was way too late, depending on your perspective.

A binary encoding is MUCH simpler, even if limited to just the lengths.
For example, with 16-bit binary lengths (FAIRLY COMPLETE CODE):

// assumes enough incoming buffer to hold entire name and ValLen fields
// requires external help to deal with long or multiple values
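int GrabAttr(ATTR *pAttr)
{
  U16 nLength;  // pBuf (declared elsewhere) is the read cursor into the input

  // 16-bit name length in network byte order, then the name bytes
  nLength=ntohs((unsigned short)*(U16 *)pBuf);
  pBuf+=2;
  pAttr->nNameLength=nLength;
  pAttr->pName=pBuf;
  pBuf+=nLength;
  // 16-bit value length in network byte order, then the value bytes
  nLength=ntohs((unsigned short)*(U16 *)pBuf);
  pBuf+=2;
  pAttr->nValLen=nLength;
  pAttr->pVal=pBuf;
  pBuf+=nLength;
  return ERROR_OK;
}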

vs. an n-digit ASCII length (NOT CODE, not even pseudo-c):

// assumes enough incoming buffer to hold entire name and ValLen fields
// requires external help to deal with long or multiple values
int GrabAttr(ATTR *pAttr)
{
  // str functions don't work because we aren't null-terminated
  so we normalize all input buffers?  (null terminate ... ?)
  // or just write our own strtok?
  strtok(ON A COPY of the string)
    // so we can remember the terminating char, either <SP> or <CR>...
  pAttr->pName=pToken;
  pBuf+=strlen(pToken)+1;
  strtok()
  pBuf+=strlen(pToken)+1;
  nLength=private_atoi (because leading 0's would assume octal)
  pAttr->nValLen=nLength;
  pAttr->pVal=pBuf;
  pBuf+=nLength;
  return ERROR_OK;
}
int private_atoi(char *)
{
  do it
}
void normalize() // or private_strtok
{
  do it
}
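
Filling in the "do it" parts of the ASCII version might look roughly like
the sketch below.  It assumes the June 17 layout of name, <SP>, decimal
length, <SP>, value, and it walks the buffer with an explicit length
instead of strtok because the input isn't null-terminated.  The names
ASCII_ATTR, ParseDecimal and ParseAsciiAttr are made up for illustration.
The digits are converted by hand so a leading 0 is just a zero (strtol
with base 0, or sscanf's %i, would read it as octal).

```c
#include <stddef.h>

typedef struct {                 /* hypothetical, mirrors ATTR above */
    const char *pName; size_t nNameLength;
    const char *pVal;  size_t nValLen;
} ASCII_ATTR;

/* Parse an unsigned decimal number terminated by a space.  Leading zeros
   are plain zeros, never octal.  Returns bytes consumed (including the
   space), or 0 on error or if the value exceeds 16 bits. */
static size_t ParseDecimal(const char *p, size_t n, size_t *pOut)
{
    size_t i = 0, v = 0;
    while (i < n && p[i] >= '0' && p[i] <= '9') {
        if (v > 6553) return 0;              /* next digit would overflow */
        v = v * 10 + (size_t)(p[i] - '0');
        i++;
    }
    if (i == 0 || i >= n || p[i] != ' ' || v > 65535) return 0;
    *pOut = v;
    return i + 1;
}

/* Returns bytes consumed, or 0 if the buffer is malformed or too short. */
size_t ParseAsciiAttr(const char *pBuf, size_t nBuf, ASCII_ATTR *pAttr)
{
    size_t i = 0, used, len;
    while (i < nBuf && pBuf[i] != ' ') i++;  /* scan for end of name */
    if (i == 0 || i == nBuf) return 0;
    pAttr->pName = pBuf;
    pAttr->nNameLength = i;
    i++;                                     /* skip <SP> after name */
    used = ParseDecimal(pBuf + i, nBuf - i, &len);
    if (used == 0) return 0;
    i += used;
    if (len > nBuf - i) return 0;            /* value runs past buffer */
    pAttr->pVal = pBuf + i;
    pAttr->nValLen = len;
    return i + len;
}
```

Even in this stripped-down form the ASCII path needs a scan, a hand-rolled
number parser, and several extra error checks that the binary version
doesn't have.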

Both of these examples will require a bunch of strncmp's in order to
actually do anything, because what I've drawn up isn't even as binary
as Paul and I had in the SWP documents (even June 6).  If you reduce
Operation and Attribute names to an enum'd set then all those strcmp's
go away and become a simple '==' or even a switch (YES!).
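
A minimal sketch of that dispatch, assuming 16-bit attribute codes.  The
values 0x0001 (job-name), 0x0002 (job-originator) and 0xffff (vendor) are
the ones used in Example 3 below.  The names ATTR_ID, JOB and ApplyAttr
are hypothetical.

```c
#include <stdint.h>

typedef enum {
    ATTR_JOB_NAME       = 0x0001,
    ATTR_JOB_ORIGINATOR = 0x0002,
    ATTR_VENDOR         = 0xffff
} ATTR_ID;

typedef struct {                 /* hypothetical job record */
    const uint8_t *pName;       uint16_t nName;
    const uint8_t *pOriginator; uint16_t nOriginator;
} JOB;

/* Store a decoded attribute in the job.  Returns 1 if the code was
   recognized, 0 otherwise (the caller can then skip or reject it). */
int ApplyAttr(JOB *pJob, uint16_t id, const uint8_t *pVal, uint16_t nVal)
{
    switch (id) {
    case ATTR_JOB_NAME:
        pJob->pName = pVal;
        pJob->nName = nVal;
        return 1;
    case ATTR_JOB_ORIGINATOR:
        pJob->pOriginator = pVal;
        pJob->nOriginator = nVal;
        return 1;
    default:
        return 0;
    }
}
```

No string comparison is involved: matching an attribute is a single
integer compare per case.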

I think we need to face reality here...  Binary requires significantly
less code to deal with, which means fewer bugs and less testing, which
means more solid implementations sooner.  (Anybody ever run into a web
site that used atoi and interpreted numbers as octal?  I was just reading
about one last week...).  What do we get with ASCII?  My list of "pros"
from the meeting is really short.  The biggest I've been able to
identify is vendor extensibility.

For extensibility we could reserve the upper 0x8000 codes (or maybe fewer).
In fact, for attributes we could assign the "all ones" enum and have the
vendor use the first 4+ bytes of the value as their unique attribute
name/ID in order to minimize collisions.
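
As a rough sketch of how a receiver might handle that, assuming codes
0x8000 and up are the reserved vendor range and 0xffff is the "all ones"
escape whose value starts with a vendor prefix.  The 4-byte prefix length
and the names IsVendorAttr and MatchVendorPrefix are assumptions made for
illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_VENDOR_BASE   0x8000u   /* assumed start of vendor range */
#define ATTR_VENDOR_ESCAPE 0xffffu   /* "all ones" code */
#define VENDOR_PREFIX_LEN  4         /* assumed fixed prefix length */

/* Nonzero if the code falls in the reserved vendor range. */
int IsVendorAttr(uint16_t id)
{
    return id >= ATTR_VENDOR_BASE;
}

/* For the escape code, check that the value begins with the given
   vendor prefix (at least VENDOR_PREFIX_LEN bytes long).  Returns a
   pointer to the bytes after the prefix, or NULL if it isn't ours. */
const uint8_t *MatchVendorPrefix(uint16_t id, const uint8_t *pVal,
                                 uint16_t nVal, const char *pPrefix)
{
    if (id != ATTR_VENDOR_ESCAPE || nVal < VENDOR_PREFIX_LEN)
        return NULL;
    if (memcmp(pVal, pPrefix, VENDOR_PREFIX_LEN) != 0)
        return NULL;
    return pVal + VENDOR_PREFIX_LEN;
}
```

A vendor that doesn't recognize the prefix gets NULL back and can skip
the attribute using its length field.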

Unless someone has parsing for ASCII all worked out and can illustrate
that it really isn't much more code, I have to be in the binary camp.
IBM's triplets anyone?  (Roger?)

The following three examples show an encoding of a
Print-Job operation with attributes
  job-name=="Spec",
  job-originator=="Sylvan", and
  vendorHWP-BLD-ID=="Alpha",32766
    (a hypothetical multi-valued, vendor-specific attribute)

In all examples the bytes on the wire are specified in hex (two
characters) or ASCII (one character).  The spaces between bytes and the
line wrapping are not transmitted.  {print data} is the actual data for
the job and the literal phrase is not transmitted.

-----------
Example 1, ASCII (June17):

 0  1  0  0  P  r  i  n  t  -  J  o  b 0d 0a
 j  o  b  -  n  a  m  e 20  4 20  S  p  e  c  j  o  b  -  o  r  i  g  i  n  a
 t  o  r 20  6 20  S  y  l  v  a  n  H  W  P  -  B  L  D  -  I  D 20  5 20  A
 l  p  h  a 20  5 20  3  2  7  6  6 0d 0a
{print data}

-----------
Example 2, ASCII with binary lengths (the version field is fixed-length):

 0  1  0  0 00 09  P  r  i  n  t  -  J  o  b 00 08
 j  o  b  -  n  a  m  e 00 04  S  p  e  c 00 0e  j  o  b  -  o  r  i  g  i  n
 a  t  o  r 00 06  S  y  l  v  a  n 00 0a  H  W  P  -  B  L  D  -  I  D 00 05
 A  l  p  h  a 00 00 00 05  3  2  7  6  6
{print data}

-----------
Example 3, binary (SWP, IBM's triplets...)
(extra spaces, line wraps, and comments added for clarity):

00 01 00 00                            ; version 01.00
00 01                                  ; Print-Job
00 01                                  ; job-name
00 04  S  p  e  c                      ; length and value
00 02                                  ; job-originator
00 06  S  y  l  v  a  n                ; length and value
ff ff                                  ; vendor--see 4+ bytes of value
00 0a  H  W  P  -  B  L  D  -  I  D    ; length and value1
00 00                                  ; additional value for prev attribute
00 05  A  l  p  h  a                   ; length and value2
00 00                                  ; additional value
00 02 7f fe                            ; length and value3
{print data}

-----------

Going from example1 to example2 eliminates scanning and tokenizing.

Going from example2 to example3 eliminates strcmp's to match names
of attributes and operations.
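
To make the example3 layout concrete, here is a small bounds-checked
decoder sketch: a 4-byte version, a 2-byte operation code, then repeated
(2-byte attribute code, 2-byte length, value) groups, with code 0x0000
meaning "additional value for the previous attribute".  The examples
don't say how the attribute list ends before the print data, so this
sketch simply assumes the caller passes only the header bytes.  The names
BIN_ATTR, Get16 and ParseRequest are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t id;          /* 0x0000 = additional value for previous attr */
    uint16_t nValLen;
    const uint8_t *pVal;
} BIN_ATTR;

/* Read a big-endian (network order) 16-bit value byte by byte. */
static uint16_t Get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the operation code and up to nMax attribute values.
   Returns the number of values stored, or -1 if the buffer is
   truncated.  Stops early, without error, once nMax is reached. */
int ParseRequest(const uint8_t *pBuf, size_t nBuf, uint16_t *pOp,
                 BIN_ATTR *pAttrs, int nMax)
{
    size_t i = 6;                         /* 4-byte version + 2-byte op */
    int n = 0;
    if (nBuf < 6) return -1;
    *pOp = Get16(pBuf + 4);
    while (i < nBuf && n < nMax) {
        uint16_t len;
        if (nBuf - i < 4) return -1;      /* need code + length */
        pAttrs[n].id = Get16(pBuf + i);
        len = Get16(pBuf + i + 2);
        i += 4;
        if (len > nBuf - i) return -1;    /* value runs past buffer */
        pAttrs[n].nValLen = len;
        pAttrs[n].pVal = pBuf + i;
        i += len;
        n++;
    }
    return n;
}
```

Fed the example3 bytes, this yields five values with codes 0x0001,
0x0002, 0xffff, 0x0000 and 0x0000, and there isn't a scan, a tokenizer,
or a string compare anywhere in it.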

I vote to go all binary, e.g. #3.  (Note though, that I encoded the HP
attribute name in ASCII whereas in real life I'd probably pick more like:
  H W P xx yy zz
where hex yyzz would increment but the rest would remain static for all
the attributes that I create.  I probably wouldn't do the multi-valued
attribute either, rather I'd just append onto the first value since the
prefix length is fixed.)

I apologize for having to do this, but evidently my implementor hat
wasn't fitting very well on June 17th.

Thanks for your attention,

sdb

 | Sylvan Butler | sbutler at boi.hp.com | AreaCode 208 Phone/TelNet 396-2282 |