IPP>PRO: sorry, binary is better (?)

Thu Jun 19 23:10:16 EDT 1997

This is a multi-part message in MIME format.
--------------60E2218BCDA1D1AA31F6DF91
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

The attached code will handle free-form input of the tokens, wherein you
can have as much whitespace between tokens as you want (not limited to
just one space character between tokens). And I'm not even using any
runtime library routines, just the 'isspace' and 'isdigit' macros from
ctype.h. I also store a null character at the end of the attribute name,
so you can apply subsequent string-handling functions to the attribute
name if you need to.

This stuff is really too easy, and we shouldn't worry about the
differences between ASCII and binary at this point. The code difference
is trivial. I thought we had already made this decision?

Randy

Robert Herriot wrote:

> Here is my version of GrabAttr for the protocol we chose on June 17th.
>
> I think that it is competitive with the function you defined for the
> binary protocol. Also, my ANSI C book states that atoi assumes
> decimal, but recommends using strtol as I have done below.
>
> int GrabAttr(ATTR *pAttr)
> {
>   char * pNext;
>   int length;
>
>   /* pBuf is the global read cursor, as in the binary version */
>   pNext = strchr(pBuf,' ');           /* SP ends the name */
>   pAttr->nNameLength = pNext-pBuf;
>   pAttr->pName = pBuf;
>   pBuf = pNext + 1;
>
>   length = strtol(pBuf,&pNext,10);  /* ANSI C function */
>   pAttr->nValLen = length;
>   pNext++;                            /* skip the SP after the length */
>   pAttr->pVal = pNext;
>   pBuf = pNext + length;
>   return ERROR_OK;
> }
>
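On the atoi/strtol point: the base argument is what settles the octal worry raised later in the thread. With base 10, strtol treats leading zeros as ordinary decimal digits; only base 0 applies C literal rules, where a leading 0 means octal. A minimal sketch; the helper names parse_len and parse_len_c_rules are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a decimal length field, reporting where it ended. */
long parse_len(const char *s, char **end)
{
    return strtol(s, end, 10);   /* base 10: "010" is ten, never octal */
}

/* For contrast only: base 0 follows C literal rules, so "010" is octal (eight). */
long parse_len_c_rules(const char *s)
{
    return strtol(s, NULL, 0);
}
```

Standard C defines atoi(s) as equivalent to (int)strtol(s, NULL, 10) apart from error handling, so atoi is decimal too; strtol's advantage is the end pointer, which tells the caller where the value begins.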
> As for wanting integer tokens for internal processing of keywords, I
> would expect that an implementation might have a keywordToInt function
> which would map keywords to integers that in turn could be used in
> switch statements. So the strncmp/hashing issues would be kept in
> the keywordToInt function.
>
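A keywordToInt function of the kind described could look like the following minimal sketch. The KW_* codes and the table contents are hypothetical; it takes an explicit length because the name in the buffer is not NUL-terminated.

```c
#include <string.h>

/* Hypothetical keyword codes; a real protocol would publish its own table. */
enum { KW_UNKNOWN = 0, KW_JOB_NAME, KW_JOB_ORIGINATOR, KW_PRINT_JOB };

static const struct { const char *name; int code; } keywords[] = {
    { "job-name",       KW_JOB_NAME },
    { "job-originator", KW_JOB_ORIGINATOR },
    { "Print-Job",      KW_PRINT_JOB },
};

/* Map a keyword of len bytes (not necessarily NUL-terminated) to an int.
   A linear scan keeps the sketch short; a hash table could replace it. */
int keywordToInt(const char *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i].name) == len &&
            strncmp(keywords[i].name, name, len) == 0)
            return keywords[i].code;
    return KW_UNKNOWN;
}
```

The rest of the implementation can then write switch (keywordToInt(pAttr->pName, pAttr->nNameLength)) and never touch a string comparison.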
> Comments?
>
> Bob Herriot
>
> > From SBUTLER at hpbs2024.boi.hp.com Thu Jun 19 18:06:36 1997
> >
> > I just sent this to the IPP reflector, but forgot to put the CC in to
> > you folk. It appears customary to address changes to the most
> > recent active participants, and I thought you deserved a heads-up.
> >
> > I believe I gathered correct e-mail addresses for all attendees on
> > June 17th; please forward this on to anyone who should see it directly
> > rather than the reflector copy.
> >
> > ------- Forwarded Message Follows -------
> >
> > I'm sorry for the length of this missive (about 150 lines plus
> > headers...) but it seems necessary.
> >
> > Last night I revised Paul's document to indicate what we had concluded
> > on the 17th, and this morning I woke up way too early... Or perhaps it
> > was way too late, depending on your perspective.
> >
> > A binary encoding is MUCH simpler, even if it is limited to just the
> > lengths. For example, with 16-bit binary lengths (FAIRLY COMPLETE CODE):
> >
> > // assumes enough incoming buffer to hold entire name and ValLen fields
> > // requires external help to deal with long or multiple values
> > int GrabAttr(ATTR *pAttr)
> > {
> >   U16 nLength;  // pBuf is the global read cursor
> >   nLength=ntohs((unsigned short)*(U16 *)pBuf);  // name length, network order
> >   pBuf+=2;
> >   pAttr->nNameLength=nLength;
> >   pAttr->pName=pBuf;
> >   pBuf+=nLength;
> >   nLength=ntohs((unsigned short)*(U16 *)pBuf);  // value length, network order
> >   pBuf+=2;
> >   pAttr->nValLen=nLength;
> >   pAttr->pVal=pBuf;
> >   pBuf+=nLength;
> >   return ERROR_OK;
> > }
> >
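One caveat on the binary version: after an odd-length name, pBuf is no longer 2-byte aligned, and *(U16 *)pBuf can trap on some strict-alignment processors. A minimal sketch that assembles each network-order (big-endian) length byte by byte instead; the names get_u16 and grab_attr_bin, the bounds check, and this ATTR layout are assumptions, not part of the proposal.

```c
#include <stddef.h>

/* Assumed layout, mirroring the fields used in GrabAttr above. */
typedef struct {
    const unsigned char *pName;
    unsigned nNameLength;
    const unsigned char *pVal;
    unsigned nValLen;
} ATTR;

/* Read a 16-bit big-endian (network order) value without an aligned load. */
static unsigned get_u16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Same steps as GrabAttr, but the cursor is passed in and returned, and
   'avail' bounds every read. Returns NULL if the buffer is too short. */
const unsigned char *grab_attr_bin(ATTR *pAttr, const unsigned char *p, size_t avail)
{
    const unsigned char *end = p + avail;

    if (end - p < 2) return NULL;
    pAttr->nNameLength = get_u16(p);
    p += 2;
    if ((size_t)(end - p) < pAttr->nNameLength + 2) return NULL;
    pAttr->pName = p;
    p += pAttr->nNameLength;
    pAttr->nValLen = get_u16(p);
    p += 2;
    if ((size_t)(end - p) < pAttr->nValLen) return NULL;
    pAttr->pVal = p;
    return p + pAttr->nValLen;   /* start of the next attribute */
}
```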
> > vs. an n-digit ASCII length (NOT CODE, not even pseudo-C):
> >
> > // assumes enough incoming buffer to hold entire name and ValLen fields
> > // requires external help to deal with long or multiple values
> > int GrabAttr(ATTR *pAttr)
> > {
> >   // str functions don't work because we aren't null-terminated
> >   so we normalize all input buffers?  (null terminate ... ?)
> >   // or just write our own strtok?
> >   strtok(ON A COPY of the string)
> >     // so we can remember the terminating char, either <SP> or <CR>...
> >   pAttr->pName=pToken;
> >   pBuf+=strlen(pToken)+1;
> >   strtok()
> >   pBuf+=strlen(pToken)+1;
> >   nLength=private_atoi (because leading 0's would assume octal)
> >   pAttr->nValLen=nLength;
> >   pAttr->pVal=pBuf;
> >   pBuf+=nLength;
> >   return pBuf;
> > }
> > int private_atoi(char *)
> > {
> >   do it
> > }
> > void normalize() // or private_strtok
> > {
> >   do it
> > }
> >
> > Both of these examples will require a bunch of strncmp's in order to
> > actually do anything, because what I've drawn up isn't even as binary
> > as Paul and I had in the SWP documents (even June 6). If you reduce
> > Operation and Attribute names to an enum'd set, then all those strcmp's
> > go away and become a simple '==' or even a switch (YES!).
> >
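With names reduced to an enumerated set, attribute handling becomes a switch on the code. A minimal sketch using the code values from Example 3 below (job-name = 0x0001, job-originator = 0x0002, 0xFFFF vendor, 0x0000 additional value); the enum names, the Job struct, and apply_attr are hypothetical.

```c
#include <string.h>

/* Attribute codes as used in Example 3; the enum names are hypothetical. */
enum AttrCode {
    ATTR_ADDITIONAL_VALUE = 0x0000,
    ATTR_JOB_NAME         = 0x0001,
    ATTR_JOB_ORIGINATOR   = 0x0002,
    ATTR_VENDOR           = 0xFFFF
};

typedef struct {
    char name[64];
    char originator[64];
} Job;

/* Copy a length-prefixed value into a fixed buffer, truncating if needed. */
static void set_field(char *dst, size_t cap, const unsigned char *val, unsigned len)
{
    if (len >= cap) len = (unsigned)(cap - 1);
    memcpy(dst, val, len);
    dst[len] = '\0';
}

/* No strcmp needed: the code alone selects the handler.
   Returns 0 if handled, -1 if this sketch does not handle the code. */
int apply_attr(Job *job, unsigned code, const unsigned char *val, unsigned len)
{
    switch (code) {
    case ATTR_JOB_NAME:
        set_field(job->name, sizeof job->name, val, len);
        return 0;
    case ATTR_JOB_ORIGINATOR:
        set_field(job->originator, sizeof job->originator, val, len);
        return 0;
    default:   /* vendor, additional-value, and unknown codes */
        return -1;
    }
}
```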
> > I think we need to face reality here... Binary requires significantly
> > less code to deal with, which means fewer bugs and less testing, which
> > means more solid implementations sooner. (Anybody ever run into a web
> > site that used atoi and interpreted numbers as octal? I was just
> > reading about one last week...) What do we get with ASCII? My list of
> > "pros" from the meeting is really short. The biggest I've been able to
> > identify is vendor extensibility.
> >
> > For extensibility we could reserve the upper 0x8000 codes (or maybe
> > fewer). In fact, for attributes we could assign the "all ones" enum and
> > have the vendor use the first 4+ bytes of the value as their unique
> > attribute name/ID in order to minimize collisions.
> >
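The extensibility scheme can be checked with two small tests: is a 16-bit code in the reserved vendor range, and does an all-ones attribute carry a given vendor's ID prefix? A minimal sketch, assuming the range is 0x8000-0xFFFF and a 4-byte minimum ID; the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define VENDOR_RANGE_START 0x8000u  /* assumed: upper half of the 16-bit space */
#define VENDOR_ATTR        0xFFFFu  /* "all ones": vendor ID carried in the value */
#define VENDOR_ID_MIN      4u       /* assumed minimum ID prefix length */

/* True if a 16-bit code falls in the reserved vendor range. */
int is_vendor_code(unsigned code)
{
    return code >= VENDOR_RANGE_START && code <= 0xFFFFu;
}

/* For an all-ones attribute, check whether the value starts with the given
   vendor ID; on a match, *rest points at the bytes after the ID. */
int match_vendor_attr(unsigned code, const unsigned char *val, size_t len,
                      const unsigned char *id, size_t idLen,
                      const unsigned char **rest)
{
    if (code != VENDOR_ATTR || idLen < VENDOR_ID_MIN || len < idLen)
        return 0;
    if (memcmp(val, id, idLen) != 0)
        return 0;
    *rest = val + idLen;
    return 1;
}
```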
> > Unless someone has parsing for ASCII all worked out and can illustrate
> > that it really isn't much more code, I have to be in the binary camp.
> > IBM's triplets anyone? (Roger?)
> >
> > The following three examples show an encoding of a
> > Print-Job operation with attributes
> > job-name=="Spec"
> > job-originator=="Sylvan", and
> >   (a multi-value vendor specific hypothetical attribute)
> > vendorHWP-BLD-ID=="Alpha",32766
> >
> > In all examples the bytes on the wire are specified in hex (two
> > characters) or ASCII (one character). The spaces between bytes and the
> > line wrapping are not transmitted. {print data} is the actual data for
> > the job, and the literal phrase is not transmitted.
> >
> > -----------
> > Example 1, ASCII (June 17):
> >
> >  0  1  0  0  P  r  i  n  t  -  J  o  b 0d 0a
> >  j  o  b  -  n  a  m  e 20  4 20  S  p  e  c  j  o  b  -  o  r  i  g  i  n  a
> >  t  o  r 20  6 20  S  y  l  v  a  n  H  W  P  -  B  L  D  -  I  D 20  5 20  A
> >  l  p  h  a 20  5 20  3  2  7  6  6 0d 0a
> > {print data}
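Once the name and decimal length are tokenized, Example 1 is length-driven: after each value, a SP means another value for the same attribute, and any other byte starts a new name. A minimal sketch of decoding the attribute line under those rules, which are inferred from the example bytes rather than from a written spec; Pair, read_len, and decode_ascii_attrs are hypothetical names, and the request line is assumed to have been consumed already.

```c
#include <stddef.h>
#include <string.h>

/* One decoded value; for additional values, name repeats the previous one. */
typedef struct {
    const char *name; size_t nameLen;
    const char *val;  size_t valLen;
} Pair;

/* Parse a decimal length terminated by a single SP; NULL on error. */
static const char *read_len(const char *p, const char *end, size_t *len)
{
    *len = 0;
    if (p >= end || *p < '0' || *p > '9') return NULL;
    while (p < end && *p >= '0' && *p <= '9')
        *len = *len * 10 + (size_t)(*p++ - '0');
    return (p < end && *p == ' ') ? p + 1 : NULL;
}

/* Decode "name SP len SP value" items up to the terminating CRLF.
   Returns the number of pairs stored, or -1 on malformed input. */
int decode_ascii_attrs(const char *p, const char *end, Pair *out, int max)
{
    int n = 0;
    const char *name = NULL;
    size_t nameLen = 0;

    while (p < end && *p != '\r') {
        size_t len;
        if (*p == ' ') {              /* SP after a value: another value */
            if (name == NULL) return -1;
            p++;
        } else {                      /* anything else: a new attribute name */
            name = p;
            while (p < end && *p != ' ') p++;
            if (p >= end) return -1;
            nameLen = (size_t)(p - name);
            p++;
        }
        p = read_len(p, end, &len);
        if (p == NULL || (size_t)(end - p) < len || n == max) return -1;
        out[n].name = name; out[n].nameLen = nameLen;
        out[n].val = p;     out[n].valLen = len;
        n++;
        p += len;                     /* the value is skipped, never scanned */
    }
    return (p + 1 < end && p[0] == '\r' && p[1] == '\n') ? n : -1;
}
```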
> >
> > -----------
> > Example 2, ASCII with binary (version is fixed) lengths:
> >
> >  0  1  0  0 00 09  P  r  i  n  t  -  J  o  b 00 08
> >  j  o  b  -  n  a  m  e 00 04  S  p  e  c 00 0e  j  o  b  -  o  r  i  g  i  n
> >  a  t  o  r 00 06  S  y  l  v  a  n 00 0a  H  W  P  -  B  L  D  -  I  D 00 05
> >  A  l  p  h  a 00 00 00 05  3  2  7  6  6
> > {print data}
> >
> > -----------
> > Example 3, binary (SWP, IBM's triplets...)
> > (extra spaces, line wraps, and comments added for clarity):
> >
> > 00 01 00 00                            ; version 01.00
> > 00 01                                  ; Print-Job
> > 00 01                                  ; job-name
> > 00 04  S  p  e  c                      ; length and value
> > 00 02                                  ; job-originator
> > 00 06  S  y  l  v  a  n                ; length and value
> > ff ff                                  ; vendor; see 4+ bytes of value
> > 00 0a  H  W  P  -  B  L  D  -  I  D    ; length and value1
> > 00 00                                  ; additional value for prev attribute
> > 00 05  A  l  p  h  a                   ; length and value2
> > 00 00                                  ; additional value
> > 00 02 7f fe                            ; length and value3
> > {print data}
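Decoding Example 3 needs no scanning and no string comparison, only fixed-size reads. A minimal sketch, assuming the buffer holds just the header (4-byte version, 2-byte operation, then 2-byte code / 2-byte length / value items until the buffer ends), since the example does not show how the attribute list is terminated before {print data}; Item, be16, and decode_binary are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical decoded item; code 0x0000 marks an additional value. */
typedef struct {
    unsigned code;
    const unsigned char *val;
    unsigned len;
} Item;

/* Read a 16-bit big-endian value byte by byte. */
static unsigned be16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Returns the number of items, or -1 if the buffer is truncated or out is full. */
int decode_binary(const unsigned char *p, size_t avail,
                  unsigned long *version, unsigned *op, Item *out, int max)
{
    const unsigned char *end = p + avail;
    int n = 0;

    if (avail < 6) return -1;
    *version = ((unsigned long)be16(p) << 16) | be16(p + 2);
    *op = be16(p + 4);
    p += 6;
    while (p < end) {
        if (end - p < 4 || n == max) return -1;
        out[n].code = be16(p);
        out[n].len = be16(p + 2);
        p += 4;
        if ((size_t)(end - p) < out[n].len) return -1;
        out[n].val = p;
        p += out[n].len;
        n++;
    }
    return n;
}
```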
> >
> > -----------
> >
> > Going from example1 to example2 eliminates scanning and tokenizing.
> >
> > Going from example2 to example3 eliminates strcmp's to match names
> > of attributes and operations.
> >
> > I vote to go all binary, e.g. #3. (Note, though, that I encoded the HP
> > attribute name in ASCII, whereas in real life I'd probably pick
> > something more like:
> >   H W P xx yy zz
> > where hex yyzz would increment but the rest would remain static for
> > all the attributes that I create. I probably wouldn't do the
> > multi-valued attribute either; rather, I'd just append onto the first
> > value, since the prefix length is fixed.)
> >
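The fixed-length "H W P xx yy zz" ID can be built and split with a few byte operations. A minimal sketch, assuming xx is a fixed per-vendor byte and yyzz is a 16-bit big-endian serial; HWP_ID_LEN, make_vendor_id, and split_vendor_value are hypothetical names. Because the prefix length is fixed, whatever follows the ID in the same field is the value, which is why the separate multi-value encoding becomes unnecessary.

```c
#include <stddef.h>
#include <string.h>

#define HWP_ID_LEN 6   /* 'H' 'W' 'P' xx yy zz */

/* Build the 6-byte vendor attribute ID; serial becomes big-endian yy zz. */
void make_vendor_id(unsigned char id[HWP_ID_LEN], unsigned char xx, unsigned serial)
{
    id[0] = 'H'; id[1] = 'W'; id[2] = 'P';
    id[3] = xx;                                     /* static per vendor (assumed) */
    id[4] = (unsigned char)((serial >> 8) & 0xFF);  /* yy */
    id[5] = (unsigned char)(serial & 0xFF);         /* zz */
}

/* Split a vendor field into ID and appended value.
   Returns the 16-bit serial, or -1 if the field is not an HWP attribute. */
long split_vendor_value(const unsigned char *field, size_t len, unsigned char xx,
                        const unsigned char **val, size_t *valLen)
{
    if (len < HWP_ID_LEN || memcmp(field, "HWP", 3) != 0 || field[3] != xx)
        return -1;
    *val = field + HWP_ID_LEN;
    *valLen = len - HWP_ID_LEN;
    return ((long)field[4] << 8) | field[5];
}
```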
> > I apologize for having to do this, but evidently my implementor hat
> > wasn't fitting very well on June 17th.
> >
> > Thanks for your attention,
> >
> > sdb
> >
> >  | Sylvan Butler | sbutler at boi.hp.com | AreaCode 208 Phone/TelNet 396-2282 |
> >

--------------60E2218BCDA1D1AA31F6DF91
Content-Type: text/plain; charset=us-ascii; name="ippexam.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="ippexam.txt"

You have to have tokenizing capability in an implementation anyway, to support the HTTP headers and chunking...

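#include <ctype.h>

#define OK 0   /* status code; value assumed, not defined in the original */

typedef struct {
                 char *attrName;
                 int  attrLength;
                 char *attrValue;
               } ATTR;

/* Most optimizing compilers would attempt to inline the next_delim/next_token functions for performance */

/* Advance to the first whitespace character (or the terminating NUL). */
char *next_delim(char *stream)
{
   while (*stream != '\0' && !isspace((unsigned char)*stream)) ++stream;

   return(stream);
}

/* Skip a run of whitespace to the start of the next token. */
char *next_token(char *stream)
{
   while (isspace((unsigned char)*stream)) ++stream;

   return(stream);
}

/* This routine assumes that the current position in the input or buffer stream is placed at the first character of the attribute name */

int GrabAttr(ATTR *pAttr, char *inpstream)
{
   int sts = OK;

   pAttr->attrName = inpstream;
   inpstream = next_delim(inpstream);
   if (*inpstream != '\0')
      *inpstream++ = '\0';          /* NUL-terminate the attribute name */
   inpstream = next_token(inpstream);
   /* Accumulate the decimal length; leading zeros stay decimal */
   for (pAttr->attrLength = 0; isdigit((unsigned char)*inpstream); inpstream++)
      pAttr->attrLength = (pAttr->attrLength * 10) + (*inpstream - '0');
   /* The value starts at the next token; the caller advances past it
      using attrValue + attrLength */
   pAttr->attrValue = next_token(inpstream);

   return(sts);
}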

--------------60E2218BCDA1D1AA31F6DF91--