IPP Mail Archive: Re: IPP>PRO comments on CPAP

Re: IPP>PRO comments on CPAP

JK Martin (jkm@underscore.com)
Fri, 9 May 1997 01:37:09 -0400 (EDT)

Bob,

Thanks for getting the discussion rolling. This post is a bit long,
but hopefully reads very quickly. After reading this message, one
should be able to fully understand CPAP in the context of HTTP-like
protocol operations without having to read the CPAP spec:

ftp://ftp.pwg.org/pub/pwg/ipp/new_PRO/cpap.ps

> In CPAP a client opens a connection (e.g TCP/IP) and sends a series
> of operations. The syntax for CPAP (simplified) is:
> Integer-opcode Sequence-Number Length blank Data

The Opcode, Sequence-Number and Length fields are all displayable text.
There is no binary data involved. (Just so folks are clear on this.)

Also, "Sequence-Number" is more like "Transaction-Id", similar to that
used in SNMP and other protocols. The client is free to set that number
to any value of its choice, since its primary purpose is to identify
the associated reply message.
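
To make the framing concrete, here is a minimal Python sketch of such a
message. The single-space separators, the decimal digits, and the helper
names encode_message and decode_message are my assumptions for
illustration only; the CPAP spec defines the actual wire format.

```python
def encode_message(opcode: int, transaction_id: int, data: bytes = b"") -> bytes:
    """Build 'opcode transaction-id length<blank>data' with a text header."""
    header = f"{opcode} {transaction_id} {len(data)} ".encode("ascii")
    return header + data


def decode_message(buf: bytes):
    """Parse one message from the front of buf.

    Returns (opcode, transaction_id, data, rest), where rest is whatever
    follows the message in the buffer.
    """
    fields = buf.split(b" ", 3)  # three header fields, then everything else
    if len(fields) < 4:
        raise ValueError("incomplete header")
    opcode, transaction_id, length = (int(f) for f in fields[:3])
    body = fields[3]
    if len(body) < length:
        raise ValueError("incomplete payload")
    # The Length field, not a delimiter, decides where the payload ends.
    return opcode, transaction_id, body[:length], body[length:]
```

Because the header carries an explicit length, the payload may contain
blanks or arbitrary bytes without any escaping.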

Speaking of request and reply (response) messages:

One of the interesting aspects of CPAP is that there is only ONE type
of response message: Reply

Every request message has a corresponding Reply, but the same message
format is used for all responses. Unlike some protocols that define
messages such as:

Foo-Request, Foo-Response
More-Request, More-Response
...etc

CPAP instead has a *very* simple data model that requires only a
single type of response message.

While the vast bulk of activity between the client and server can be
totally synchronous, the design of CPAP allows a client to post several
requests, then receive the responses. The client keys off the Reply's
sequence number (we call it "message id" at Underscore) to know which
request the Reply is for.
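
A client that posts several requests before reading any Reply only needs
a small table keyed by that number. The sketch below shows one way to do
it; the class name PendingRequests and the use of a simple counter for
ids are hypothetical, since the client is free to pick any values.

```python
import itertools


class PendingRequests:
    """Track outstanding requests so each Reply can be matched to its request."""

    def __init__(self):
        self._next_id = itertools.count(1)  # any unique values would do
        self._pending = {}

    def post(self, request: str) -> int:
        """Record a request and return the transaction id to send with it."""
        tid = next(self._next_id)
        self._pending[tid] = request
        return tid

    def match(self, reply_tid: int) -> str:
        """Return (and forget) the request that a Reply answers."""
        return self._pending.pop(reply_tid)

    def outstanding(self) -> int:
        """Number of requests still waiting for a Reply."""
        return len(self._pending)
```

Since there is only one Reply format, the table entry is all the client
needs to decide how to interpret the Reply's contents.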

Other key aspects about the protocol design:

* All requests utilize a simple command (opcode) and zero or
more "variables" (named strings); like HTTP, there are no
fixed formats for the message definitions, thereby allowing
for arbitrary extensibility.

* All message data is displayable text; however the payload of
the message can be arbitrary binary data; note, though, that
none of the defined messages contains any kind of binary data.

* Control messages and document data can exist either entirely
separately (via separate channels), or co-exist within a single
channel; in the single channel case, the document data is passed
in multiple DATA messages.

* When a separate data channel is used to convey a document,
neither the client NOR the server needs to parse any of the data;
when the client closes the data channel connection, the server
knows the document data is complete.

The concept of separate control and data channels was directly lifted
from the experience learned from FTP. It works. And it works really
well:

* The client can stream data down the data channel without having
to know how much data is involved, or have to deal with "boundary"
delimiters.

* Since there is absolutely NO framing involved, performance can
reach the theoretical maximum between the client and server.

* Separate channels allow for some very interesting (and useful)
implementations whereby separate processes can handle the control
and data channels in a very effective manner.

Bob, you're a Unix person. On that last item, think about the classic
BSD LPD implementation. The "if" filter can simply stream its document
data at the printer without getting at all involved in the session/job
negotiation. Very fast. Very simple. Very cheap.
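
For those who prefer code, here is a rough Python sketch of the data
channel idea: the receiver reads raw bytes until the sender closes, and
that close is the only end-of-document signal. The function names are
hypothetical, and socket.socketpair() merely stands in for the real TCP
data connection so the example is self-contained.

```python
import socket
import threading


def receive_document(conn: socket.socket) -> bytes:
    """Read raw bytes until the peer closes; no parsing, no delimiters."""
    chunks = []
    while True:
        chunk = conn.recv(65536)
        if not chunk:  # empty read means the sender closed the channel
            break
        chunks.append(chunk)
    return b"".join(chunks)


def send_document(conn: socket.socket, chunks) -> None:
    """Stream chunks without announcing a total length, then close."""
    for chunk in chunks:
        conn.sendall(chunk)
    conn.close()  # closing is the only end-of-document signal


def demo() -> bytes:
    client, server = socket.socketpair()  # stand-in for the TCP data channel
    sender = threading.Thread(
        target=send_document, args=(client, [b"%!PS\n", b"showpage\n"])
    )
    sender.start()
    document = receive_document(server)
    sender.join()
    server.close()
    return document
```

Note that neither side ever inspects the bytes or needs to know the
document size in advance.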

Here's another view of the "protocol ladder" between the client and server
for a typical job submission transaction:

Line   Client Side                  Server Side
-------------------------------------------------------
  1    Open connection       --->
  2                          <---   Accept connection
  3    Start Session         --->
  4                          <---   Reply
  5    Start Job             --->
  6    Start Document        --->
  7                          <---   Reply
  8    Open data channel     --->
  9                          <---   Accept connection
 10    Send document data    --->
 11            .
 12            .
 13            .
 14    Close data channel    --->
 15    End Job               --->
 16                          <---   Reply
 17    Close connection      --->

Notes (reference Line number):

1 - Stream-level (TCP) connection.
2 -
3 - Client identifies itself.
4 - Server accepts, and provides information about itself; or rejects,
(via a NAK message) providing a reason for rejection (no access, no
resources, etc), at which point the connection is closed by the server.
5 - Provides identification about the requesting user and the job itself
6 - Client specifies this document's attributes, such as PDL, etc.
7 - Server returns the port number to which the client should connect to
deliver the document data.
8 - Client connects to the server-assigned data channel port number.
9 -
10 - Data is pumped down the wire as fast as the client can send it; the
server receives the data as fast as it can, letting TCP handle flow
control.
11 -
12 -
13 -
14 - Client has delivered all document data, so closes the connection; the
server sees this, and thus knows it now has all the document data.

The sequence of steps 6 thru 14 repeats for each document in
the job.

15 - Client tells the server this job is completed.
16 - Server responds with the complete set of accounting data for the job.
17 -
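
The ladder and notes above can be summarized as a short Python sketch
that lists the steps for a job with any number of documents. The function
name job_submission_steps is hypothetical, and the step labels are just
the descriptive names from the ladder, not actual CPAP opcodes.

```python
def job_submission_steps(document_count: int):
    """Yield (sender, action) pairs for one job, following the ladder above."""
    yield ("client", "Open connection")
    yield ("server", "Accept connection")
    yield ("client", "Start Session")
    yield ("server", "Reply")
    yield ("client", "Start Job")
    for _ in range(document_count):  # steps 6 thru 14, once per document
        yield ("client", "Start Document")
        yield ("server", "Reply")  # carries the data channel port number
        yield ("client", "Open data channel")
        yield ("server", "Accept connection")
        yield ("client", "Send document data")
        yield ("client", "Close data channel")
    yield ("client", "End Job")
    yield ("server", "Reply")  # carries the job's accounting data
    yield ("client", "Close connection")
```

Counting the "Open" steps shows the connection cost: one control
connection for the whole job, plus one data connection per document.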

> There are many operations and they include ones for management.

The management-related protocol operations deserve serious consideration,
but they shouldn't necessarily be considered at this point.

> I like some ideas and not others in CPAP.
>
> On the downside CPAP opens a new circuit for sending document data for
> each document (for Level II). Level I does not, but the document advises
> the use of Level II. This seems to have all the HTTP problems of too much
> TCP build/ tear-down traffic.

CPAP uses far fewer TCP connections than the approach proposed in the most recent
drafts. Moreover, those connections significantly simplify the protocol
interactions, while at the same time requiring a far less complex parsing
and control implementation.

> On the good side, CPAP looks slightly more compact because of its
> integer opcode and length. But the biggest savings in terms of
> processing may be that the server knows that the operations are
> in the context of the session and doesn't have to re-establish context
> with each operation. For example, our SendJob currently contains
> the job-URL. This seems redundant if the server knows from the preceding
> CreateJob what job it is expecting to receive data.

No one should underestimate the value and power of maintaining a single,
monolithic control channel to effect a client-server transaction. Perhaps
more important, though, is the SIMPLICITY in the implementation, which
translates to smaller, faster and more robust products.

> My take is that there are some good ideas in CPAP that are worth
> using, but I wonder if the PWG wants to take it as a whole.

I don't think we have the situation where the PWG can "pick and choose"
things from CPAP and stuff them into the current HTTP-based approach.

The current HTTP-based approach and CPAP have these critical similarities:

* Both use simple named strings to convey parameters, attributes, etc.

* Both employ a framing mechanism that can allow for arbitrarily long
messages

* Both allow for arbitrary extensibility thru the addition of attributes
included with the request and/or response messages

Some last thoughts I'd like to share with everyone:

* CPAP is not perfect. It can use some improvements here and there,
but the essential framework is very, very solid. Above all, it's
eminently extensible, so no one should feel like they're getting locked
into a given set of capabilities.

* I would not FOR A SECOND expect the IPP to accept CPAP as it is;
besides adding some missing pieces (which should take about 5 seconds),
I would hope the group would take CPAP and turn it into something of
its own (and change the name).

* CPAP has been field-tested on all platforms except the Macintosh for
about 10 years now. Underscore itself has ported CPAP implementations
to a dozen different Unix systems, and created a NetWare PSERVER NLM
from scratch around this protocol. It has been very thoroughly tested.

* In addition to handling print job submission tasks, the protocol also
defines other critical areas the printer industry sorely needs, including
comprehensive job accounting, event logging, surrogate client support
(in which the printer can use an external network host for such things
as on-the-fly downloading of fonts, forms and configuration data). In
fact, an entire file transfer mechanism is defined to allow the printer
to use a network host at will for storage and retrieval...simply.

One final note for those more intent on marketing than technology:

While it's true that Digital developed this protocol (it was designed by
Brian Reid and Chris Kent at the DEC Western Research Labs), Digital is in effect no
longer using the protocol. CPAP was only implemented on Digital's
PrintServer family of network printers; this product family was based
on a version of the VAX chip set that is no longer in production.

Digital no longer manufactures PrintServers...and can't, since the
underlying hardware is no longer in production. Furthermore, it does
not appear that Digital intends to port the PrintServer architecture to
another printer platform.

In other words, should the PWG utilize CPAP in IPP, no one in the printer
industry should worry about not having a "level playing field" here.

...jay

----------------------------------------------------------------------
-- JK Martin | Email: jkm@underscore.com --
-- Underscore, Inc. | Voice: (603) 889-7000 --
-- 41C Sagamore Park Road | Fax: (603) 889-2699 --
-- Hudson, NH 03051-4915 | Web: http://www.underscore.com --
----------------------------------------------------------------------