Restarting jobs after failed transmissions can become a real rat's
nest. Just so everyone knows... ;-)

Restarting after a failure can be easy or hard, depending on your criteria.
I hope we don't go hog-wild here, either.

Exactly where the client and server "intelligently" resume depends on a
few factors. The first that comes to mind is whether the documents
successfully sent before the failure have already been printed (and, if
so, which of them).

We have seen several cases in which the customer preferred that the
document set print contiguously on the printer. That is, even though
some of the documents were printed before the failure, the customer
would rather reprint everything from the beginning. (This is actually the
standard BSD UNIX method of handling restarts, by the way.)

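The restart choices discussed above can be sketched as a small decision function. This is a hypothetical Python sketch, not anything defined by IPP: the names `RestartPolicy`, `RestartPoint` and `restart_point` are illustrative, and it assumes the client knows how many bytes of each document the server has acknowledged. It does not model which documents have actually been printed, which is the harder question raised above. `WHOLE_JOB` corresponds to the BSD-style restart from the beginning; the other two policies resume at the first unfinished document or at its first unacknowledged byte.

```python
from dataclasses import dataclass
from enum import Enum


class RestartPolicy(Enum):
    WHOLE_JOB = "job"             # reprint everything from the first document (BSD style)
    FAILED_DOCUMENT = "document"  # resume at the start of the first unfinished document
    EXACT_BYTE = "byte"           # resume at the first unacknowledged byte


@dataclass(frozen=True)
class RestartPoint:
    document: int  # 0-based index of the first document to resend
    offset: int    # byte offset within that document


def restart_point(doc_sizes, acked_bytes, policy):
    """Return where a resubmission should begin.

    doc_sizes   -- size in bytes of each document in the job
    acked_bytes -- bytes of each document the server has acknowledged
    """
    if policy is RestartPolicy.WHOLE_JOB:
        return RestartPoint(0, 0)
    for index, (size, acked) in enumerate(zip(doc_sizes, acked_bytes)):
        if acked < size:
            offset = acked if policy is RestartPolicy.EXACT_BYTE else 0
            return RestartPoint(index, offset)
    # Every document was fully acknowledged: nothing is left to resend.
    return RestartPoint(len(doc_sizes), 0)
```

For example, with documents of 1000, 2000 and 500 bytes, and 1000 and 750 bytes acknowledged for the first two, the three policies resume at (0, 0), (1, 0) and (1, 750) respectively.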
Some reasonable things can be done in the area of recovery, but I'd like
to warn folks that it is often not as easy as it looks. Feature-rich,
configurable checkpointing and recovery could very well be something we
defer to a future release of IPP.

Just my pennies' worth.

----- Begin Included Message -----
Date: Fri, 9 May 1997 20:38:27 -0700
From: Robert.Herriot at Eng.Sun.COM (Robert Herriot)
To: Robert.Herriot at Eng.Sun.COM, jkm at underscore.com
Subject: Re: IPP>PRO another reason for needing byterange and document number
Cc: ipp at pwg.org

I don't see this as checkpointing. I would assume that the client retains
the data until it receives a response from the printer indicating that
it has received the data. Likewise, the server would know the "high-water
mark" for a file and ignore data below it.

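The high-water-mark idea can be sketched on the server side as follows, assuming (hypothetically) that each block of document data arrives tagged with a document number and a starting byte offset, as proposed in the included message. `DocumentReceiver` and its methods are illustrative names, not part of any IPP draft.

```python
class DocumentReceiver:
    """Server-side sketch: one high-water mark per document number."""

    def __init__(self):
        self.data = {}  # document number -> bytearray of bytes accepted so far

    def high_water_mark(self, doc):
        return len(self.data.get(doc, b""))

    def receive(self, doc, offset, chunk):
        """Store a chunk tagged with (document number, byte offset).

        Bytes below the high-water mark were already stored, so a
        retransmitted chunk is ignored or trimmed. Returns the new mark.
        """
        buf = self.data.setdefault(doc, bytearray())
        mark = len(buf)
        if offset > mark:
            raise ValueError(f"document {doc}: expected offset <= {mark}, got {offset}")
        if offset + len(chunk) > mark:
            buf.extend(chunk[mark - offset:])  # keep only the new tail
        return len(buf)
```

A retransmitted block that lies entirely below the mark changes nothing, and one that straddles the mark contributes only its new tail, so the server never stores the same bytes twice.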
The question is whether the client should retransmit the failed
operation or figure out how to start the sequence all over again. In
suggesting that the client retransmit the failed operation, I am following
the NFS and WebNFS algorithms.

Do you really think it is easier for both client and server to restart
the sequence of operations?

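The retransmit-the-failed-operation approach can be sketched from the client side. In this hypothetical simulation, `FlakyServer` stores a chunk and then loses its response for selected (document, offset) pairs, standing in for a server crash or dropped connection; a Python `ConnectionError` represents the lost connection, and reconnecting is omitted. The client keeps each chunk until it is acknowledged and simply resends it, while the server's length check plays the role of the high-water mark so the resent bytes are not stored twice. None of these names come from IPP.

```python
class FlakyServer:
    """Hypothetical server that can lose its response after storing a chunk."""

    def __init__(self, lose_reply_for):
        self.stored = {}                           # document number -> bytearray
        self.lose_reply_for = set(lose_reply_for)  # (doc, offset) pairs whose reply is lost once

    def send(self, doc, offset, chunk):
        buf = self.stored.setdefault(doc, bytearray())
        if offset > len(buf):
            raise ValueError("gap in document data")
        if offset + len(chunk) > len(buf):
            buf.extend(chunk[len(buf) - offset:])  # ignore bytes below the high-water mark
        if (doc, offset) in self.lose_reply_for:
            self.lose_reply_for.discard((doc, offset))
            # The data was stored, but the client never hears about it.
            raise ConnectionError("connection lost before the response was sent")
        return len(buf)  # acknowledged high-water mark


def submit(server, documents, chunk_size=4, max_retries=3):
    """Send documents in chunks, retransmitting any chunk whose reply was lost.

    Returns the number of retransmissions performed.
    """
    retransmissions = 0
    for doc, payload in enumerate(documents, start=1):
        offset = 0
        while offset < len(payload):
            # The client retains this chunk until the server acknowledges it.
            chunk = payload[offset:offset + chunk_size]
            for attempt in range(max_retries + 1):
                try:
                    offset = server.send(doc, offset, chunk)
                    break
                except ConnectionError:
                    if attempt == max_retries:
                        raise
                    # A real client would re-establish the connection here.
                    retransmissions += 1
    return retransmissions
```

With `FlakyServer(lose_reply_for=[(1, 4), (2, 0)])` and the documents `b"first doc"` and `b"second"`, `submit` performs two retransmissions and the server ends up holding each document exactly once, without either side restarting the whole sequence.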
> From jkm at underscore.com Fri May 9 20:12:03 1997
> Date: Fri, 9 May 1997 23:10:07 -0400 (EDT)
> From: JK Martin <jkm at underscore.com>
> To: Robert.Herriot at Eng
> Subject: Re: IPP>PRO another reason for needing byterange and document number
> Cc: ipp at pwg.org
>
> Is IPP expected to support checkpointing to the point where the
> client will resume submission of a document at *precisely* the point
> where the transmission failed?
>
> I wouldn't think so. Rather, the client would resume transmission
> at the *start* of the document that failed.
>
> Is this true?
>
> ----- Begin Included Message -----
>
> From ipp-owner at pwg.org Fri May 9 23:07 EDT 1997
> Date: Fri, 9 May 1997 20:04:16 -0700
> From: Robert.Herriot at Eng.Sun.COM (Robert Herriot)
> To: ipp at pwg.org
> Subject: IPP>PRO another reason for needing byterange and document number
>
> After reading through the WebNFS document and spotting the following
> paragraphs, I think we need byte ranges and document numbers.
>
> 10.0 Timeout and Retransmission
>
>> A WebNFS client should follow the example of conventional NFS clients
> and handle server or network outages gracefully. If a reply is not
> received within a given timeout, the client should retransmit the
> request with its original XID (described in Section 8 of RFC 1831).
> The XID can be used by the server to detect duplicate requests and
> avoid unnecessary work.
>
> While it would seem that retransmission over a TCP connection is
> unnecessary (since TCP is responsible for detecting and
> retransmitting lost data), at the RPC layer retransmission is still
> required for recovery from a lost TCP connection, perhaps due to a
> server crash or, because of resource limitations, the server has
> closed the connection. When the TCP connection is lost, the client
> must re-establish the connection and retransmit pending requests.
>
> It seems to me that this same issue exists for IPP operations,
> including those that send document data. A client may send a block
> of document data where the transmission succeeds at the TCP level, but
> the server crashes before sending a "data received and processed"
> response. In such a case, the server may or may not have processed the
> data. Since the client will have to retransmit the data, the server
> needs to know whether it is new data or another copy of the last data;
> hence the need for byte ranges and document numbers to identify each
> piece of data.
>
> Bob Herriot
>
> ----- End Included Message -----
----- End Included Message -----
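For comparison, the WebNFS rule quoted in the thread detects duplicates by request identifier (the RPC XID described in RFC 1831) rather than by byte range. A minimal Python sketch of such a duplicate-request cache follows; `DuplicateRequestCache` and `handle` are hypothetical names, and the unbounded dictionary is a simplifying assumption, since a real server would bound and expire its saved replies.

```python
class DuplicateRequestCache:
    """Sketch of duplicate detection by request id (the role the XID plays)."""

    def __init__(self, handler):
        self.handler = handler  # does the real work for a new request
        self.replies = {}       # xid -> saved reply (unbounded here; a real server would expire entries)

    def handle(self, xid, request):
        if xid in self.replies:
            # Retransmitted request: resend the saved reply instead of redoing the work.
            return self.replies[xid]
        reply = self.handler(request)
        self.replies[xid] = reply
        return reply
```

The trade-off is visible in the sketch: an id-based cache only helps while the saved reply survives, whereas document numbers and byte ranges let the server recognize repeated data from the data's own position, even across a server restart that clears such a cache.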