I think periodically reading the AGENT_STATE register is very
similar to a heart-beat, isn't it?
Because the error occurs (accidentally) only between the two
communicating nodes, I think a bus reset sacrifices too much of the
entire bus to serve as a first recovery step.
Am I missing something?
On Thu, 6 Aug 1998 09:45:09 -0700 (PDT)
Greg Shue <firstname.lastname@example.org> wrote:
> Fumio Nagasaka wrote:
> > > In my idea, a print spooler in the initiator shall detect time out.
> > OK, forget my brain damaged idea. But still we don't have cool answers.
> > Any session layer beyond SBP-2 can detect this error. Then what shall we do?
> > Examining ORB_POINTER requires an initiator to expend as much effort as
> > issuing a transport command. Doesn't it?
> I wouldn't call it "brain damaged", just focused on a narrow
> application space. :-)
> Issuing a transport command is much more intrusive to the system
> than examining the AGENT_STATE and ORB_POINTER registers, because
> it now requires its own sequence number, and it can't be put on
> the queue until the condition is corrected!
> So, just for review:
> - Any identified errors other than a lost Ack on a status FIFO write
> either have a retry algorithm described in SBP-2, or cause
> a status FIFO write to report the error.
> - All identified errors cause the Fetch Engine to transition to
> the DEAD state, and cause the entire task set to be cleared.
> - Because we are dealing with parallel-execution rather than
> just out-of-order, we must clarify what value ends up in the
> ORB_POINTER register when the Fetch Engine is in the DEAD state.
> ? The process used by an initiator to recover when it finds
> the Fetch Engine in the DEAD state without a cause written to
> the status FIFO is to construct a linked list of the pending
> ORBs and queue it up on the Task List.
> - Because the PWG Transport Requirements identify the transport as
> "reliable" and "Connection-oriented" and the undocumented transport
> model is TCP, the following options come to mind:
> 1) The stall detection could be left up to the Transport
> Client. Any clients which do not provide this detection
> (via a timeout) would get hung in this condition until
> something caused a Bus Reset.
> 2) The stall detection could be handled by the initiating
> node's PWG protocol layer. It could either be done by:
> - a timeout (negotiated reconnect interval?) on target
> activity, causing the initiator to look at the
> AGENT_STATE register.
> - the target's node generating a Bus Reset
> - ???
> I believe strongly that this detection should NOT be forced on
> the Transport Client or above. I also believe that human
> intervention must not be required to recover from this condition.
> Any target inactivity detection must occur within a
> target-dependent timeout, because different functions and
> technologies have different constraints. (e.g. ink jet dry
> times, scanner bulb color stability, media sitting over fusers.)
> Can somebody model or better yet measure the frequency of this error?
> If it really is extremely low, then let's just let the target
> node generate a Bus Reset. Otherwise let's put in a heart-beat
> write to the Status FIFO using Unsolicited Status.
> I think we need to choose one at the next meeting and be done
> with it. We're only looking for a solution for the first
> revision of this protocol.
> Greg Shue
> Hewlett-Packard Company
> All-in-One Division email@example.com
--
Akihiro Shimura (firstname.lastname@example.org)
Office Imaging System Promotion Project
CANON INC.