P1394 Mail Archive: Re: P1394> Revised PWG1394 Cmd Set

Re: P1394> Revised PWG1394 Cmd Set

Greg Shue (gregs@sdd.hp.com)
Thu, 6 Aug 1998 09:45:09 -0700 (PDT)

Fumio Nagasaka wrote:
> > In my idea, a print spooler in the initiator shall detect time out.
>
> OK, forget my brain damaged idea. But still we don’t have cool answers.
> Any session layer beyond SBP-2 can detect this error. Then what shall we do?
> Examining ORB_POINTER requires an initiator to expend about as much
> effort as issuing a transport command, doesn't it?

I wouldn't call it "brain damaged", just focused on a narrow
application space. :-)

Issuing a transport command is much more intrusive to the system
than examining the AGENT_STATE and ORB_POINTER registers, because
it now requires its own sequence number, and it can't be put on
the queue until the condition is corrected!
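To put a finer point on it, the passive check is nothing more than two
read transactions.  A rough sketch in C (the register offsets and the
DEAD encoding are as I read SBP-2; read_quadlet()/read_octlet() are
stand-ins for whatever read-transaction service the initiator's 1394
stack provides, stubbed here so the fragment stands alone):

    #include <stdint.h>

    /* Fetch agent CSRs, offsets relative to the agent base (per SBP-2). */
    #define AGENT_STATE_OFFSET  0x00u            /* quadlet              */
    #define ORB_POINTER_OFFSET  0x08u            /* octlet               */
    #define AGENT_STATE_DEAD    3u               /* RESET=0, ACTIVE=1,
                                                    SUSPENDED=2, DEAD=3  */

    /* Stand-ins for the platform's 1394 read transactions (stubbed
       with canned values purely for illustration). */
    static int read_quadlet(uint64_t addr, uint32_t *v)
    { (void)addr; *v = AGENT_STATE_DEAD; return 0; }
    static int read_octlet(uint64_t addr, uint64_t *v)
    { (void)addr; *v = 0; return 0; }

    /*
     * Passive check: two read transactions on the bus, no new ORB, no
     * new sequence number, nothing added to the task list.  Returns 1
     * if the fetch agent is DEAD (and *stalled_orb says where it
     * stopped), 0 if it is still alive, -1 on a bus error.
     */
    int fetch_agent_is_dead(uint64_t agent_base, uint64_t *stalled_orb)
    {
        uint32_t state;

        if (read_quadlet(agent_base + AGENT_STATE_OFFSET, &state) != 0)
            return -1;
        if (state != AGENT_STATE_DEAD)
            return 0;
        if (read_octlet(agent_base + ORB_POINTER_OFFSET, stalled_orb) != 0)
            return -1;
        return 1;
    }
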

So, just for review:

- Any identified errors other than a lost Ack on a status FIFO write
either have a retry algorithm described in SBP-2, or cause
a status FIFO write to report the error.

- All identified errors cause the Fetch Engine to transition to
the DEAD state, and cause the entire task set to be cleared.

- Because we are dealing with parallel execution rather than
  just out-of-order execution, we must clarify what value ends up in the
ORB_POINTER register when the Fetch Engine is in the DEAD state.

- The process used by an initiator to recover when it finds
  the Fetch Engine in the DEAD state without a cause written to
  the status FIFO is to construct a linked list of the pending
  ORBs and queue it up on the Task List (see the sketch
  following this list).

- Because the PWG Transport Requirements identify the transport as
"reliable" and "Connection-oriented" and the undocumented transport
model is TCP, the following options come to mind:

1) The stall detection could be left up to the Transport
Client. Any clients which do not provide this detection
(via a timeout) would get hung in this condition until
something caused a Bus Reset.

2) The stall detection could be handled by the initiating
node's PWG protocol layer. It could either be done by:
- a timeout (negotiated reconnect interval?) on target
activity, causing the initiator to look at the
AGENT_STATE register.
- the target's node generating a Bus Reset
- ???
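For the DEAD-state recovery item above (the one flagged "see the sketch
following this list"), the work is mostly pointer surgery on ORBs the
initiator already owns, since ORBs live in initiator memory.  A rough
sketch in C with the bus writes stubbed; the null-pointer encoding
(most significant bit set) and the AGENT_RESET / ORB_POINTER restart
sequence are as I read SBP-2, so check me on those, and byte-order
handling of the ORB fields is omitted:

    #include <stdint.h>
    #include <stddef.h>

    #define AGENT_RESET_OFFSET  0x04u
    #define ORB_POINTER_OFFSET  0x08u
    #define ORB_PTR_NULL        0x8000000000000000ull  /* msb set => null */

    /* Initiator-side bookkeeping for one pending ORB. */
    struct orb_desc {
        uint64_t bus_addr;          /* 1394 address the target fetches from */
        uint64_t *next_orb_field;   /* mapping of this ORB's next_ORB field */
        struct orb_desc *next;      /* initiator's own pending queue        */
    };

    /* Stand-ins for write transactions to the target (stubbed). */
    static int write_quadlet(uint64_t addr, uint32_t v)
    { (void)addr; (void)v; return 0; }
    static int write_octlet(uint64_t addr, uint64_t v)
    { (void)addr; (void)v; return 0; }

    /*
     * Rebuild the task list from the ORBs that were still pending when
     * the fetch agent went DEAD, then restart the agent:
     *   1. relink next_ORB fields so the pending ORBs form one list,
     *   2. write AGENT_RESET to bring the agent out of DEAD,
     *   3. write ORB_POINTER with the first ORB's address.
     */
    int requeue_pending_orbs(uint64_t agent_base, struct orb_desc *pending)
    {
        struct orb_desc *d;

        if (pending == NULL)
            return 0;                          /* nothing was outstanding */

        for (d = pending; d != NULL; d = d->next)
            *d->next_orb_field = d->next ? d->next->bus_addr : ORB_PTR_NULL;

        if (write_quadlet(agent_base + AGENT_RESET_OFFSET, 0) != 0)
            return -1;
        return write_octlet(agent_base + ORB_POINTER_OFFSET,
                            pending->bus_addr);
    }
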

I believe strongly that this detection should NOT be forced on
the Transport Client or above. I also believe that human
intervention must not be required to recover from this condition.
Any target inactivity detection must occur within a
target-dependent timeout, because different functions and
technologies have different constraints. (e.g. ink jet dry
times, scanner bulb color stability, media sitting over fusers.)
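To put a shape on "target-dependent timeout": the initiator-side
detection can be as small as a last-activity timestamp plus a
per-function limit.  The limits below are made-up numbers for
illustration only, not proposals:

    #include <stdint.h>
    #include <time.h>

    /* Illustrative per-function inactivity limits (made-up values). */
    enum pwg_function { FUNC_PRINTER, FUNC_SCANNER, FUNC_FAX };

    static const unsigned inactivity_limit_sec[] = {
        [FUNC_PRINTER] = 120,   /* e.g. ink dry / fuser constraints   */
        [FUNC_SCANNER] =  30,   /* e.g. bulb warm-up and stability    */
        [FUNC_FAX]     =  60,
    };

    struct stall_watch {
        enum pwg_function func;
        time_t last_target_activity;   /* refreshed on any status FIFO  */
    };                                 /* write or data transfer        */

    /* Called from the PWG protocol layer's housekeeping timer.
       Returns 1 when it is time to go look at AGENT_STATE, 0 otherwise. */
    int stall_suspected(const struct stall_watch *w)
    {
        return difftime(time(NULL), w->last_target_activity)
               > inactivity_limit_sec[w->func];
    }
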

Can somebody model or better yet measure the frequency of this error?
If it really is extremely low, then let's just let the target
node generate a Bus Reset. Otherwise let's put in a heart-beat
write to the Status FIFO using Unsolicited Status.
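If we go the heart-beat route, the target-side cost is one block write
per interval, and the initiator re-arms it through
UNSOLICITED_STATUS_ENABLE after each one (as I read SBP-2's unsolicited
status rules).  Roughly, on the target side (field positions per my
reading of the SBP-2 status block format, write_status_fifo() stubbed):

    #include <stdint.h>
    #include <stddef.h>

    #define STATUS_SRC_UNSOLICITED  2u    /* src field value per SBP-2   */

    /* First two quadlets of an SBP-2 status block (header only). */
    struct sbp2_status {
        uint32_t q0;            /* src | resp | d | len | sbp_status | hi */
        uint32_t orb_offset_lo;
    };

    /* Stand-in for the target's block write to the initiator's
       status FIFO address (stubbed for illustration). */
    static int write_status_fifo(uint64_t fifo_addr, const void *buf,
                                 size_t len)
    { (void)fifo_addr; (void)buf; (void)len; return 0; }

    /*
     * Called from a periodic timer on the target while a login is
     * active and unsolicited status has been (re-)enabled by the
     * initiator.  The heart-beat carries no ORB, just "I'm alive";
     * the remaining header fields are left zero here and would be
     * filled in per the SBP-2 status block definition.
     */
    int send_heartbeat(uint64_t status_fifo_addr)
    {
        struct sbp2_status s = { 0, 0 };

        s.q0 = STATUS_SRC_UNSOLICITED << 30;   /* src in the top bits   */
        return write_status_fifo(status_fifo_addr, &s, sizeof s);
    }
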

I think we need to choose one at the next meeting and be done
with it. We're only looking for a solution for the first
revision of this protocol.

-- 
Greg Shue
Hewlett-Packard Company
All-in-One Division			        gregs@sdd.hp.com
----------------------------------------------------------------