PMP Mail Archive: Re: PMP> Top 25 minus 4 conditions/alerts proposal

Re: PMP> Top 25 minus 4 conditions/alerts proposal

JK Martin (jkm@underscore.com)
Thu, 1 May 1997 22:08:36 -0400 (EDT)

All,

I was originally going to address Gail's comments step-by-step, but
the more I got into it, the more I felt that what we really need
right now is to step back for a moment and address THE BIG PICTURE.

Warning. This is a very long post, but hopefully it reads quickly.
Those involved with this issue are encouraged to take the time to read
this message, as it suggests we have made some mistakes in the last
PMP telecon.

Those not interested in this issue should press "Delete" now.

Regarding Gail's comments: I'm always in favor of developing a design
(or standard) such that ease-of-implementation is regarded as a
top-level goal. To that end, I really can't argue with your excellent
discourse on how a printer implementation may be impacted in this
situation.

However...I am seldom in favor of "robbing Peter to pay Paul",
particularly when I represent the "Peter" in this scenario!
(Uh oh, I may live to regret that statement...)

You see, your approach certainly does make it easier for *you*,
but what about *me*, the poor mgmt app developer?

And here is precisely where the PMP group needs to step back and once
again ask that age-old question (you know, the one Tom Hastings
dislikes so much ;-), namely:

What problem are we trying to solve here?

First, let me present our perspective, that is, from the mgmt app
point of view.

Our product (PrintAlert) is designed to monitor an extremely wide
variety of network printers, where the primary function is to visually
notify the user (a systems administrator or operator) of problems that
arise in those network printers.

Our customers want us to tell them _what's wrong_ with the printer,
and not just that the printer _has a problem_ that must be addressed.

In other words, within the context of this specific "toner low"
scenario, the customer demands to see something like the following;
the color prefixes help to visualize the classic "traffic light"
metaphor:

RED: "Printer XYZ has stopped due to a toner low condition!"

and *not* something like:

RED: "Printer XYZ has a problem!"

Now, given Gail and Bob's position, if "toner low" stops the
printer--yet the specific alert describing "toner low" is tagged as a
non-critical alert--then we can only say something like:

YELLOW: "Printer XYZ is low on toner"
RED: "Printer XYZ is offline!"

This is unacceptable to our customers, as they demand something like:

RED: "Printer XYZ is now offline due to low toner!"

Of course, what they really, Really, REALLY want to see is something
like:

RED: "Printer XYZ is now offline due to low toner!"
"You must press the `Continue' button on the front"
"panel to resume normal operation."

But I digress... ;-)

The point here is this: it seems we have backed ourselves into a more
complex situation whereby two (or more!) alerts are intrinsically tied
to each other in terms of resolving the overall semantics of the problem
condition.

How can the mgmt app (using an algorithmic approach) know that the
non-critical "toner low" condition is intrinsically tied to the
critical "offline" alert? Oh yeah, and what order are the two alerts
supposed to be inserted in the Alert Table?
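
To make the binding problem concrete, here is a minimal Python sketch of
what a mgmt app is left doing when the cause and the "offline" alert are
separate rows. The Alert record and the code names ("tonerLow",
"paperLow", "offline") are hypothetical stand-ins, not actual MIB
encodings, and the pairing heuristic is only a guess, which is the point.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    index: int      # position in the alert table (hypothetical)
    critical: bool  # True for a critical alert, False for non-critical
    code: str       # illustrative name, not a real MIB alert code value

def guess_offline_cause(alerts):
    """Naive heuristic: blame a critical "offline" alert on a non-critical one.

    Returns the single candidate cause if exactly one exists, else None,
    because nothing in the table says which alert (if any) caused "offline".
    """
    if not any(a.code == "offline" and a.critical for a in alerts):
        return None
    candidates = [a.code for a in alerts if not a.critical]
    return candidates[0] if len(candidates) == 1 else None

# One non-critical alert plus "offline": the guess happens to work.
simple = [Alert(1, False, "tonerLow"), Alert(2, True, "offline")]
# Two non-critical alerts plus "offline": no way to pick the cause.
ambiguous = [Alert(1, False, "tonerLow"), Alert(2, False, "paperLow"),
             Alert(3, True, "offline")]

print(guess_offline_cause(simple))     # tonerLow
print(guess_offline_cause(ambiguous))  # None
```

Even the "simple" answer is only a guess: the same two rows would appear
if toner ran low and a user then pressed the offline button.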

This is NOT GOOD. Unfortunately, it gets worse...

First, it should be noted that the "offline" alert itself is a new
addition to the MIB draft; this alert is not defined in RFC 1759.

I'm not sure we completely thought through the ramifications when
the group reached consensus on the issue of adding "offline" alerts
when any one of a large number of top conditions is encountered.

Recall that during the telecon we ran through Bob's great markup of
Chuck's latest "Top 25 Conditions" draft. Right up front we discussed
the "offline" issues; here is the extract from Harry's minutes posted
to the list the same day as the telecon on April 29:

> - Big discussion about how to define Offline. As usual, we did not
> reach consensus on how to DEFINE off-line, but we did agree that
> off-line should be treated as a separate condition. We decided
> that an off-line alert table entry should reference GroupIndex 5
> (the General Group) with subUnitIndex (-1).

The phrase "off-line should be treated as a separate condition" was in
reference to Bob's posted comments (4/29), in which it was suggested that
a critical "offline" alert be added to the table whenever one of 16 (!)
different conditions was encountered (out of a total of 21 conditions
in the entire list).

For those printers that go offline at the drop of a hat, this means
we're going to deal with a BUNCH of "offline" alerts in the table.

And now we really have a mess...a potential total of 16 different
ways required to derive the actual condition.

How can a mgmt app deal with this binding problem of related alerts?

Another question: How can a mgmt app detect when the printer has been
taken offline because a user pressed a front panel button?

Don't you think that customers will want to know the difference?

Yet how can the mgmt app really tell in an algorithmic manner?

Here's another scenario to consider:

Say the cover is opened and the printer (by design, or by configuration)
automatically goes offline; before closing the cover, the user first
presses the "Offline" button on the front panel.

Now, how many "offline" alerts should be in the alert table BEFORE
the door is closed versus AFTER it is closed? Does the printer
remain offline after the door is closed, and stay that way until
the user presses a button to return the printer to an online state?

Again, how can we determine the difference so as to convey the
true nature of the condition in the printer?

This is now the problem we, the mgmt app developers, must face.
Hopefully the response from the PMP isn't something like:

"You'll have to know the precise model with which you're
communicating, then use the proper mapping table.
You can determine the model by using the OID maintained
in the hrDeviceID variable...and hope that the variable
has been implemented by the vendor..."

(Note: most vendors do not currently implement hrDeviceID,
at least not in the proper manner.)

Hardly a standard approach. Should the mgmt app accept this burden?

Here's a proposal to fix that problem, one that reverses the
consensus reached in the last telecon:

If the printer automatically goes offline when one of
the 16 defined top conditions is encountered, then the
hrPrinterDetectedErrorState variable is set to "offline".
In this case, no "offline" alert should be added to the
Alert Table.

When a condition occurs that causes the printer to automatically
go offline, a critical alert is added to the Alert Table, and
the alert code reflects the specific nature of the condition.

The only time an "offline" alert is added to the Alert Table
is when the printer is directed to go offline by management
control, either remotely or locally, from the front panel.
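
The three rules above can be sketched from the agent's side as follows.
This is a minimal Python illustration, not an implementation of any real
agent: AgentUpdate, on_condition and on_directed_offline are hypothetical
names, and alert codes are plain strings rather than MIB values. Two
things are assumptions beyond the proposal text: that a directed
"offline" alert is critical, and that the "offline" state is also set in
hrPrinterDetectedErrorState when the printer is directed offline.

```python
from dataclasses import dataclass, field

@dataclass
class AgentUpdate:
    alerts: list = field(default_factory=list)  # (code, critical) rows to add
    set_offline_state: bool = False             # set "offline" in the error state?

def on_condition(code, goes_offline, critical_otherwise=False):
    """A top condition occurs; the printer may go offline automatically.

    If it does, the specific alert is critical, the error state is set to
    "offline", and NO separate "offline" alert is added.
    """
    critical = goes_offline or critical_otherwise
    return AgentUpdate(alerts=[(code, critical)], set_offline_state=goes_offline)

def on_directed_offline():
    """Printer is directed offline by management control (remote or front panel).

    This is the only case that adds an "offline" alert (assumed critical;
    setting the error state here is also an assumption).
    """
    return AgentUpdate(alerts=[("offline", True)], set_offline_state=True)

stops = on_condition("tonerLow", goes_offline=True)
keeps_going = on_condition("tonerLow", goes_offline=False)
print(stops.alerts, stops.set_offline_state)              # [('tonerLow', True)] True
print(keeps_going.alerts, keeps_going.set_offline_state)  # [('tonerLow', False)] False
print(on_directed_offline().alerts)                       # [('offline', True)]
```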

The points to consider:

- The MIB agent must already set hrPrinterDetectedErrorState; this
rule does not change the definition in RFC 1759, so we should have
no backward-compatibility problems.

- We can now more accurately detect when the front panel is used
to effect the offline condition.

- There will be times when certain conditions (eg, "toner low") must
be treated as either critical or non-critical based on the current
printer configuration, if the printer allows such configuration.
(And several excellent models currently provide this feature.)

- Bonus points: The Alert Table remains smaller.

Now this last point should have a few folks interested... ;-)

The issue this approach raises revolves around where we started on
this whole thread, namely:

If the printer automatically goes offline due to a particular
condition, then the alert added to the table should have a
severity level indicating that it is a critical alert.
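
With that rule in place, the mgmt app can build its message from a single
alert row plus the offline state. The following Python sketch assumes a
hypothetical status_line helper and a made-up DESCRIPTIONS table; the
alert codes are illustrative strings, and the wording for an
operator-initiated offline is invented for the example.

```python
# Hypothetical mapping from illustrative alert codes to display text.
DESCRIPTIONS = {"tonerLow": "low toner", "coverOpen": "an open cover"}

def status_line(printer, code, critical, state_offline):
    """Render one traffic-light message from a single alert row.

    state_offline stands for the "offline" indication in
    hrPrinterDetectedErrorState as read by the mgmt app.
    """
    if code == "offline":
        # Under the proposal, an "offline" alert only appears when the
        # printer was directed offline, remotely or from the front panel.
        return f'RED: "Printer {printer} was taken offline by an operator!"'
    what = DESCRIPTIONS.get(code, code)
    if critical and state_offline:
        return f'RED: "Printer {printer} is now offline due to {what}!"'
    if critical:
        return f'RED: "Printer {printer} has a critical problem: {what}!"'
    return f'YELLOW: "Printer {printer} reports {what}"'

print(status_line("XYZ", "tonerLow", critical=True, state_offline=True))
print(status_line("XYZ", "tonerLow", critical=False, state_offline=False))
print(status_line("XYZ", "offline", critical=True, state_offline=True))
```

The first call yields the message customers are asking for, with no need
to correlate two separate alert rows.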

This means that "toner low" (and other) conditions must be intelligently
handled by some printer implementations. And this situation is what I
believe Gail describes as something she prefers to avoid. (Same with Bob?)

Again, a fundamental assumption here is that the definition of "offline"
implies the printer has ceased its normal operations, which, by definition,
is a "critical" problem. (Note that some implementations use "offline"
to denote they are not currently accepting print requests, but printing
could be in progress with no problems present.)

So there it is. Sorry for the length, but I wanted to ensure that
everyone saw the complete ramifications of this situation.

If somehow I'm being stupid here, please be kind. But do please point
out the errors in this analysis as quickly as possible. The MIB clock
is ticking...

...jay

----------------------------------------------------------------------
-- JK Martin | Email: jkm@underscore.com --
-- Underscore, Inc. | Voice: (603) 889-7000 --
-- 41C Sagamore Park Road | Fax: (603) 889-2699 --
-- Hudson, NH 03051-4915 | Web: http://www.underscore.com --
----------------------------------------------------------------------