Threshold Monitor Function for Alarm Triggering

IP.com Disclosure Number: IPCOM000108545D
Original Publication Date: 1992-Jun-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 2 page(s) / 77K

Publishing Venue

IBM

Related People

Schwendemann, W: AUTHOR [+2]

Abstract

Internal software errors fall into two categories: recoverable and unrecoverable. Unrecoverable errors cause the "system" to fail unconditionally, whereas recoverable errors do not threaten its survival. Two commonly encountered recoverable errors are insufficient storage and lock contention. These errors usually increase with system load, and a certain number of them can be expected for a given load. System load is defined as the average number of requests for a service over a given time interval. Because these errors are a function of system load, a "monitoring function" can detect whether the recoverable errors exceed some threshold. This disclosure describes such a function.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Threshold Monitor Function for Alarm Triggering

      Most software products (e.g., database, communications,
operating systems) separate errors according to severity.  Errors
that the software cannot handle (e.g., a linked list being
destroyed) cause the system to shut down.  This is the
appropriate action, since the integrity of the system has
been compromised.  A second class of errors is severe to
an application but does not necessarily require that the system
be brought down.  These errors usually occur because some resource
limit has been reached (e.g., memory exhaustion or lock contention).
For a given system load, a certain number of these errors is
expected and is not considered unusual.  However, when the number
of errors exceeds this "threshold", the system is going awry and
some form of intervention is required.
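
      The idea of a load-dependent threshold can be illustrated with
a short sketch.  The function actually disclosed is not reproduced in
this abbreviated text, so everything below is an assumption made for
illustration: the class name ThresholdMonitor, its parameters, the
sliding time window, and in particular the linear rule that allows a
fixed number of recoverable errors per request.

```python
from collections import deque


class ThresholdMonitor:
    """Illustrative sliding-window monitor for recoverable errors.

    Assumption (not from the disclosure): the allowed number of errors
    grows linearly with the number of requests seen in the window.
    """

    def __init__(self, window_seconds, errors_per_request, floor=1):
        self.window_seconds = window_seconds          # length of the time interval
        self.errors_per_request = errors_per_request  # tolerated errors per request
        self.floor = floor                            # minimum allowed error count
        self._events = deque()                        # (timestamp, was_error) pairs

    def record(self, timestamp, was_error):
        """Record one request outcome; return True if the alarm condition holds."""
        self._events.append((timestamp, was_error))
        self._expire(timestamp)
        return self.alarm()

    def _expire(self, now):
        # Drop requests that have fallen out of the observation window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def load(self):
        """System load: average number of requests per second in the window."""
        return len(self._events) / self.window_seconds

    def error_count(self):
        """Number of recoverable errors currently inside the window."""
        return sum(1 for _, was_error in self._events if was_error)

    def threshold(self):
        """Errors expected at the current load (assumed linear in request count)."""
        return max(self.floor, self.errors_per_request * len(self._events))

    def alarm(self):
        """True when the observed errors exceed the load-dependent threshold."""
        return self.error_count() > self.threshold()
```

      A caller would invoke record() once per request, passing a
timestamp and a flag that says whether the request ended in a
recoverable error, and would request intervention when it returns
True.  A different threshold rule can be substituted by replacing
threshold() alone.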

      Every system can accommodate only a finite number of requests.
As the number of requests approaches system capacity, the error rate
grows exponentially.  If...