Duplexed Surveillance Strategy for Improved System State Reporting

IP.com Disclosure Number: IPCOM000118899D
Original Publication Date: 1997-Sep-01
Included in the Prior Art Database: 2005-Apr-01
Document File: 4 page(s) / 122K

Publishing Venue

IBM

Related People

Hamilton, RA: AUTHOR [+4]

Abstract

Disclosed is a method of duplexing interprocessor surveillance, allowing increased problem reporting and heightened system reliability, availability, and serviceability.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 45% of the total text.

      In this disclosure, the term "primary process" refers to a
process which is generally required for proper operation of a
computer system.  This would typically be closely tied to the
Operating System (OS), such that it would run when the OS runs and
stop when the OS stops.  However, the concept may be abstracted to
any process running on any processor without loss of applicability.
The term "monitoring process" refers to a process which has the job
of monitoring the well-being of the primary process.  Note that these
two processes may run concurrently on the same processor, subject to
limitations imposed by the possible failure of the primary process,
or they may run asynchronously on separate processors.  In the latter
case, the monitoring and primary processes must share some form of
virtual memory space or communications link, such that a surveillance
heartbeat may be exchanged.  Discussion of media for the heartbeat
exchange, however, is beyond the scope of this disclosure.

      In today's computer environment, customers are demanding an
increasing number of features that improve the reliability and
serviceability of their machines.  A concept which has been used over
the past few years to improve these attributes within computer
systems is that of surveillance.  Surveillance typically entails some
primary process sending a stream of signals, or heartbeats, to a
monitoring process, the purpose of which is to ascertain that the
former is still functioning.  In the event that the primary process
ceases its stream of heartbeats, the monitoring process will
typically take some action, such as setting an alarm, displaying
failure codes, or contacting the user via electronic mail or
telephone.  Such is the nature of surveillance as it has been
classically known:  a simple "good/bad" decision based upon the
exchange of heartbeat messages.
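
      The classical "good/bad" scheme described above can be
sketched in a few lines.  This is only an illustration under stated
assumptions, not code from the disclosure:  the class name
HeartbeatMonitor, the timeout value, and the on_failure callback are
all hypothetical, and the heartbeat medium is abstracted into a plain
method call.

```python
import time


class HeartbeatMonitor:
    """Classical one-way surveillance: a plain good/bad verdict.

    All names here are hypothetical; the disclosure gives no code.
    """

    def __init__(self, timeout_s, on_failure, clock=time.monotonic):
        self.timeout_s = timeout_s      # longest tolerated silence, in seconds
        self.on_failure = on_failure    # action to take, e.g. raise an alarm
        self.clock = clock              # injectable so the sketch can be tested
        self.last_beat = clock()

    def heartbeat(self):
        """Called each time a heartbeat arrives from the primary process."""
        self.last_beat = self.clock()

    def check(self):
        """Return True if the primary still looks alive; else act, return False."""
        if self.clock() - self.last_beat <= self.timeout_s:
            return True
        self.on_failure()
        return False
```

      The monitoring process would call check() periodically.  Note
that the only information exchanged is the arrival time of each
heartbeat, which is the limitation the remainder of the disclosure
addresses.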

      As surveillance becomes more pervasive, the question arises of
how to obtain maximum benefit from the monitoring process.
Frequently, the monitoring process undertakes other actions besides
acting as watchdog for the primary process.  If the primary process,
or primary processes (each of which is subject to surveillance by the
monitoring process), could obtain knowledge of other error conditions
that may exist, the primary process(es) might take advantage of such
data in ways that the monitoring process cannot, performing new
services for the user in the meantime.
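
      One way to picture this idea is a heartbeat whose
acknowledgment carries the error conditions the monitoring process
already knows about.  The sketch below is an assumption made for
illustration only:  the class names, the condition code "FAN_SLOW",
and the use of a direct method call in place of a real heartbeat
medium are all hypothetical and are not taken from the disclosure.

```python
class ReportingMonitor:
    """Monitor whose heartbeat reply carries extra state (assumed design)."""

    def __init__(self):
        self.conditions = []            # hypothetical error codes, oldest first

    def note_condition(self, code):
        """Record an error condition the monitor has observed elsewhere."""
        if code not in self.conditions:
            self.conditions.append(code)

    def on_heartbeat(self):
        """Acknowledge a heartbeat, returning the known conditions."""
        return tuple(self.conditions)


class PrimaryProcess:
    """Primary process that keeps whatever the monitor reports back."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.known_conditions = ()

    def beat(self):
        """Send one heartbeat and store the conditions in the reply."""
        self.known_conditions = self.monitor.on_heartbeat()
        return self.known_conditions
```

      With such a reply in hand, a primary process could act on
conditions it would otherwise never see, for example by logging them
or notifying the user through facilities the monitoring process
lacks.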

      The solution to this implied underutilization problem lies in
maximizing the information content of the heartbeat message exchange.
See Fig. 1 for an illustration of the classical surveillance
approach.  The heartbeat has typic...