Dynamic Determination and Confirmation Methodology for Environmental Failures

IP.com Disclosure Number: IPCOM000122902D
Original Publication Date: 1998-Jan-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 4 page(s) / 182K

Publishing Venue

IBM

Related People

Hamilton II, RA: AUTHOR [+4]

Abstract

Disclosed is a method for an operating system to dynamically determine the validity of critical error conditions reported by a monitoring process.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 31% of the total text.

      Disclosed is a method for an operating system to dynamically
determine the validity of critical error conditions reported by a
monitoring process.

      Note:  For the purposes of this description, the term "primary
process" refers to any critical process subject to shutdown due to
critical environmental conditions.  This is usually a computer
Operating System (OS), though the definition is not explicitly
limited to the OS.  The term "monitoring process" refers to another
process, running either concurrently on the same processor or
asynchronously on a separate processor, tasked with monitoring
environmental conditions that may affect operation of the primary
process.  "Environmental conditions" are defined as anything within
the operating environment that affects operation of the processor
and, by extension, the primary process.  Examples of such conditions
include system voltage excursions, temperature variations, and loss
of critical hardware peripherals necessary for continued operation.

      In today's computer systems, monitoring processes are often
provided to survey various physical parameters for failure
conditions.  The implementation of these monitoring systems provides
warning of any condition, such as those outlined above, which might
endanger the data integrity of the primary process and related
applications.  In the event that such an error condition
materializes, the monitoring process notifies the primary process, so
that it might take steps to minimize data corruption and loss in the
event of a failure.  One embodiment of such a warning system is a
mechanism by which the monitoring process writes a two-byte code, or
a series of two-byte codes, to the primary process upon detection of
environmental errors.  Following receipt of these codes (known as
EPOW codes in this embodiment), the OS (the primary process) decides
whether to shut down in order to minimize the data corruption and
loss associated with a disorderly physical failure.
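
      The two-byte reporting mechanism described above can be
illustrated with a minimal sketch.  Everything specific in it is an
assumption made for illustration: the disclosure does not give code
values, byte order, or a shutdown policy, so the constants, the
big-endian packing, and the function names below are hypothetical.

```python
import struct
from typing import Iterable, List

# Hypothetical two-byte warning codes; the disclosure lists no actual values.
CODE_FAN_FAULT = 0x0001
CODE_OVER_TEMPERATURE = 0x0002
CODE_VOLTAGE_EXCURSION = 0x0003

# Assumed policy: codes the primary process treats as requiring shutdown.
SHUTDOWN_CODES = {CODE_FAN_FAULT, CODE_OVER_TEMPERATURE}


def encode_codes(codes: Iterable[int]) -> bytes:
    """Monitoring-process side: pack each code as two bytes.

    Big-endian byte order is an assumption, not taken from the source.
    """
    return b"".join(struct.pack(">H", code) for code in codes)


def decode_codes(message: bytes) -> List[int]:
    """Primary-process side: split a message into its two-byte codes."""
    if len(message) % 2 != 0:
        raise ValueError("message length must be a multiple of two bytes")
    return [code for (code,) in struct.iter_unpack(">H", message)]


def should_shut_down(message: bytes) -> bool:
    """Return True if any received code is on the assumed shutdown list."""
    return any(code in SHUTDOWN_CODES for code in decode_codes(message))
```

      In this sketch the primary process acts on the first report it
receives, which is the behavior whose drawbacks the following
paragraph describes.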

      A fundamental drawback of such a monitoring and reporting
mechanism is that, even though a system lock-up or disorderly power
failure is avoided by the process-initiated shutdown, the subsequent
power loss may still be a sudden and disruptive event.  For example,
in the IBM* RS/6000* computer system, a locked fan rotor will result
in near-immediate shutdown with little or no warning to the user.  If
users do not have time to save their data, any unsaved work is lost.
This can be especially problematic with multi-user server
configurations; for example, if a large reservations system or
popular website experienced an unnecessary shutdown, direct and
consequential business ramifications could result.  Thus, it is of
extreme importance that such shutdowns not transpire without due...