Fault Injection Mechanism

IP.com Disclosure Number: IPCOM000123394D
Original Publication Date: 1998-Oct-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 67K

Publishing Venue

IBM

Related People

Dixit, A: AUTHOR [+2]

Abstract

This disclosure relates to a fault injection method for testing distributed systems of servers in a networked computer system. The method allows the injection of a variety of fault states at points in the processing and is extensible to allow the definition of new fault states.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

   This disclosure relates to a fault injection method for
testing distributed systems of servers in a networked computer
system.  The method allows the injection of a variety of fault
states at points in the processing and is extensible to allow the
definition of new fault states.

   In the process of testing a distributed system of servers,
it is often desirable to bring the system to a state that it
could reach owing to causes outside its control, such as network
delays, power failures, and the like.  Other cases arise within the
system itself, such as a certain piece of code in a multithreaded
program consuming an inordinate amount of time and thus starving
the other threads in the process.

   The proposed mechanism brings the system to a state that
could potentially be reached during use of the system, for example:
if the server stopped execution while a particular statement was
executing; if a particular call from one server to another took a
given amount of time longer than it normally would; if a certain
disk I/O took a given amount of time longer than usual; or if the
code received an exception that it was not designed to handle.  It
would be desirable to generate all of these cases in order to
observe the ability of the system to recover from them.  It would
also be desirable to test how a server behaves when it is stopped
while executing a certain statement and then restarted, and how
other servers behave when one of them stops suddenly in the midst
of its communication with another server: how the other server
recovers, and how the system gets back into a stable state.
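
   The fault cases listed above can be illustrated with a minimal
sketch in Python.  This is not the disclosed design, whose details
are not given in this abbreviated text; it is one possible reading
of it.  All names here (FaultInjector, ServerStopped, arm, check,
define_fault, and the "before_disk_write" point) are hypothetical.
The sketch assumes that code under test calls check() at named
points, and that a "stop" fault can be modeled in-process by raising
an exception rather than by actually terminating the server.

```python
import time


class ServerStopped(Exception):
    """Hypothetical stand-in for a server halting at an injection point."""


def make_delay(seconds, sleep=time.sleep):
    # Fault state: the operation takes `seconds` longer than usual.
    def action():
        sleep(seconds)
    return action


def make_stop():
    # Fault state: execution stops at this statement (modeled as an exception).
    def action():
        raise ServerStopped("halted at injection point")
    return action


def make_exception(exc):
    # Fault state: the code receives an exception it may not be designed for.
    def action():
        raise exc
    return action


class FaultInjector:
    """Maps named injection points to fault actions (illustrative only)."""

    def __init__(self):
        self._fault_types = {
            "delay": make_delay,
            "stop": make_stop,
            "exception": make_exception,
        }
        self._armed = {}

    def define_fault(self, name, factory):
        # Extension hook: register a new fault state under `name`.
        self._fault_types[name] = factory

    def arm(self, point, fault, *args, **kwargs):
        # Schedule `fault` to fire whenever execution reaches `point`.
        self._armed[point] = self._fault_types[fault](*args, **kwargs)

    def disarm(self, point):
        self._armed.pop(point, None)

    def check(self, point):
        # Called by the code under test; does nothing unless a fault is armed.
        action = self._armed.get(point)
        if action is not None:
            action()


injector = FaultInjector()


def write_record(log, record):
    # Example code under test, with one injection point before the write.
    injector.check("before_disk_write")
    log.append(record)
```

   A test would arm a fault, for instance with
injector.arm("before_disk_write", "delay", 5), drive the system, and
then observe whether it returns to a stable state.  With no fault
armed, check() returns immediately, so the instrumented code behaves
as it normally would.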

   The proposed design is extensible to additional types of
test.  It has th...