Self-Checking Software

IP.com Disclosure Number: IPCOM000111885D
Original Publication Date: 1994-Apr-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 108K

Publishing Venue

IBM

Related People

Buckley, W: AUTHOR

Abstract

Disclosed is a scheme employing principles of hardware redundancy in a software setting. High system reliability is provided in complex environments where smaller code-element reliability is lower than needed. Three software versions of hardware redundancies are featured:

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 49% of the total text.

      Disclosed is a scheme employing principles of hardware
redundancy in a software setting.  High system reliability is
provided in complex environments where smaller code-element
reliability is lower than needed.  Three software versions of
hardware redundancies are featured:

1.  proving that replacement code is equivalent to the original;

2.  developing modules separately in critical areas, which allows
    the "best" implementation to be selected; and finally

3.  providing more robust systems with fault detection and isolation
    without system failure.

      Experience with long-evolved, complex code has identified
modules that have suffered an above-average number of APARs from the
field.  Continuous repair of such modules has produced code that
is complicated and difficult to maintain.  Rewriting such modules is
desirable, provided it can be guaranteed that the new code will be no
more prone to error than the old.

      In hardware design and development, greater reliability can be
built into a product through redundancy.  A space application that
relies on high reliability in its systems uses a "voting" scheme in
which the output from three computers is compared.  If one of the
three is malfunctioning, the consequences are minimised since it is in
the minority of the "vote".  If it continues to malfunction, as
recognised through continuous voting losses, it can be replaced with a
fourth, standby computer.  Thus a highly reliable system is devised,
not from an attempt to perfect a single hardware component, but by a
regime that minimises the effects of any imperfections.
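
      The voting regime described above can be sketched in a few
lines of Python.  This is only an illustration under stated
assumptions: each "computer" is modelled as a plain function, outputs
are assumed to be hashable values, and the names vote, VotingSystem
and max_losses (the number of consecutive lost votes tolerated before
a unit is replaced) are hypothetical and do not come from the
disclosure.

```python
from collections import Counter


def vote(outputs):
    """Return the majority output and the indices of units that lost the vote."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among the outputs")
    losers = [i for i, out in enumerate(outputs) if out != winner]
    return winner, losers


class VotingSystem:
    """Three active units plus standbys; all names here are hypothetical."""

    def __init__(self, active, standby, max_losses=3):
        self.active = list(active)    # callables standing in for computers
        self.standby = list(standby)  # spare units, e.g. the fourth computer
        self.max_losses = max_losses  # assumed threshold, not from the source
        self.losses = [0] * len(self.active)

    def run(self, x):
        outputs = [unit(x) for unit in self.active]
        winner, losers = vote(outputs)
        for i in range(len(self.active)):
            # Count consecutive lost votes; a won vote resets the count.
            self.losses[i] = self.losses[i] + 1 if i in losers else 0
            if self.losses[i] >= self.max_losses and self.standby:
                self.active[i] = self.standby.pop(0)  # swap in a standby
                self.losses[i] = 0
        return winner


# Usage: one unit is consistently wrong, is outvoted, and is then replaced.
good = lambda x: 2 * x
faulty = lambda x: 2 * x + 1
system = VotingSystem([good, faulty, good], standby=[good], max_losses=2)
results = [system.run(n) for n in (3, 4, 5)]
```

      The faulty unit never affects the result because it is always
in the minority; after two consecutive lost votes it is swapped for
the standby unit.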

      The self-checking software approach is similar to that of the
"voting" systems described above.  The difference is that it is the
module, not the computer, which is being checked.  Only two modules
are needed, one leader and one alternate.  For existing problem
modules, or for modules in new software identified as being crucial,
an alternate module is written by a programmer other than the
programmer who wrote the original.  The important requirement is that
both modules have identical interfaces, which must be complete and
definitive.  The interfaces must include all data areas, general
registers, input areas, indeed all information that the module uses to
govern its behaviour, and all data that the module has the capacity to
change.  Products are available on the market that can identify the
interfaces from the source code of existing modules.

      The next step is to write a test bed that captures all
this data before and after a module is called, hands a copy of
the "before image" to the alternate module, runs the alternate module
and compares the "after images".  Should a discrepancy show up in this
comparison, the lead module's output is used, and the discrepancy is
reported.
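
      A minimal sketch of such a test bed follows, written in Python
for illustration.  It assumes that the complete interface of a module
can be represented as a single dictionary (state) which the module
modifies in place.  The names self_checking_call, state and report are
hypothetical.  Treating an exception raised by the alternate module as
a discrepancy is an added assumption, not something the disclosure
states.

```python
import copy


def self_checking_call(leader, alternate, state, report):
    """Run leader on the live state and check it against alternate.

    The leader's result is always the one kept; any disagreement is
    passed to the report callback.
    """
    before = copy.deepcopy(state)      # capture the "before image"
    leader(state)                      # leader changes the live state
    alt_state = copy.deepcopy(before)  # alternate works on its own copy
    try:
        alternate(alt_state)
        alt_after = alt_state
    except Exception as exc:           # assumption: a crash is a discrepancy
        alt_after = repr(exc)
    if alt_after != state:             # compare the "after images"
        report({
            "before": before,
            "leader": copy.deepcopy(state),
            "alternate": alt_after,
        })
    return state                       # the lead module's output is used


# Usage with two independently written versions of one hypothetical module.
def leader(s):
    s["total"] = s["a"] + s["b"]


def alternate_bad(s):
    s["total"] = s["a"] - s["b"]       # deliberately wrong, for the demo


reports = []
state = {"a": 2, "b": 3}
self_checking_call(leader, alternate_bad, state, reports.append)
```

      Because the alternate module only ever sees a copy of the
"before image", a fault in it cannot disturb the running system; it
can only produce a report.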

Benefits of the new method follow:

1.  Replacement of a problem leader module can be checked against an
    alternate module in s...