
Failure Prevention And Error Repair for Overlays On Code And Data

IP.com Disclosure Number: IPCOM000099371D
Original Publication Date: 1990-Jan-01
Included in the Prior Art Database: 2005-Mar-14
Document File: 5 page(s) / 228K

Publishing Venue

IBM

Related People

Bowen, NS: AUTHOR [+2]

Abstract

The key to avoiding failures from faults caused by software overlays is to detect and correct the overlays before the corrupted information is used. Such dynamic correction increases the mean time between failures (MTBF). Experimental studies have indicated that a significant fraction of overlays have a very large error latency. This latency provides a window, ranging from tens of minutes to hours, in which to detect and correct such errors. Although it is crucial to correct errors before multiple errors accumulate, the size of the window means that inexpensive techniques such as periodic testing can be used.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 31% of the total text.

      The applicability of periodic testing, on code or data, depends
largely on how the storage is used.  A static section of storage (one
with no write operations) lends itself well to comparison testing,
whereas a dynamic section does not.  In practice, however, dynamic
sections of storage (those with write operations) do not necessarily
change all the time: some regions remain unchanged for substantial
periods, providing an opportunity for periodic testing and correction.
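
      Comparison testing of a region during a static phase can be
sketched as follows.  This is a minimal illustration under stated
assumptions, not the mechanism of the disclosure: the class
`ProtectedRegion` and its methods are hypothetical, a region is modeled
as a Python `bytearray`, and a golden copy plus a CRC-32 signature
(computed with `zlib.crc32`) are taken when the region enters a static
phase.  Each periodic test recomputes the signature and, on a mismatch,
repairs the differing bytes from the golden copy.

```python
import zlib


class ProtectedRegion:
    """Hypothetical model of one storage region protected by periodic
    comparison testing during a static phase."""

    def __init__(self, data: bytes) -> None:
        self.storage = bytearray(data)  # live region; stray writes land here
        self.begin_static_phase()

    def begin_static_phase(self) -> None:
        """Called by the owner after its last write: snapshot the region.

        Assumes the region is correct at this moment.
        """
        self.golden = bytes(self.storage)   # reference copy used for repair
        self.crc = zlib.crc32(self.golden)  # cheap signature for the test

    def periodic_test(self) -> int:
        """Compare the region with its signature; repair from the golden copy.

        Returns the number of bytes repaired (0 if the signature matched).
        """
        if zlib.crc32(self.storage) == self.crc:
            return 0  # signature matches: treat the region as intact
        repaired = 0
        for i, good in enumerate(self.golden):
            if self.storage[i] != good:
                self.storage[i] = good  # restore the overlaid byte
                repaired += 1
        return repaired


region = ProtectedRegion(b"static code or data")
region.storage[3] = 0xFF         # simulate an overlay (a stray write)
print(region.periodic_test())    # prints 1
```

      In this sketch the owner would call `begin_static_phase` after its
last write of a non-static phase.  If the region were already corrupted
at that moment, the golden copy would inherit the error, so the sketch
only guards against overlays that occur during the static phase.  A
matching CRC is strong, but not absolute, evidence that the region is
intact.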

      Figure 1 shows a hypothetical usage pattern of different
regions in virtual storage containing either code or data.  The
horizontal axis represents time, and the vertical axis shows
different regions.  Each horizontal bar in the graph marks a period
during which the region is being frequently updated; during such a
period the storage in that region is said to be in a non-static
phase.  The blank intervals between the bars are periods in which no
changes are made to the region; when such an interval is reasonably
long, we call it a static phase.  It is conjectured that the real
usage of a region alternates between static and non-static phases.
During a static phase there are no write operations on a region, but
there may be any number of read operations.  We assume that only the
owner of the region performs write operations on it.
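
      The distinction between static and non-static phases can be made
concrete by scanning a region's write trace.  The sketch below is an
illustration under assumptions, not part of the disclosure:
`static_phases` is a hypothetical helper, the write timestamps are
assumed to come from the region's owner (the only writer), and
`min_len` is a hypothetical threshold standing in for "reasonably
long".  Reads are ignored because they do not end a static phase.

```python
from typing import List, Tuple


def static_phases(write_times: List[float], start: float, end: float,
                  min_len: float) -> List[Tuple[float, float]]:
    """Return the (begin, end) intervals of at least min_len with no writes.

    write_times: times of the owner's write operations on one region,
                 assumed to lie within [start, end].
    min_len:     hypothetical threshold for a "reasonably long" gap.
    """
    phases = []
    prev = start
    for t in sorted(write_times) + [end]:  # `end` closes the last gap
        if t - prev >= min_len:
            phases.append((prev, t))       # long write-free gap: static phase
        prev = t
    return phases


print(static_phases([1, 2, 3, 20, 21, 50], start=0, end=60, min_len=10))
# prints [(3, 20), (21, 50), (50, 60)]
```

      Everything between consecutive static phases is then a
non-static phase.  For a region that is written continually, no gap
reaches `min_len` and the list of static phases is empty.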

      Regions with static phases are good candidates for protection
using periodic testing.  In Figure 1, such regions are a, b, c and d;
region e is not.  For the purpose of this disclosure, we classify
storage regions into three categories according to their behavior:

      1. Regions of code and data that are in a static phase
         throughout their existence, e.g., region c.
      2. Regions of code and data that alternate dynamically between
         static and non-static phases, with static phases long enough
         to implement periodic testing, e.g., regions a, b and d.
      3. Regions of code that are constantly in the non-static phase,
         e.g., region e.
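
      The three categories can be expressed as a simple classifier over
a region's lifetime.  This is a hypothetical sketch, not part of the
disclosure: `category` and `periodic_testing_applies` are illustrative
names, the write trace is assumed to start after the region has been
initialized, and `min_len` is an assumed minimum static-phase length
that makes periodic testing worthwhile.

```python
from typing import List


def category(write_times: List[float], created: float, freed: float,
             min_len: float) -> int:
    """Classify a region into category 1, 2 or 3 (hypothetical helper).

    write_times: owner writes observed after the region was initialized.
    created, freed: the region's lifetime.
    min_len: assumed minimum static-phase length worth testing.
    """
    writes = sorted(t for t in write_times if created <= t <= freed)
    if not writes:
        return 1  # static throughout its existence
    boundaries = [created] + writes + [freed]
    longest_gap = max(b - a for a, b in zip(boundaries, boundaries[1:]))
    return 2 if longest_gap >= min_len else 3  # 3: never static long enough


def periodic_testing_applies(cat: int) -> bool:
    """Periodic testing covers categories 1 and 2, not 3."""
    return cat in (1, 2)


print(category([], 0, 100, 10), category([5, 50], 0, 100, 10),
      category(list(range(101)), 0, 100, 10))  # prints 1 2 3
```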

      Periodic testing is applicable to categories 1 and 2 but not 3.
Storage in category 1 is best protected by placing it in read-only
virtual storage to take advanta...