
Recovery from Single Critical Hardware Resource Unavailability

IP.com Disclosure Number: IPCOM000105661D
Original Publication Date: 1993-Aug-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 6 page(s) / 140K

Publishing Venue

IBM

Related People

Greenstein, PG: AUTHOR [+2]

Abstract

Disclosed is a mechanism for avoiding computer system(s) IPL (initial program load) after single critical hardware resource unavailability in BASIC or Logically Partitioned (LPAR) mode.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 22% of the total text.

      Today, if a single critical hardware resource becomes
unavailable, the only way to recover is to re-IPL the system(s) once
the repair, including a POR (Power-On Reset) or IML (Initial
Microcode Load), has been performed.

Examples of a single critical resource are:

o   CPU (central processor unit) of a uniprocessor (UP) machine,

o   single CPU in a physical partition of a physically-partitioned
    multiprocessor (MP) machine,

o   CPU of a logical partition with a single dedicated CPU,

o   the storage element (SE) of a one-storage-element machine,

o   a storage element of a two-or-more-storage-element machine where
    storage elements cannot be dynamically taken offline.
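
The examples above share one property: the affected machine or
partition has no second resource of the same kind that could take
over, or the resource cannot be dynamically taken offline.  A minimal
Python sketch of that test follows.  The Resource record, its field
names, and is_single_critical are hypothetical names chosen for
illustration; they are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str                    # "CPU" or "SE" (storage element)
    domain: str                  # machine, physical partition, or logical partition served
    can_go_offline: bool = True  # False if it cannot be dynamically taken offline


def is_single_critical(target, configuration):
    """Return True if losing `target` would leave its domain without a substitute.

    Assumption for this sketch: a resource is critical when it has no peer of
    the same kind in the same domain, or when it cannot be taken offline
    dynamically even though peers exist.
    """
    peers = [
        r for r in configuration
        if r is not target and r.kind == target.kind and r.domain == target.domain
    ]
    if not peers:
        return True
    return not target.can_go_offline


# Uniprocessor: the only CPU is critical.
up = [Resource("CPU", "UP machine")]
print(is_single_critical(up[0], up))

# Two storage elements that cannot be taken offline: each one is still critical.
se = [Resource("SE", "machine", can_go_offline=False),
      Resource("SE", "machine", can_go_offline=False)]
print(is_single_critical(se[0], se))
```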

The reasons for resource unavailability include:

1.  Planned maintenance,

2.  Unplanned maintenance:

    a.  as a result of resource failure in the cases where the
        resource status could be preserved,

    b.  as a result of a power warning situation where backup power
        can maintain the system for a period of time.

There are two sets of scenarios:

1.  Expected Single Critical Hardware Resource Unavailability

2.  Unexpected Single Critical Hardware Resource Unavailability

      An example of expected single critical hardware resource
unavailability is removal of a resource for planned maintenance
(repair, replacement, upgrade).

      The first step is to prevent any use of the resource.  For
example, to perform maintenance on a CPU, it is sufficient to stop
that CPU from processing instructions (Fig. 1).  To perform
maintenance on a storage element, it is necessary both to stop the
CPU from accessing that element and to quiesce I/O (Fig. 2).  The
second step is to record the contents and state of the resource on a
medium that will survive the maintenance procedure.  For example, CPU
information may be kept in the hardware storage area (HSA), and
storage element information may be kept in expanded storage (ESTOR).
Note that freeing ESTOR for this purpose may require software to
reclaim it from its current use through dynamic reconfiguration of
ESTOR.  Following these steps, the maintenance or replacement of the
resource is performed.  The success of the maintenance is validated
by a POR or IML of the target resource.  Once the maintenance is
complete, the recorded resource contents and state are restored, and
software is given control.
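
The sequence just described (stop use, save, maintain, validate,
restore, resume) can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the disclosed
implementation: Resource, SurvivingMedium, recover_expected, and the
maintain/validate callbacks are hypothetical names, and raising an
error on a failed validation is an assumption, since the text does
not say what happens in that case.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    state: dict          # contents and state that must be preserved
    in_use: bool = True


@dataclass
class SurvivingMedium:
    """Stands in for HSA or ESTOR: storage that survives the maintenance."""
    saved: dict = field(default_factory=dict)


def recover_expected(resource, medium, maintain, validate):
    """Run the expected-unavailability sequence and return the steps taken."""
    steps = []

    # Step 1: prevent any use of the resource (stop the CPU, quiesce I/O).
    resource.in_use = False
    steps.append("stopped")

    # Step 2: record contents and state on a medium that survives maintenance.
    medium.saved[resource.name] = copy.deepcopy(resource.state)
    steps.append("saved")

    # Maintenance or replacement of the resource (hypothetical callback).
    maintain(resource)
    steps.append("maintained")

    # Validation stands in for the POR or IML of the target resource.
    if not validate(resource):
        # Assumption: the source does not describe the failure path.
        raise RuntimeError(f"validation of {resource.name} failed")
    steps.append("validated")

    # Restore the recorded contents and state, then give software control.
    resource.state = medium.saved.pop(resource.name)
    steps.append("restored")
    resource.in_use = True
    steps.append("resumed")
    return steps


def wipe(resource):
    """Hypothetical maintenance action that loses the resource's state."""
    resource.state = {}


cpu = Resource("CPU0", {"psw": 0x1000, "regs": [1, 2, 3]})
print(recover_expected(cpu, SurvivingMedium(), wipe, lambda r: True))
print(cpu.state)
```

Keeping the saved copy on a separate medium object mirrors the
requirement that the recorded state outlive the resource itself
during maintenance.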

Examples of unexpected single critical hardware resource
unavailability are:

o   CPU check-stop with logout.

o   Imminent power failure (switch to temporary, short-lasting
    alternate power source).

      In the case of an unexpected single critical hardware resource
unavailability, the appropriate contents...