ADDING FAULT TOLERANCE TO A MANAGEMENT SYSTEM

IP.com Disclosure Number: IPCOM000016798D
Original Publication Date: 2003-Jul-16
Included in the Prior Art Database: 2003-Jul-16
Document File: 2 page(s) / 33K

Publishing Venue

IBM

Abstract

The following invention stores both the hierarchical structure of the management system and the remote references to the distributed management objects in a relational database. The method of storage is such that when a disaster occurs, the recovery does not affect the performance of the system, and there is certainly no need to restart either the entire system or the management system.

This is the abbreviated version, containing approximately 52% of the total text.

Problem

    A management system includes a "representative"
management object in each of the processing nodes, and an
independent management node that oversees the running of the
distributed system and serves as a management interface to the
outside world. When considering disaster recovery, there are a
few possible scenarios:

    1. The "representative" management object failed within a
processing node.

    2. The entire processing node failed.

    3. The management node failed.

    In the first scenario, the processing node where the
management object failed is crippled since it cannot report
statistics to or accept instructions from the management
object.

    In the second scenario, the node must be restarted along
with its management object, but the managerial hierarchy is
lost.

    In the third scenario, all managerial aspects of the
system are crippled, and all remote references from the
management node to the management objects in the processing
nodes are lost as well.

    Unless the hierarchical structure of the management
system and the remote references to the distributed management
objects are restorable, system recovery requires, in the best
case, a complete restart of the management system. In the
worst case, it could require a complete restart of the entire
distributed system, which can affect the system's performance.

Solution

    The following invention stores both the hierarchical
structure of the management system and the remote references
to the distributed management objects in a relational
database. The entries are stored so that, when a disaster
occurs, recovery does not affect the performance of the
system, and neither the entire system nor the management
system needs to be restarted.

    When a management object is created, one of its first
steps is to register with the "LOD" - the Living Object
Database. Upon registration, an entry for the object is
created in a relational database. The entry includes the
object's logical name, the manager object responsible for
managing it, and the object's remote reference.
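
    The registration step described above might be sketched as
follows. This is only an illustration under stated assumptions,
not the disclosed implementation: the table name, the column
names, the use of SQLite, the string form of a remote reference,
and the helper functions open_lod, register and
rebuild_hierarchy are all hypothetical. The disclosure specifies
only that an entry holds the logical name, the manager object,
and the remote reference.

```python
import sqlite3


def open_lod(path=":memory:"):
    """Open (or create) a hypothetical LOD table in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS living_objects (
               logical_name TEXT PRIMARY KEY,
               manager_name TEXT,
               remote_ref   TEXT NOT NULL
           )"""
    )
    return conn


def register(conn, logical_name, manager_name, remote_ref):
    """Create or refresh the entry for one management object.

    manager_name is None for an object that has no manager
    (assumed here to be the root of the hierarchy).
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO living_objects VALUES (?, ?, ?)",
            (logical_name, manager_name, remote_ref),
        )


def rebuild_hierarchy(conn):
    """Group stored entries by manager: {manager: [(name, ref), ...]}."""
    tree = {}
    rows = conn.execute(
        "SELECT logical_name, manager_name, remote_ref "
        "FROM living_objects ORDER BY logical_name"
    )
    for name, manager, ref in rows:
        tree.setdefault(manager, []).append((name, ref))
    return tree
```

    Keying each row by logical name means that a restarted
management object can re-register under the same name and simply
overwrite its stale remote reference, while rebuild_hierarchy
shows how a recovering management node could, under these
assumptions, read the hierarchy and the references back from the
stored rows.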

    During the course of its life, the management...