Method for Managing, Controlling, and Serializing Concurrent Maintenance Procedures on a Computer System
Publication Date: 2010-Jun-03
The IP.com Prior Art Database
Disclosed is a method for controlling complex concurrent maintenance procedures on computer systems through the use of service sessions (abstract collections of state information about the currently active concurrent maintenance procedures). With these service sessions, the computer system can ensure that each maintenance operation proceeds in an ordered and controlled fashion, that simultaneous maintenance operations do not interfere with each other, and that, if problems occur, each operation can be aborted and the system returned to its previous state.
The disclosed method utilizes abstract service sessions to track the concurrent maintenance procedures currently in progress. A given service session is started by the systems management interface/utility at the beginning of the procedure and ended by the same interface/utility when the procedure ends. The service session includes, among other things, a token that identifies the client (i.e., the systems management interface/utility) that established the session, a token identifying the type and location of the hardware part on which the procedure is being performed, a token identifying the type of procedure (e.g., add, remove, repair) being performed, and a list of actions that must be performed if the procedure is canceled or fails.
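As an illustration only, the session state described above could be sketched as a small Python data structure. The disclosure does not specify any implementation; the class name, field names, and token values here are hypothetical, and running the cancel actions in reverse order of registration is an assumption rather than something the text states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class ProcedureType(Enum):
    """Procedure types named in the text; the real set may be larger."""
    ADD = "add"
    REMOVE = "remove"
    REPAIR = "repair"


@dataclass
class ServiceSession:
    client_token: str         # identifies the interface/utility that opened the session
    part_token: str           # type and location of the hardware part being serviced
    procedure: ProcedureType  # type of procedure being performed
    # Actions that must be performed if the procedure is canceled or fails.
    undo_actions: List[Callable[[], None]] = field(default_factory=list)

    def record_undo(self, action: Callable[[], None]) -> None:
        """Register an action to run if the procedure does not complete."""
        self.undo_actions.append(action)

    def abort(self) -> None:
        """Run the registered actions, most recent first (an assumed ordering)."""
        while self.undo_actions:
            self.undo_actions.pop()()


# Hypothetical usage: a repair started from HMC1 is canceled part way through.
log: List[str] = []
session = ServiceSession("HMC1", "pci-adapter@slot-3", ProcedureType.REPAIR)
session.record_undo(lambda: log.append("power slot back on"))
session.record_undo(lambda: log.append("reassign adapter to partition"))
session.abort()
```

After `abort()` returns, `log` holds the two actions in reverse order of registration and the session has no pending actions left.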
To manage the service sessions currently in use, the system also utilizes an abstraction referred to as a service session manager. The service session manager tracks the service sessions currently in existence, enforces concurrent maintenance procedure serialization rules when clients attempt to establish new service sessions, and ensures that a proper service session is established as each of the various interactions/exchanges associated with a concurrent maintenance procedure is received by the platform firmware. If any service session encounters a critical error during its processing (one that leaves the system in an indeterminate state from which no recovery is possible without restarting the system), the session reports the error to the service session manager, which then prevents any new service sessions from starting while the system is in the critical error state.
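The responsibilities of the service session manager can be sketched in the same spirit. This is a minimal sketch under stated assumptions, not the disclosed firmware: all names are hypothetical, and the two serialization rules shown (one session per client, one procedure per hardware part at a time) are assumed for illustration, since the disclosure does not list its actual rules.

```python
class SessionRefused(Exception):
    """Raised when a service session may not be started or is missing."""


class ServiceSessionManager:
    def __init__(self) -> None:
        self._sessions = {}            # client token -> part token
        self._critical_error = False   # set once any session reports a critical error

    def start_session(self, client: str, part: str) -> None:
        # No new sessions may start while the system is in the critical error state.
        if self._critical_error:
            raise SessionRefused("system is in the critical error state")
        # Assumed serialization rule: one session per client.
        if client in self._sessions:
            raise SessionRefused(f"{client} already has a session")
        # Assumed serialization rule: one procedure per hardware part at a time.
        if part in self._sessions.values():
            raise SessionRefused(f"{part} is already being serviced")
        self._sessions[client] = part

    def end_session(self, client: str) -> None:
        self._sessions.pop(client, None)

    def require_session(self, client: str) -> str:
        # Checked for each interaction received for a maintenance procedure.
        if client not in self._sessions:
            raise SessionRefused(f"{client} has no service session")
        return self._sessions[client]

    def report_critical_error(self, client: str) -> None:
        # A session reports an unrecoverable error; later starts are refused.
        self._critical_error = True


def try_start(manager: ServiceSessionManager, client: str, part: str) -> str:
    try:
        manager.start_session(client, part)
        return "started"
    except SessionRefused as exc:
        return f"refused: {exc}"


manager = ServiceSessionManager()
first = try_start(manager, "HMC1", "io-drawer-1")
second = try_start(manager, "partition-2", "pci-slot-3")
conflict = try_start(manager, "HMC2", "io-drawer-1")   # same part as HMC1
manager.report_critical_error("HMC1")
after_error = try_start(manager, "HMC2", "pci-slot-7")
```

In this sketch the first two requests succeed, the third is refused because the part is already being serviced, and the fourth is refused because a critical error has been reported.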
Figure 1 shows a typical partitioned system with three partitions and two Hardware Management Consoles (HMCs). Each HMC and each partition has platform management utilities through which users can perform various forms of hardware concurrent maintenance. Figure 1 depicts a user performing concurrent maintenance from HMC1 and from partition 2. It also shows three constructs in the hypervisor (a layer of platform firmware involved in platform resource management, system serviceability, logical partitioning, and hardware virtualization). The constructs are a service session manager and two service sessions. There is one service session for each platform management utility from which concurrent maintenance is being performed.
The service session manager is responsible for managing all the service sessions currently in existence. A service session is a representation of a platform management utility that is performing some type of hardware concurrent maintenance. The information maintained by the service session includes:
Client ID: An identifier that represents a particular platform management client.
being performed, such as add, repair, upgrade, etc.
Domain Type: A value that represents the type of service domain on which concurrent maintenance is being performed. Values might include things such as PCI adapter, I/O Expansion Un...