Method for synchronizing domain membership and architecture changes following isolation from, and reconnection to, an SSI cluster.
Publication Date: 2011-May-06
The IP.com Prior Art Database
This disclosure describes a means of accommodating unscheduled events that affect the integrity of cluster communications. It is concerned in particular with events that occur during a configuration change and that might otherwise leave some members of a relocation domain in a state inconsistent with the other members of that domain.
Live Guest Relocation (LGR) is the ability to move a running VM guest dynamically from one VM instance to another without shutting down the guest or its applications. To give Live Guest Relocation the greatest ease of operational use and maximum flexibility in a heterogeneous Single System Image (SSI) cluster, an artifice known as a Relocation Domain is defined.
A Relocation Domain (RLD) specifies a subset of the SSI member systems, to which the system administrator assigns selected guest virtual machines. Once assigned to an RLD, a guest is presented with a consistent architecture, the maximal common subset (MCS) of the architectural features of the RLD members, no matter which member the guest is executing on. The administrator does not need to specify any architectural features; these are evaluated implicitly by the SSI management software.
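The MCS can be pictured as a set intersection: a feature is offered to guests in the domain only if every member of the domain has it. The following Python sketch assumes that each member's architecture can be represented as a set of feature names. The member names (`SYSA`, `SYSB`, `SYSC`), the feature names (`F1` to `F3`) and the function `maximal_common_subset` are hypothetical and do not come from the disclosure.

```python
from typing import Dict, FrozenSet, List, Set


def maximal_common_subset(member_features: Dict[str, Set[str]],
                          domain: List[str]) -> FrozenSet[str]:
    """Return the features present on every member of the relocation domain."""
    if not domain:
        raise ValueError("a relocation domain needs at least one member")
    # Start from the first member's features and intersect with each other member.
    common = set(member_features[domain[0]])
    for member in domain[1:]:
        common &= member_features[member]
    return frozenset(common)


# Hypothetical per-member feature sets.
members = {
    "SYSA": {"F1", "F2", "F3"},
    "SYSB": {"F1", "F2"},
    "SYSC": {"F1", "F3"},
}

small_domain = maximal_common_subset(members, ["SYSA", "SYSB"])
large_domain = maximal_common_subset(members, ["SYSA", "SYSB", "SYSC"])
print(sorted(small_domain))  # ['F1', 'F2']
print(sorted(large_domain))  # ['F1']
```

A guest assigned to the three-member domain would see only `F1`, whichever member it runs on, which is what allows it to relocate freely among the three.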
For flexibility of use, it is desirable to allow a system administrator to redefine the constituent members of a relocation domain. Furthermore, on computer systems that support hardware and firmware updates without interrupting operation, such an update might change the system's architecture. To support this for virtual machines, the Relocation Domain must also accommodate dynamic architecture changes (both the addition and the removal of features). Both capabilities, redefinition of domain membership and dynamic architecture change, are assumed.
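Both kinds of change alter the MCS that guests in the domain see. The sketch below, again assuming that architectures can be modelled as sets of hypothetical feature names, recomputes the MCS after a membership change and after simulated firmware updates, and reports which features the domain gains or loses. The helper names `mcs` and `mcs_delta` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mcs(member_features: Dict[str, Set[str]], domain: List[str]) -> FrozenSet[str]:
    """Maximal common subset: features present on every domain member."""
    common = set(member_features[domain[0]])
    for member in domain[1:]:
        common &= member_features[member]
    return frozenset(common)


def mcs_delta(old: FrozenSet[str],
              new: FrozenSet[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (features gained, features lost) when the MCS changes from old to new."""
    return new - old, old - new


features = {"SYSA": {"F1", "F2", "F3"}, "SYSB": {"F1", "F2"}, "SYSC": {"F1", "F3"}}
domain = ["SYSA", "SYSB"]
before = mcs(features, domain)  # {'F1', 'F2'}

# 1. The administrator adds SYSC to the domain: F2 is no longer common.
gained_join, lost_join = mcs_delta(before, mcs(features, domain + ["SYSC"]))

# 2. A firmware update removes F2 from SYSA (copy the dict; the original is unchanged).
removed = dict(features, SYSA={"F1", "F3"})
gained_rm, lost_rm = mcs_delta(before, mcs(removed, domain))

# 3. A firmware update adds F3 to SYSB, so F3 becomes common to SYSA and SYSB.
added = dict(features, SYSB={"F1", "F2", "F3"})
gained_add, lost_add = mcs_delta(before, mcs(added, domain))

print(sorted(lost_join), sorted(lost_rm), sorted(gained_add))  # ['F2'] ['F2'] ['F3']
```

In this model a gained feature can simply be offered from then on, whereas the lost set needs care, since a guest already in the domain may be relying on those features.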
Member systems of a cluster are connected through some form of network, over which they exchange configuration and status information. A system might lose contact with the other members of the cluster while itself remaining able to operate normally. When this happens, the members still in contact with each other cannot determine whether the out-of-contact member has left permanently or only temporarily. Should domain configuration changes be permitted in this circumstance? If so, how will the cluster reach a consistent state when the out-of-contact member rejoins after a temporary absence? This publication discusses these questions.
The loss of connectivity is further compounded because the member system that loses contact with the cluster might have initiated a configuration change before the other members became aware of its details. A critical issue is therefore knowing which system holds the most recent configuration information.
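One well-known way to ask which copy of a configuration is the most recent is to tag each copy with a version vector: one counter per member, incremented whenever that member originates a change. The Python sketch below uses this technique only to illustrate the problem; it is not presented as the mechanism of this disclosure, and the member names, counter values and helper functions (`record_change`, `compare`) are hypothetical.

```python
from typing import Dict

# Hypothetical: member name -> number of configuration changes that member originated.
VersionVector = Dict[str, int]


def record_change(vv: VersionVector, member: str) -> VersionVector:
    """Return a copy of vv reflecting one more change originated by member."""
    updated = dict(vv)
    updated[member] = updated.get(member, 0) + 1
    return updated


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify a relative to b as 'equal', 'a_newer', 'b_newer' or 'concurrent'."""
    members = set(a) | set(b)
    a_ahead = any(a.get(m, 0) > b.get(m, 0) for m in members)
    b_ahead = any(b.get(m, 0) > a.get(m, 0) for m in members)
    if a_ahead and b_ahead:
        return "concurrent"  # each side holds changes the other never saw
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


# Configuration version agreed by all members before the outage.
base = {"SYSA": 3, "SYSB": 2, "SYSC": 1}

# SYSC initiates a change and then loses contact before the others learn of it.
isolated = record_change(base, "SYSC")

# Case 1: the remaining members make no changes while SYSC is away.
print(compare(isolated, base))  # a_newer

# Case 2: SYSA also changes the configuration during the outage.
cluster = record_change(base, "SYSA")
print(compare(isolated, cluster))  # concurrent
```

Case 1 has a clear answer: the isolated member's copy is newer. Case 2 has none, because neither copy is simply the most recent, so some further reconciliation rule would be required.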
Synchronization issues can be sidestepped either by limiting configuration definitions to static ones that hold for the life of a member system, or by prohibiting all further dynamic changes at the first sign of cluster instability. Neither approach is suitable for a high-availability cluster environment.
This publication addresses the mechanisms necessary to permit maximum flexibility for...