
A novel technique for rapid recovery of shared resources
Disclosure Number: IPCOM000013909D
Original Publication Date: 2001-Jul-15
Included in the Prior Art Database: 2003-Jun-19

Publishing Venue



Consider a resource manager that controls updates to both local and shared resources. Shared resources are those that are accessible through multiple resource managers (peer resource managers). The intent is to make these resources more available: an outage of one resource manager does not cause an outage of the resource, because the resource can still be accessed through a peer resource manager. In effect, shared resources are offered a different class of service from local resources, and one difference is that shared resources should be more available.

If a resource manager fails, it is important to make the shared resources available to its peers as rapidly as possible. Previously, however, peers could not perform any operation on units of work that were INDOUBT with respect to either the log of the failed resource manager or an external syncpoint manager. Typically, the resources affected by these indoubt units of work were not available until the failed resource manager was restarted or recoordination with the external syncpoint manager took place, respectively. Every effort should be made to make shared resources available as rapidly as possible, and this disclosure describes techniques we have used to achieve this.

1. The state of shared units of work, and of the resources they affect, is both logged and checkpointed. As a result, during the first ('current status rebuild') phase of restarting the failed resource manager, it is possible to forward recover the state of units of work that are indoubt with respect to the log and to resolve them at that point. This is earlier in the restart process and so improves the availability of shared resources. Checkpointing the shared resource operations also avoids the time that would otherwise be spent reading the log.

2. Resolve-indoubt processing, for units of work that are indoubt with respect to an external syncpoint manager, resolves the operations on shared resources first, thus improving their availability over local resources.

3. Information about the operations performed on shared resources is held in a shared repository. A remote unit-of-work display facility allows active peers of the failed resource manager to query this shared information for indoubt units of work, and a remote resolve-indoubt command allows a peer to decide unilaterally how to resolve an indoubt unit of work. It is therefore possible to find and resolve indoubt units of work on a failed or inactive resource manager.
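
The third technique can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class and method names (SharedRepository, display_indoubt, resolve_indoubt), the resource manager names RM1 and RM2, and the resource name SHARED.Q1 are all hypothetical, and an in-memory dictionary stands in for whatever shared storage the peers can actually reach. It shows only the idea that a surviving peer can list and unilaterally resolve the indoubt units of work of a failed resource manager.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INDOUBT = "indoubt"
    COMMITTED = "committed"
    BACKED_OUT = "backed out"


@dataclass
class UnitOfWork:
    uow_id: str
    owner: str                     # resource manager that ran the unit of work
    state: State = State.INDOUBT
    shared_ops: list = field(default_factory=list)  # operations on shared resources


class SharedRepository:
    """Hypothetical stand-in for the repository that every peer can read."""

    def __init__(self):
        self._records = {}

    def record(self, uow):
        # Keyed by owner and id so peers can look up another manager's work.
        self._records[(uow.owner, uow.uow_id)] = uow

    def display_indoubt(self, owner):
        """Remote display: list the indoubt units of work owned by `owner`."""
        return [u for (o, _), u in self._records.items()
                if o == owner and u.state is State.INDOUBT]

    def resolve_indoubt(self, owner, uow_id, commit):
        """Remote resolve: a peer unilaterally commits or backs out."""
        uow = self._records[(owner, uow_id)]
        if uow.state is not State.INDOUBT:
            raise ValueError(f"{uow_id} is already {uow.state.value}")
        uow.state = State.COMMITTED if commit else State.BACKED_OUT
        return uow


repo = SharedRepository()
repo.record(UnitOfWork("uow-1", "RM1", shared_ops=["put SHARED.Q1"]))
repo.record(UnitOfWork("uow-2", "RM1", State.COMMITTED))
repo.record(UnitOfWork("uow-3", "RM2"))

# RM1 has failed; an active peer queries and resolves its indoubt work.
pending = repo.display_indoubt("RM1")
for uow in pending:
    repo.resolve_indoubt("RM1", uow.uow_id, commit=True)
```

Keying the records by owning resource manager is an assumption made so that a peer can address another manager's units of work by name. Whether to commit or back out remains a decision for the operator or peer, as the disclosure states.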