Handling GSO Checkpoints in a *Tivoli Environment
Original Publication Date: 1999-Dec-01
Included in the Prior Art Database: 2003-Jun-11
A DCE Global Sign-On (GSO) administrator can execute a checkpoint on the DCE GSO servers at any time. The checkpoint operation can only be performed if there are sufficient disk resources available to save checkpoint data during the checkpoint. If there is not enough disk space, the checkpoint operation will fail, and the GSO server will become inoperative. This disclosure describes a Tivoli-enabled checkpoint space available monitor, DCE GSO rules designed to handle events generated by that monitor, and a Tivoli-enabled task that performs a checkpoint when a space shortage has been detected but there is still sufficient space to perform the checkpoint.
The CheckpointSpaceAvailable monitor was defined and implemented using the Tivoli Distributed Monitoring framework. This monitor periodically examines the disk space used by the checkpoint operation and calculates the amount of space needed to perform the checkpoint. It returns the space available expressed as a percentage of the space needed. A value of 100% indicates that exactly the space needed is available. Values greater than 100% indicate that more space is available than needed, and values less than 100% indicate that there is not enough space to perform a checkpoint. A normal system should have much more than 100% available. If the value starts approaching 100%, disk space is being consumed, and the system may be approaching a condition where there will not be enough disk space to perform the operation.
The administrator can assign response levels to the events based on different thresholds. This lets the administrator control when a warning event is issued to start the automated checkpoint task. There are typically three thresholds or response levels: warning, severe, and critical.
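The percentage calculation and the mapping to response levels can be illustrated with a minimal Python sketch. It is not the monitor's actual implementation: the function names, the way the needed space is supplied, and the threshold values (100, 150, 200) are all assumptions chosen for illustration. In practice the administrator sets the thresholds.

```python
# Hypothetical thresholds, ordered from most to least severe.
# An event gets the first level whose limit the percentage falls below.
THRESHOLDS = [
    ("critical", 100.0),  # assumed: less space available than needed
    ("severe", 150.0),    # assumed value
    ("warning", 200.0),   # assumed value
]


def checkpoint_space_available(free_bytes: int, needed_bytes: int) -> float:
    """Return the free space as a percentage of the space a checkpoint needs."""
    if needed_bytes <= 0:
        raise ValueError("needed_bytes must be positive")
    return 100.0 * free_bytes / needed_bytes


def response_level(percent: float) -> str:
    """Map a percentage to a response level, or 'normal' if no threshold is crossed."""
    for level, limit in THRESHOLDS:
        if percent < limit:
            return level
    return "normal"
```

With these assumed thresholds, a server with three times the needed space free reports 300% and no event is raised, while a server at 120% produces a severe event.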
The event rules were designed to interpret the CheckpointSpaceAvailable event by examining the response level returned. If the response level is not critical and a checkpoint has not been performed in the last hour, a checkpoint operation is initiated. If the event has a response level of critical, the automated checkpoint will not occur. At this point, the Tivoli DCE Administrator would have to notice the critical event and fix the problem manually.
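The rule's decision can be sketched in Python as follows. This is only an illustration of the logic described above, not the actual rule definition. The function name, the string form of the response level, and the treatment of a checkpoint taken exactly one hour ago as eligible are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Minimum time between automated checkpoints, per the description above.
CHECKPOINT_INTERVAL = timedelta(hours=1)


def should_run_checkpoint(
    level: str, last_checkpoint: Optional[datetime], now: datetime
) -> bool:
    """Decide whether a CheckpointSpaceAvailable event triggers a checkpoint.

    level: response level of the event ("warning", "severe", or "critical").
    last_checkpoint: time of the previous checkpoint, or None if unknown.
    """
    if level == "critical":
        # No automated checkpoint; the administrator must intervene manually.
        return False
    if last_checkpoint is None:
        return True
    # Assumption: exactly one hour since the last checkpoint counts as eligible.
    return now - last_checkpoint >= CHECKPOINT_INTERVAL
```

For example, a severe event that arrives 30 minutes after the last checkpoint is ignored, while the same event two hours after the last checkpoint starts the checkpoint task.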