
Software Mechanism to Detect Loss of Access to Storage Devices in an AIX Environment

IP.com Disclosure Number: IPCOM000032158D
Original Publication Date: 2004-Oct-22
Included in the Prior Art Database: 2004-Oct-22
Document File: 1 page(s) / 35K

Publishing Venue

IBM

Abstract

There is increasing emphasis on making storage subsystems continuously available. The AIX High Availability Cluster Multiprocessing (HACMP) application provides a foundation for developing highly available systems. However, the criteria for deciding when a system should take over various resources must still be defined for each environment that uses the AIX HACMP application. Detecting loss of access to file systems that are critical to an application is one such criterion. For example, the SAN controller serving node A may lose power while node B still has access through a different SAN controller. This publication describes a mechanism to detect loss of access to critical file systems and take action. The mechanism applies to non-high-availability systems as well as to highly available ones. Once loss of access to file systems is detected, the system can take whatever action is desired.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 66% of the total text.


THIS COPY WAS MADE FROM AN INTERNAL IBM DOCUMENT AND NOT FROM THE PUBLISHED BOOK

TUC820040027 John C Kennel/Tucson/IBM Christopher Knapp, Ellen Grusy


A program is disclosed that provides a mechanism to:

- define the quorum of file system resources that must be accessible; if the quorum is not established, a 'loss of quorum' event is signaled
- test the accessibility of those resources
- issue the 'loss of quorum' event

Custom error notification is set up so that the 'loss of quorum' detection process is invoked whenever an AIX disk error occurs. Each file system counts as one member of the quorum, and a 'loss of quorum' threshold is defined: the percentage of file systems that must be inaccessible to trigger the 'loss of quorum' event.
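The disclosure does not show how the custom error notification is registered. On AIX this is normally done by adding an object to the errnotify ODM object class with `odmadd`. The Python sketch below only generates such a stanza. The object name `quorum_disk_err` and the detector path `/usr/local/bin/check_fs_quorum` are hypothetical, and matching on the failing resource class `disk` alone (without also restricting `en_class` or `en_type`) is an assumption meant to follow "whenever an AIX disk error occurs".

```python
def errnotify_stanza(name: str, method: str, persistent: bool = True) -> str:
    """Build an errnotify ODM stanza to be loaded with `odmadd <file>`.

    The object matches error-log entries whose failing resource class
    is "disk" and runs `method`, passing the error-log sequence number
    as $1. Name and method are hypothetical placeholders.
    """
    fields = [
        ("en_name", f'"{name}"'),
        # 1 keeps the notification object across reboots, 0 does not.
        ("en_persistenceflg", "1" if persistent else "0"),
        # Resource class of the failing device.
        ("en_rclass", '"disk"'),
        # AIX substitutes the error-log sequence number for $1.
        ("en_method", f'"{method} $1"'),
    ]
    lines = ["errnotify:"] + [f"\t{key} = {value}" for key, value in fields]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical object name and detector path.
    print(errnotify_stanza("quorum_disk_err",
                           "/usr/local/bin/check_fs_quorum"), end="")
```

The printed stanza could be saved to a file and loaded once with `odmadd`; from then on the error daemon runs the detector for each matching disk error.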

When a file system is created, a hidden file is written on it whose contents identify the resource. The 'loss of quorum' detection process then performs the following checks for each file system mounted on the node:
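A minimal sketch of the hidden identity file, assuming a hypothetical file name `.fs_quorum_id` and an identity string built from the volume group and mount point (for example `datavg:/data`); the disclosure specifies neither. `os.fsync` forces the identity onto the disk at creation time. `verify_marker` reads the file in process, so it suits setup and testing; a detector that must never hang would read the file with a timeout instead.

```python
import os
import tempfile

MARKER_NAME = ".fs_quorum_id"  # hypothetical hidden-file name


def write_marker(mount_point: str, identity: str) -> str:
    """Create the hidden identity file on a newly created file system."""
    path = os.path.join(mount_point, MARKER_NAME)
    with open(path, "w", encoding="ascii") as f:
        f.write(identity + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the identity reaches the disk
    return path


def verify_marker(mount_point: str, expected_identity: str) -> bool:
    """Return True only if the hidden file is readable and matches."""
    path = os.path.join(mount_point, MARKER_NAME)
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip() == expected_identity
    except (OSError, UnicodeDecodeError):
        return False


if __name__ == "__main__":
    # A scratch directory stands in for a real mount point.
    scratch = tempfile.mkdtemp()
    write_marker(scratch, "datavg:/data")
    print(verify_marker(scratch, "datavg:/data"))   # True
    print(verify_marker(scratch, "othervg:/data"))  # False
```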

- Issues an "ls -l"
- Reads the hidden file and verifies its content
- Issues an "lsvg" on the volume group containing the file system
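The three checks can be sketched in Python as follows. Each check runs as a separate process with a timeout, so a command that never returns (for example, `ls` blocked on a lost disk) counts as a failure instead of hanging the detector. The timeout value, the hidden file name `.fs_quorum_id`, and reading the file with `cat` out of process are assumptions, not part of the disclosure.

```python
import os
import subprocess
from typing import Optional, Sequence

MARKER_NAME = ".fs_quorum_id"  # hypothetical hidden-file name


def run(cmd: Sequence[str], timeout: float) -> Optional[bytes]:
    """Return stdout if cmd exits with status 0 within timeout, else None."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
    except OSError:  # command not found or not executable
        return None
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Treat "did not return" as a failure. Do not wait for the child:
        # a process blocked on disk I/O may not exit even after SIGKILL.
        proc.kill()
        return None
    return out if proc.returncode == 0 else None


def filesystem_accessible(mount_point: str, volume_group: str,
                          expected_identity: str,
                          timeout: float = 10.0) -> bool:
    """Apply the three access checks to one mounted file system."""
    # 1. "ls -l" on the mount point
    if run(["ls", "-l", mount_point], timeout) is None:
        return False
    # 2. Read the hidden file and verify its content
    marker = run(["cat", os.path.join(mount_point, MARKER_NAME)], timeout)
    if marker is None:
        return False
    if marker.decode("ascii", errors="replace").strip() != expected_identity:
        return False
    # 3. "lsvg" on the volume group holding the file system (AIX command)
    return run(["lsvg", volume_group], timeout) is not None
```

A process blocked in the kernel on disk I/O may not act on the kill signal until the I/O completes or fails, which is why the sketch does not wait for the child after a timeout.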

If all of the above checks complete successfully, the 'successful access' counter is incremented. If a command does not return, or the file cannot be accessed or does not have the correct content, the 'failed access' counter is incremented. Note that the access test emplo...
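The counting and threshold step might look like the sketch below. `is_accessible` stands in for the three-step access test described above, `threshold_pct` is the 'loss of quorum' threshold (the percentage of file systems that must be inaccessible), and `on_loss_of_quorum` is a placeholder for whatever action the installation chooses. Treating a failure percentage exactly equal to the threshold as a loss of quorum is an assumption.

```python
from typing import Callable, Iterable, Tuple


def check_quorum(filesystems: Iterable[str],
                 is_accessible: Callable[[str], bool],
                 threshold_pct: float,
                 on_loss_of_quorum: Callable[[int, int], None]
                 ) -> Tuple[int, int, bool]:
    """Count successful and failed accesses and apply the threshold.

    threshold_pct is the percentage of file systems that must be
    inaccessible to trigger the 'loss of quorum' event.
    """
    successful = failed = 0
    for fs in filesystems:
        if is_accessible(fs):
            successful += 1   # 'successful access' counter
        else:
            failed += 1       # 'failed access' counter
    total = successful + failed
    # Equivalent to failed / total * 100 >= threshold_pct, without division.
    lost = total > 0 and failed * 100 >= threshold_pct * total
    if lost:
        on_loss_of_quorum(failed, total)  # issue the 'loss of quorum' event
    return successful, failed, lost
```

With a threshold of 50 and four mounted file systems, two failures trigger the event and one failure does not.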