Automated failure detection, diagnosis and repair mechanism for OnDemand autonomic self-healing systems
Original Publication Date: 2004-Mar-19
Included in the Prior Art Database: 2004-Mar-19
Autonomic computing is a key feature of the OnDemand environment, requiring systems to be self-healing and thereby providing resource capacity that can be used based on demand. Existing failure detection and repair mechanisms designed for self-healing systems require customers to notify Service and Support personnel about failure occurences, and in turn the Service and Support personnel diagnose the failure and provide customers with the necessary fixes. This process requires multiple manual interactions and increases the mean time taken to fix the failures. In addition the diagnostic and systems management tools used by customers to detect failures are not integrated with Service and Support tools. Lack of integration between these two sets of tools also result in additional manual interactions. Increased mean time to fix results in direct increase of warranty costs. The invention described in this disclosure provides an automated mechanism to detect, diagnose and repair system failures. This mechanism also integrates diagnostic and systems management tools used for failure detection with those used by Service and Support personnel. This approach improves the self-healing characteristics of the targetted systems such as Servers (including Blade Servers), thereby providing Autonomic computing capabilities for OnDemand environments. For the remainder of this disclosure, the invention will be termed as Service Manager.