Component Health Record Evaluation Framework

IP.com Disclosure Number: IPCOM000222504D
Publication Date: 2012-Oct-11
Document File: 5 page(s) / 48K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a system and framework that facilitates the permanent tagging of diagnostic fault alerts for specific hardware components to the faulty hardware's health record, to be stored as part of the computing ecosystem and independent of third-party actors such as stand-alone Systems Management software.

This is the abbreviated version, containing approximately 39% of the total text.

Systems Management technologies play an increasingly indispensable role in modern datacenters. An added value of Systems Management solutions is the ability to detect and report previous and/or pending failures of the various hardware, software, and environmental components that make up the computing infrastructure. However, existing alert recording and reporting mechanisms do not use component event histories (i.e., component health records) to determine the likelihood of a future fault, or as acceptance criteria for admitting components to a computing environment. They therefore fall short of using alert history as an active metric for assessing the health of specific components and for projecting the short- or long-term implications of allowing a given component to become part of a computing environment.

The invention is a system and framework that facilitates the permanent tagging of diagnostic fault alerts for specific hardware components to the faulty hardware's health record, to be stored as part of the computing ecosystem and independent of third-party actors such as stand-alone Systems Management software.

In addition, the system analyzes diagnostic faults to determine error conditions that are common across multiple systems. This enables the system to distinguish sequences of actions that lead to faults (bad sequences) from those that do not (good sequences). Analysis of error conditions can be performed across various timelines.

This framework can also be used as a system for examining the health record of hardware components and using said record as part of the acceptance criteria for allowing membership into a computing environment.
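As a minimal sketch of using a health record as acceptance criteria, the following Python checks a component's recent fault history before admitting it. The `FaultAlert` and `HealthRecord` schemas, the `may_join` function, and the 90-day window and fault threshold are all illustrative assumptions; the disclosure does not specify a record format or an admission policy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FaultAlert:
    """One diagnostic fault alert tagged to a component (hypothetical schema)."""
    timestamp: datetime
    error_type: str  # e.g. "boot", "power", "connectivity", "memory"


@dataclass
class HealthRecord:
    """Fault history that travels with a component, keyed by serial number."""
    serial_number: str
    alerts: list[FaultAlert] = field(default_factory=list)


def may_join(record: HealthRecord, now: datetime,
             window: timedelta = timedelta(days=90),
             max_recent_faults: int = 2) -> bool:
    """Admit the component unless it logged too many faults recently.

    The window and threshold are illustrative policy knobs, not values
    taken from the disclosure.
    """
    cutoff = now - window
    recent = [a for a in record.alerts if a.timestamp >= cutoff]
    return len(recent) <= max_recent_faults
```

A real policy could also weight alerts by error type or severity; a simple count inside a time window is used here only to keep the example short.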

The approach uses a software system within the ecosystem that tags a hardware component as having caused an offense to the ecosystem at some point in time. It then determines whether the same hardware introduced a problem in another slot and, if so, can take any number of actions, including disallowing the hardware from being part of the ecosystem on subsequent attempts to join.
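The tagging and slot comparison described above might look like the following Python sketch. The `OffenseRegistry` class, its method names, and the use of serial numbers and slot strings as identifiers are hypothetical; the disclosure does not say how components or slots are identified, or how many offending slots should trigger an action.

```python
from collections import defaultdict


class OffenseRegistry:
    """Tracks which slots each component has caused a fault in.

    Assumption: components are identified by serial number and slots by
    an opaque string; the disclosure does not specify either.
    """

    def __init__(self, max_offending_slots: int = 1) -> None:
        self.max_offending_slots = max_offending_slots
        self._slots_by_serial: dict[str, set[str]] = defaultdict(set)

    def tag_offense(self, serial: str, slot: str) -> None:
        """Record that `serial` caused a fault while installed in `slot`."""
        self._slots_by_serial[serial].add(slot)

    def is_repeat_offender(self, serial: str) -> bool:
        """True once the component has faulted in more distinct slots than
        allowed, which points at the component rather than the slot."""
        slots = self._slots_by_serial.get(serial, ())
        return len(slots) > self.max_offending_slots

    def allow_join(self, serial: str) -> bool:
        """One possible action: refuse membership to repeat offenders."""
        return not self.is_repeat_offender(serial)
```

Repeated faults in the same slot do not count as a repeat offense in this sketch, since they could equally indicate a bad slot; only a fault in a second, different slot disqualifies the component.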

Example Embodiment #1: On some predefined increment, an error is detected and classified

1. Determine the type of error that was detected

For example:


• an error can be a booting error


• an error can be related to a power supply


• an error can be a connectivity error


• an error can be a memory hardware error


2. Look back in time from that error on any system that has the error

3. Find commonality between all of the systems that exhibit the error

For example:


• a patch being applied and then the system rebooting


• a level of firmware being installed


• two incompatible software levels


• a manufacturing date later than x


• a serial number


• a network connection


4. Determine the common events among the systems with that error

5. Predict / prevent / warn prior to that condition (or series of conditions) being met in t...
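The steps of this embodiment can be sketched in Python as a look-back over per-system event logs followed by a set intersection. The log layout (a mapping from system name to time-ordered `(timestamp, event)` tuples), the event strings, and the function names `common_prior_events` and `at_risk` are assumptions made for illustration, not part of the disclosure.

```python
from datetime import datetime, timedelta


def common_prior_events(history, error_type, lookback):
    """Return events seen on every faulting system shortly before its fault.

    `history` maps a system name to a time-ordered list of
    (timestamp, event) tuples; this layout is an assumption.
    """
    per_system = []
    for events in history.values():
        # Steps 1-2: find the first occurrence of the error, then look back.
        fault_times = [t for t, e in events if e == error_type]
        if not fault_times:
            continue  # this system never exhibited the error
        fault_at = fault_times[0]
        prior = {e for t, e in events
                 if fault_at - lookback <= t < fault_at}
        per_system.append(prior)
    # Steps 3-4: keep only the events shared by all faulting systems.
    return set.intersection(*per_system) if per_system else set()


def at_risk(recent_events, signature):
    """Step 5: flag a system that has already seen every signature event."""
    return bool(signature) and signature <= set(recent_events)
```

A strict intersection is the simplest notion of commonality; a production system would more likely tolerate noise by ranking events by how many faulting systems share them.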