
System for determining and executing corrective action using trace information and historical resolution data from a change management system

IP.com Disclosure Number: IPCOM000236055D
Publication Date: 2014-Apr-03
Document File: 2 page(s) / 35K

Publishing Venue

The IP.com Prior Art Database

Abstract

IT professionals maintain complex systems that can fail at any time. Failures are usually identified from trace information in logs and are recorded in a change management system, along with the root cause analysis and the eventual resolution. These resolutions are routinely arrived at through trial and error, or by analyzing only the situation at hand. Both approaches are time-consuming and often depend on the hands-on experience of whoever resolves the issue. Disclosed is a system that uses trace information collected during a system failure, together with historical data on past resolutions from a change management system, to automatically determine, log, and execute the best course of action.

This is the abbreviated version, containing approximately 52% of the total text.



IT professionals maintain complex systems that can fail at any time. Failures are usually identified from trace information in logs and are recorded in a change management system, along with the root cause analysis and the eventual resolution. These resolutions are routinely arrived at through trial and error, or by analyzing only the situation at hand. Both approaches are time-consuming and often depend on the hands-on experience of whoever resolves the issue. With the growing prevalence of SaaS offerings, repeatedly analyzing similar problems from scratch could translate into costly outages of production systems.

The best course of action can be determined, logged, and executed automatically by using trace information collected during a system failure together with historical data on past resolutions from a change management system. Example actions include contacting the most qualified first responder, restarting a crashed system, or rolling the system back to an earlier state. Because the system takes corrective action without waiting for human intervention, system downtime can be reduced significantly.

Specifically:

1. A system failure is detected.

2. Logs are automatically collected and correlated with reports from a database of past failures, namely the past problem reports in a change management system.

3. A heuristic is applied to the matching reports to produce the corrective action most likely to resolve the problem. Examples of corrective actions include:

   - Restarting a crashed subsystem.
   - Rolling the system back to an earlier state.
   - Contacting the IT professional deemed most qualified for this type of problem.

4. The corrective action is executed.

   - If the action is successful, the result is logged.
   - If the action is not successful, the process is repeated, omitting the unsuccessful resolution.
   - If no resolution is found after 'x' attempts, a more traditional action is taken (notifying a system administrator).
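
The steps above can be sketched in Python as a small retry loop. This is only an illustration under stated assumptions, not the disclosed implementation: the `PastReport` record, the `pick_action` heuristic (choosing the resolution that most often fixed matching reports), the exact-match comparison on a failure signature, and the `notify_admin` fallback name are all hypothetical. The disclosure does not specify which heuristic is used.

```python
from __future__ import annotations

import logging
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("corrective_action")


@dataclass
class PastReport:
    """A resolved problem report from the change management system (hypothetical shape)."""
    signature: str   # e.g. the exception type found in the trace
    resolution: str  # name of the corrective action that resolved it


def pick_action(matches: list[PastReport], excluded: set[str]) -> Optional[str]:
    """Assumed heuristic: the resolution that most often fixed matching reports."""
    counts = Counter(r.resolution for r in matches if r.resolution not in excluded)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def resolve(
    signature: str,
    history: list[PastReport],
    actions: dict[str, Callable[[], bool]],
    max_attempts: int = 3,  # the 'x' in the text
) -> str:
    """Try the most likely corrective actions in turn; fall back to a human."""
    matches = [r for r in history if r.signature == signature]
    tried: set[str] = set()
    for _ in range(max_attempts):
        action = pick_action(matches, tried)
        if action is None:
            break  # no untried resolution left in the matching reports
        executor = actions.get(action)
        if executor is not None and executor():
            log.info("resolved %s with %s", signature, action)
            return action
        tried.add(action)  # omit the unsuccessful resolution next time
    log.warning("no automatic resolution for %s", signature)
    return "notify_admin"
```

Each entry in `actions` is a callable that performs the corrective action and returns `True` on success, so restarting a subsystem, rolling back, or paging an expert can all sit behind the same interface.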

Problem Analysis


The main data store for past-problem reports here is a change management or ticketing system. From past work items, several details will be used, including stack trace information. These stack...
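
As a rough illustration of how stack trace information from past work items might be compared with a new failure, the sketch below extracts frame names from Java-style traces and scores the overlap with Jaccard similarity. The trace format, the `frames`, `similarity`, and `best_matches` helpers, the ticket identifiers, and the 0.5 threshold are assumptions made for this example. The available text does not describe the actual matching method.

```python
import re

# Matches Java-style frames such as "at com.example.Cache.get(Cache.java:42)".
# Only the qualified method name is captured; file names and line numbers are
# ignored so the same failure still matches across different builds.
FRAME_RE = re.compile(r"^\s*at\s+([\w.$<>]+)\(", re.MULTILINE)


def frames(trace: str) -> set[str]:
    """Return the set of method names that appear in a stack trace."""
    return set(FRAME_RE.findall(trace))


def similarity(trace_a: str, trace_b: str) -> float:
    """Jaccard similarity of the two traces' frame sets, from 0.0 to 1.0."""
    fa, fb = frames(trace_a), frames(trace_b)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


def best_matches(new_trace: str, past: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Return IDs of past work items whose trace resembles the new one, best first."""
    scored = [(similarity(new_trace, trace), ticket_id) for ticket_id, trace in past.items()]
    return [ticket_id for score, ticket_id in sorted(scored, reverse=True) if score >= threshold]
```

The work items returned by `best_matches` would then feed whatever heuristic selects the corrective action.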