Monitoring Solution to Automatically Determine When a Resource is out of Norm

IP.com Disclosure Number: IPCOM000029730D
Original Publication Date: 2004-Jul-09
Included in the Prior Art Database: 2004-Jul-09
Document File: 2 page(s) / 36K

Publishing Venue

IBM

Abstract

Discloses a method allowing a monitoring tool to tune itself based on the current environment.

This is the abbreviated version, containing approximately 45% of the total text.

Monitoring systems provide the most value when they can automatically tune themselves to the environment in which they run. When a monitored system is then out of "norm", an event can be sent. Automatic tuning results in more efficient monitoring than traditional monitors, which watch only specific, statically defined event thresholds.

Current systems and network management solutions are based on static monitoring definitions. When the environment changes, monitoring must be changed manually to meet the new management needs. Customers may define best-practice norms to help set these definitions; however, each time the environment changes, manual effort is required to update the management environment. An automated way of determining "what is normal" within a certain tolerance is required.

The goal in monitoring a system is to understand when that system is out of its "normal" state. That "normal" state is highly system dependent and generally cannot be correctly predetermined. The best approach is to monitor the system for some period of time and, based on those results, set the monitoring thresholds so that out-of-norm situations are discovered. Today this is a long and tiresome manual process. The idea is that the monitoring tool, once deployed, would monitor the system for a defined period to understand the normal conditions and then, based on this, calculate the appropriate variances to set the thresholds for events. This solution does not require a set of predefinitions from the customer; instead, the tool continuously and automatically tunes its own monitoring, thereby removing the manual part of this effort.
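As a minimal sketch of the threshold calculation described above, assume that "variance" means the standard deviation around the mean of the samples gathered during the learning period. The disclosure's own algorithm is not part of this abbreviated text, so the function name `derive_thresholds`, the multipliers, and the sample values below are all hypothetical.

```python
from statistics import mean, pstdev


def derive_thresholds(samples, norm_k=2.0, warn_k=3.0, crit_k=4.0):
    """Derive event thresholds from samples collected during the learning period.

    Assumption: "normal" is modelled as the mean plus k standard deviations.
    The multipliers are illustrative and are not taken from the disclosure.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variation")
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation of the baseline
    return {
        "mean": mu,
        "stddev": sigma,
        "out_of_norm": mu + norm_k * sigma,
        "warning": mu + warn_k * sigma,
        "critical": mu + crit_k * sigma,
    }


# Hypothetical CPU utilisation samples (percent) from the learning period.
cpu_samples = [40, 42, 38, 41, 39, 40, 43, 37]
thresholds = derive_thresholds(cpu_samples)
```

A wider multiplier trades fewer false alerts for slower detection; a real deployment would probably choose these values per metric.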

The monitoring tool would:

1. Watch the system for all metrics it understands for the defined period of time. This metric data would be saved locally and, if required, possibly sent to a warehouse-type function. However, no events would be sent until the proper thresholding was calculated, so that operators are not confused by invalid early alerts.

2. Define the appropriate monitoring values and the appropriate timing windows based on the algorithm defined below.

   - Different values could be calculated based on peak time and low-usage time.
   - Events would be sent for out-of-norm conditions as well as warning or critical system-event type conditions.

3. Set the event thresholding, based on the states calculated, for the appropriate time interval and levels to indicate out-of-norm conditions as well as warning and critical states, and generate the appropriate clearing events for all events generated.

4. Turn on events, with an initial event to the management system indicating that monitoring is now in place and reporting the current monitoring values.

5. Record metric data and reevaluate it on a set schedule for environmental changes.

6. Monitor for system changes and automatically reevaluate the metrics. For example, when memory or processors...
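
The steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the disclosed algorithm, which is not included in this abbreviated text: the class name `SelfTuningMonitor`, the event names, the peak-hour range, and the mean-plus-k-standard-deviations rule are all hypothetical. Only the out-of-norm level is modelled; warning and critical levels would follow the same pattern.

```python
from statistics import mean, pstdev


class SelfTuningMonitor:
    """Learns a baseline per time window, then raises and clears out-of-norm events."""

    def __init__(self, learning_samples=5, k=3.0, peak_hours=range(8, 18)):
        self.learning_samples = learning_samples  # samples needed before alerting
        self.k = k                                # assumed tolerance multiplier
        self.peak_hours = peak_hours              # assumed peak-time window
        self.history = {"peak": [], "off_peak": []}
        self.thresholds = {}  # window name -> upper threshold
        self.active = set()   # windows with an open out-of-norm event
        self.events = []      # events that would go to the management system

    def _window(self, hour):
        return "peak" if hour in self.peak_hours else "off_peak"

    def retune(self, window):
        """(Re)calculate the threshold for a window and announce the new value."""
        data = self.history[window]
        self.thresholds[window] = mean(data) + self.k * pstdev(data)
        self.events.append(("thresholds_set", window, self.thresholds[window]))

    def observe(self, hour, value):
        window = self._window(hour)
        if window not in self.thresholds:
            # Learning phase: record data only, send no alerts.
            self.history[window].append(value)
            if len(self.history[window]) >= self.learning_samples:
                self.retune(window)
            return
        self.history[window].append(value)  # keep recording for later retuning
        limit = self.thresholds[window]
        if value > limit and window not in self.active:
            self.active.add(window)
            self.events.append(("out_of_norm", window, value))
        elif value <= limit and window in self.active:
            self.active.discard(window)
            self.events.append(("clear", window, value))


monitor = SelfTuningMonitor()
for sample in [50, 52, 48, 51, 49]:
    monitor.observe(hour=10, value=sample)  # learning period, no alerts
monitor.observe(hour=10, value=90)          # above the learned threshold
monitor.observe(hour=10, value=50)          # back in the normal range
```

Calling `retune` again on a schedule, or after a detected hardware change, would correspond to the reevaluation steps; how such changes are detected is left open here.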