
Server Monitor History Correlating Error to Operational Server Status

IP.com Disclosure Number: IPCOM000014495D
Original Publication Date: 2000-Apr-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 3 page(s) / 43K

Publishing Venue

IBM

Abstract

The service processor monitors environmental factors within a server, such as temperatures, voltages, and fan speeds, and provides a snapshot of current conditions. When an environmental monitor goes out of its specified range, a message is sent to alert an operator to the condition. Often the condition is not serious enough to cause a catastrophic failure, and the server continues to operate normally. Even if the failure does cause the server to power off to protect sensitive electronic circuitry, the service processor continues to run on continuous standby power, so the condition that caused the failure is not lost. This disclosure describes a method for tracking the operation of the server over time so that an operator viewing an alert message can correlate the recorded history with the environmental failure that occurred.

A continuously operating program within the service processor samples the hardware environmental sensors (e.g., temperatures, voltages, and fan utilization) every 5 minutes. The software then constructs a graph of the samples over time. When an error occurs, an event is sent to the operator, and a time stamp is placed on the historic record indicating which event occurred and when. With this information, the operator performing the failure analysis can correlate the condition of the server with the problem that occurred. The method also lets the operator analyze historical trends in the server's environmental conditions to determine whether a component or system failure is imminent.

Colors change on the graph to denote the severity of a particular hardware sensor reading. If a warning or critical problem persists, the graph denotes it with a blinking effect and a pattern change, which draws the operator's attention to the problem area and provides a visual cue for operators who may be color-blind.
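The sampling and severity logic described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the class names, the threshold values, the 24-hour history depth, and the rule that a condition "persists" after three consecutive out-of-range samples are all assumptions chosen for the example. Only upper limits are modeled; a real monitor would also need lower limits for sensors such as voltages and fan speeds.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum

SAMPLE_INTERVAL_S = 5 * 60  # the disclosure samples every 5 minutes


class Severity(Enum):
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical upper limits; real limits depend on the hardware.
    warning: float
    critical: float

    def classify(self, value: float) -> Severity:
        if value >= self.critical:
            return Severity.CRITICAL
        if value >= self.warning:
            return Severity.WARNING
        return Severity.NORMAL


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds since an arbitrary starting point
    value: float
    severity: Severity


class SensorHistory:
    """Bounded history of one sensor's samples."""

    def __init__(self, thresholds: Thresholds, max_samples: int = 288,
                 persist_after: int = 3):
        # 288 samples = 24 hours at one sample per 5 minutes (assumed depth).
        self.thresholds = thresholds
        self.samples = deque(maxlen=max_samples)
        self.persist_after = persist_after  # assumed persistence rule

    def record(self, timestamp: float, value: float) -> Sample:
        sample = Sample(timestamp, value, self.thresholds.classify(value))
        self.samples.append(sample)
        return sample

    def is_persistent(self) -> bool:
        """True if the latest `persist_after` samples are all abnormal.

        A display could use this to switch to a blinking, patterned style.
        """
        if len(self.samples) < self.persist_after:
            return False
        recent = list(self.samples)[-self.persist_after:]
        return all(s.severity is not Severity.NORMAL for s in recent)


# Usage with made-up temperature readings in degrees Fahrenheit.
cpu_temp = SensorHistory(Thresholds(warning=150.0, critical=190.0))
for i, reading in enumerate([120.0, 155.0, 160.0, 195.0]):
    cpu_temp.record(i * SAMPLE_INTERVAL_S, reading)
```

A bounded `deque` keeps memory use fixed, which seems a reasonable fit for a service processor with limited storage, though the disclosure does not say how the history is stored.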
When a component or the system does fail, the specific alert, with a time stamp, is placed on the historic graph to aid in problem determination. The following figure depicts the operation of this program.

[Figure not reproduced; only a fragment of a temperature axis label (°F) survives in the extracted text.]
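One way to picture the time-stamped alert on the historic record is a structure that stores samples and events side by side and returns the samples surrounding an event. The sketch below is illustrative only: `HistoryRecord`, `samples_around`, the fan readings, and the 10-minute window are hypothetical and do not come from the disclosure.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class HistoryRecord:
    """Samples and time-stamped events for one sensor (hypothetical layout)."""
    sample_times: list = field(default_factory=list)   # ascending timestamps
    sample_values: list = field(default_factory=list)  # parallel to sample_times
    events: list = field(default_factory=list)         # (timestamp, description)

    def add_sample(self, timestamp, value):
        # Assumes samples arrive in time order, as a periodic sampler would produce.
        self.sample_times.append(timestamp)
        self.sample_values.append(value)

    def add_event(self, timestamp, description):
        # Place the alert on the record with its time of occurrence.
        self.events.append((timestamp, description))

    def samples_around(self, event_time, window_s):
        """Return (time, value) pairs within window_s seconds of event_time."""
        lo = bisect_left(self.sample_times, event_time - window_s)
        hi = bisect_right(self.sample_times, event_time + window_s)
        return list(zip(self.sample_times[lo:hi], self.sample_values[lo:hi]))


# Usage with made-up fan speeds (RPM) sampled every 300 seconds.
record = HistoryRecord()
for i, fan_rpm in enumerate([4200, 4100, 2500, 900, 0]):
    record.add_sample(i * 300, fan_rpm)
record.add_event(1000, "Fan 1 failure")

# Readings from 10 minutes before to 10 minutes after the alert.
context = record.samples_around(event_time=1000, window_s=600)
```

Here `context` holds the readings on either side of the alert, which is the kind of information an operator would need to relate the failure to the server's condition at the time.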

This is the abbreviated version, containing approximately 62% of the total text.
