Method and apparatus to help failure analysis based on server pattern matching
Publication Date: 2014-Apr-04
The IP.com Prior Art Database
Described below are an apparatus and a method that help a technical support team or developer conduct problem determination and failure analysis efficiently, based on "server pattern" matching.
Capturing log files from the failing systems of multiple customers and uploading them to a management server is a routine operation conducted worldwide every day. The accumulated log files can be treated as a kind of "big data". Described below are a method and an apparatus that help problem determination and failure analysis based on "server pattern" matching of this big data.
Analyzing logs is one of the major and crucial approaches to problem determination and failure analysis. Existing technology is insufficient in the following respects.
- Log files include many event logs covering a long period (e.g. several years), and these hold useful data. However, the technical support team or developer tends to focus on the events around the time of the failure, and so cannot always use all of the data in the logs effectively.
- Log files include only the "current" firmware level and hardware revision, so the technical support team sometimes needs to ask customers for the past maintenance history and then compare event logs.
- Log files do not include quantitative data on how many cases have been reported under the same hardware revision, the same firmware level, or a specific combination of hardware revision and firmware level.
- It is difficult to investigate a symptom that occurs under some combination of firmware level and hardware revision, because there is no objective data on which to base a judgment.
The sequence of this invention is disclosed below. It consists of two phases: prior data processing and post data processing.
1. Prior data processing by management server
1-1. Customers or customer engineers collect log files from failing systems worldwide every day and upload them to a management server. The technical support team or developer has access to the management server for problem determination and failure analysis.
1-2. The event log in a log file consists of multiple messages, each categorized by severity level such as error, warning, or information. The management server generates an attribution and maps it to each message. The attribution is multivariate data consisting of the various hardware revisions and firmware levels of the target system. For descriptive purposes, the attribution is called the "server pattern" in this article.
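As an illustration of step 1-2, the following is a minimal sketch of how a management server might build a "server pattern" and map it to each message of an event log. It is not the disclosed implementation. The names EventMessage, build_server_pattern, and map_pattern_to_messages, the "hw:"/"fw:" key prefixes, and the component names and values ("planar", "bios", "bmc", "rev3", "1.42", "2.10") are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EventMessage:
    """One message of an event log (hypothetical structure)."""
    severity: str  # e.g. "error", "warning" or "information"
    text: str
    server_pattern: Optional[Dict[str, str]] = None  # attribution, set later


def build_server_pattern(hardware_revisions: Dict[str, str],
                         firmware_levels: Dict[str, str]) -> Dict[str, str]:
    """Combine hardware revisions and firmware levels into one multivariate record."""
    pattern = {f"hw:{name}": rev for name, rev in hardware_revisions.items()}
    pattern.update({f"fw:{name}": lvl for name, lvl in firmware_levels.items()})
    return pattern


def map_pattern_to_messages(messages: List[EventMessage],
                            pattern: Dict[str, str]) -> List[EventMessage]:
    """Attach a copy of the server pattern to every message, whatever its severity."""
    for message in messages:
        message.server_pattern = dict(pattern)
    return messages


# Hypothetical example data for one uploaded log file.
messages = [
    EventMessage("error", "hypothetical message A"),
    EventMessage("information", "hypothetical message B"),
]
pattern = build_server_pattern({"planar": "rev3"}, {"bios": "1.42", "bmc": "2.10"})
map_pattern_to_messages(messages, pattern)
```

Each message carries its own copy of the pattern, so in this sketch a message can later be compared or counted by hardware revision and firmware level without going back to the log file it came from.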
Attached below is an example of the "server pattern" of one failing system [Fig.]. The maximum value of each axis is '0', which corresponds to the latest revision or level. Toward the center of the graph, the value decreases by one per step; the value '-1' means one level before the latest. The scale of each axis changes depending on the number of generations of each parameter. When new hardware revision or...