
A knowledge based problem diagnosis apparatus through machine learning

IP.com Disclosure Number: IPCOM000246240D
Publication Date: 2016-May-19
Document File: 9 page(s) / 206K

Publishing Venue

The IP.com Prior Art Database

Abstract

More and more enterprise infrastructure systems are being moved into cloud environments, which consist of hardware, network, management software, applications, etc. Identifying the root cause of a failure in such a complex environment is a headache for most operations teams. This article describes an apparatus that automatically identifies potential root causes from a knowledge base when an incident occurs. In this apparatus, controlled chaos is generated and the symptoms of the resulting failure (including the errors that the user experiences and the monitoring items collected from monitoring components) are collected. The symptoms and the root cause are mapped together and stored as a RAW pattern in a knowledge base. RAW patterns are analyzed automatically, and the probability of a pattern occurring is computed and stored as a FINAL pattern. After the knowledge base has been built, analyzed and refined automatically, it can be used in a production environment. When an incident occurs, the symptoms are collected and described as a case. A CBR (Case Based Reasoning) algorithm is used to find the most similar pattern in the knowledge base. The RC (Root Cause) with the largest probability is then suggested to the operations team for further action.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.



Today, more and more applications are landing on centralized infrastructure platforms, whether public, private, or hybrid. At the same time, these platforms are becoming more complex than ever, spanning hardware, network, storage, software and their configurations. Therefore, once a platform has been built and is running in a production environment, finding the root cause when a failure happens becomes a pain point for the operations team.

An apparatus is proposed that automatically generates the required mappings in the diagnosis knowledge base through machine learning. The apparatus proactively and automatically generates "chaos" in a running system. At the same time, the failure symptoms caused by the "chaos" are observed and collected. A collection of these symptom descriptions from the various components of the running system is called a "pattern", and this article describes a way to represent the failure symptoms and the pattern. The chaos and the pattern are mapped to each other to show that they are related. The same chaos may cause different patterns, depending on the status of the running system. An analyzer is therefore introduced that lets the machine learn from experience and find the probability of the relationship between pattern and chaos. This disclosure introduces both the high-level and the detailed processes.
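
As a rough illustration of the analyzer idea, the following Python sketch aggregates RAW chaos-to-pattern observations into FINAL patterns with probabilities, and then suggests the most probable root cause for a new set of symptoms. It is only a minimal sketch under stated assumptions, not the disclosed implementation: the symptom and chaos names, the functions `build_final`, `jaccard` and `suggest`, the use of relative frequency as the probability, and the use of Jaccard similarity as a stand-in for the CBR similarity measure are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical RAW records: (observed symptom set, injected chaos / root cause).
# All names below are invented for illustration.
RAW = [
    (frozenset({"http_503", "db_conn_timeout"}), "db_node_down"),
    (frozenset({"http_503", "db_conn_timeout"}), "db_node_down"),
    (frozenset({"http_503", "db_conn_timeout"}), "network_partition"),
    (frozenset({"http_503", "high_latency"}), "network_partition"),
]


def build_final(raw):
    """Aggregate RAW records into FINAL patterns.

    Returns a dict: pattern -> {root cause: relative frequency}.
    Relative frequency is an assumed estimator of the probability.
    """
    counts = defaultdict(Counter)
    for symptoms, cause in raw:
        counts[symptoms][cause] += 1
    final = {}
    for symptoms, per_cause in counts.items():
        total = sum(per_cause.values())
        final[symptoms] = {c: n / total for c, n in per_cause.items()}
    return final


def jaccard(a, b):
    """Jaccard similarity of two symptom sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def suggest(final, observed):
    """Find the most similar FINAL pattern and return its most probable cause.

    Assumes `final` contains at least one pattern.
    """
    best = max(final, key=lambda pattern: jaccard(pattern, observed))
    cause, prob = max(final[best].items(), key=lambda kv: kv[1])
    return cause, prob


final = build_final(RAW)
cause, prob = suggest(final, {"http_503", "db_conn_timeout", "cpu_high"})
print(cause, round(prob, 2))
```

In this sketch the same pattern can map to more than one chaos, which mirrors the observation that the same chaos may produce different patterns depending on system status; the probabilities simply record how often each pairing was seen.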

Below is a high-level description of this apparatus.


This apparatus involves
A module of simulatin...