System for detecting unreliable PagerDuty alerts.
Publication Date: 2016-Dec-20
The IP.com Prior Art Database
Many PagerDuty alerts do not require any responder interaction. To reduce resolution time and cost, it is crucial to know which alerts are false alerts. The system detects unreliable alerts using a machine learning algorithm and a defined feature set.
Definitions:
- Machine learning algorithm: a supervised learning algorithm, for example a Support Vector Machine, that works in two phases: learning and classifying. In the first phase a training dataset (historical data) must be provided.
- Historical data: data from an alert/issue tracking tool such as PagerDuty (resolved alerts with a known target class: false alert / true alert).
Feature set:
- Component name
- Monitoring service name
- Alert name
- Alert type
- Alert date & time [occurrence time]
- Resolution time
- Acknowledge [y/n]
- Acknowledge time
- Autoresolved [y/n]
- Autoresolved time
- Environment [region]
- Environment [destination: internal/external]
- Environment utilization [CPU, RAM, HDD]
- Repeatedness [how often an alert occurs in a row within a specified period of time]
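As an illustration of how the Repeatedness feature could be derived, the sketch below counts, for each alert, how many earlier alerts of the same kind fell within a preceding time window. The function name `repeatedness`, the 10-minute window, and this exact counting rule are assumptions made for the example; the disclosure only states that the feature measures how often an alert occurs in a row within a specified period.

```python
from datetime import datetime, timedelta


def repeatedness(timestamps, window):
    """For each alert, count the earlier alerts that fall within `window` before it.

    `timestamps` holds occurrence times of one alert kind (same component
    and alert name); the result is ordered by occurrence time.
    """
    ordered = sorted(timestamps)
    counts = []
    start = 0  # index of the oldest alert still inside the window
    for i, ts in enumerate(ordered):
        # Drop alerts that are older than the window relative to `ts`.
        while ordered[start] < ts - window:
            start += 1
        counts.append(i - start)
    return counts


# Hypothetical occurrence times: a burst of three alerts, then two more later.
base = datetime(2016, 12, 1, 12, 0)
times = [base + timedelta(minutes=m) for m in (0, 2, 4, 30, 31)]
counts = repeatedness(times, timedelta(minutes=10))
print(counts)
```

The resulting counts can be stored alongside the other features of each alert before the records are encoded for the classifier.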
The system consists of the following steps:
1. Learning phase of the classification algorithm, based on historical data with known target labels: [true alert, false alert].
2. Model creation based on the learning phase.
3. Prediction of the target label of the current alert: true / false alert.
4. Feedback: if the prediction is incorrect, update the target label based on the user action and add the alert to the training set.
5. Re-training of the model on the updated training set.
6. As a further step, a deep-dive analysis of both false and true alerts can identify their root causes so that they can be eliminated (see chart below as an example).
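Steps 1 to 5 can be sketched as a small train / predict / feedback / re-train loop. The example below is a minimal sketch, not the disclosed implementation: it fits a linear Support Vector Machine by sub-gradient descent on the hinge loss using only the Python standard library, and it assumes the alerts have already been encoded as numeric vectors. The four binary features, the sample data, the function names, and the hyperparameters (`epochs`, `lr`, `lam`) are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Steps 1-2: fit a linear SVM by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization: shrink the weights a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample violates the margin: move the boundary towards it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Step 3: +1 means true alert, -1 means false (unreliable) alert."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def apply_feedback(samples, labels, x, predicted, confirmed):
    """Step 4: store the responder-confirmed label if the prediction was wrong."""
    if predicted == confirmed:
        return False
    samples.append(x)
    labels.append(confirmed)
    return True


# Hypothetical encoded history: [autoresolved, acknowledged, repeated, external]
samples = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]
labels = [-1, -1, 1, 1]

model = train_linear_svm(samples, labels)
new_alert = [0, 1, 0, 1]
predicted = predict(model, new_alert)

# Assumption for the example: the responder reports that no action was needed.
if apply_feedback(samples, labels, new_alert, predicted, confirmed=-1):
    model = train_linear_svm(samples, labels)  # step 5: re-train
```

In practice an established library implementation of the chosen classifier would likely replace the hand-written trainer; the loop structure around it (predict, compare with the responder's action, extend the training set, re-train) stays the same.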
Example: Unreliable alerts:
Component: Service Broker