
Learning Values to Determine if Sufficient Data is Available to Create ADE Models

IP.com Disclosure Number: IPCOM000246060D
Publication Date: 2016-Apr-29
Document File: 2 page(s) / 46K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is an automated method to determine whether sufficient data exists to create an Anomaly Detection Engine for Linux Logs (ADE) model that can detect bad intervals.

An anomaly detection engine for Linux logs (ADE) creates a model to detect bad intervals using the following process:

1. Sums the message anomaly scores for each message within an interval to create the interval message contribution value

2. For all of the intervals within the training period, orders the interval message contribution values

3. Assigns each interval message contribution value to a bucket using a histogram

4. Creates a distribution which maps the interval message contribution value to the interval anomaly score
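The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ADE implementation: it assumes the interval anomaly score is a nearest-rank percentile of the training intervals, reported on a grid of 0.5 points (0.5, 1.0, ..., 100.0) so that the 99.5 and 100 buckets exist. The names `interval_contributions`, `build_buckets`, `interval_anomaly_score`, and `HALF_POINTS` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

# Assumed bucket grid: scores 0.5, 1.0, ..., 100.0 (200 buckets of 0.5 points).
HALF_POINTS = 200


def interval_contributions(intervals: Sequence[Sequence[float]]) -> List[float]:
    # Step 1: sum the message anomaly scores within each interval.
    return [sum(scores) for scores in intervals]


def build_buckets(contributions: Sequence[float]) -> List[float]:
    # Step 2: order the interval message contribution values.
    ordered = sorted(contributions)
    n = len(ordered)
    if n == 0:
        raise ValueError("no intervals in the training period")
    # Step 3 (assumed nearest-rank rule): bucket k, for score k / 2, holds the
    # value at rank ceil(k * n / 200), computed with integers to avoid rounding.
    return [ordered[(k * n + HALF_POINTS - 1) // HALF_POINTS - 1]
            for k in range(1, HALF_POINTS + 1)]


def interval_anomaly_score(buckets: Sequence[float], value: float) -> float:
    # Step 4: map a contribution value to the highest bucket score it reaches.
    # bisect_right counts the buckets whose value is <= value (buckets is sorted).
    reached = bisect_right(buckets, value)
    return reached / 2.0
```

With 400 training intervals whose contributions are 1.0 through 400.0, the 99.5 bucket holds 398.0 and the 100 bucket holds 400.0, so a new interval with a contribution of 399.0 scores 99.5 and one with 400.0 scores 100.0.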

The ADE relies on two key interval anomaly scores, 99.5 and 100. If the score is below 99.5, then the interval is not defined as unusual. If the score is above 100, then the interval is defined as important. For an ADE to work, the interval message contribution value for the 99.5 bucket must differ from the interval message contribution value for the 100 bucket.
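The requirement that the 99.5 and 100 buckets differ can be checked directly on the training contributions. The sketch below assumes nearest-rank percentiles; `separates_bad_intervals` and its `min_gap` parameter are hypothetical names, and the default of 0.0 simply demands a strictly positive difference.

```python
from typing import Sequence


def separates_bad_intervals(contributions: Sequence[float], min_gap: float = 0.0) -> bool:
    """Return True if the 100 bucket value exceeds the 99.5 bucket value by more than min_gap."""
    n = len(contributions)
    if n == 0:
        return False
    ordered = sorted(contributions)
    # Assumed nearest-rank rule: the 99.5 bucket holds the value at rank
    # ceil(0.995 * n), computed with integers as ceil(199 * n / 200).
    value_995 = ordered[(199 * n + 199) // 200 - 1]
    # The 100 bucket holds the largest contribution.
    value_100 = ordered[-1]
    return value_100 - value_995 > min_gap


print(separates_bad_intervals([float(i) for i in range(1, 401)]))  # True
print(separates_bad_intervals([5.0] * 400))  # False
```

A positive `min_gap` would make the check stricter, treating a tiny difference between the two buckets as no difference at all.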

Two situations can produce an ADE model that is unable to differentiate between the values for the 99.5 and 100 buckets: either the data is insufficient to create a reasonable model, or the message traffic is so similar across intervals that the difference is undetectable.
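The first failure mode has a simple arithmetic explanation if the buckets are assumed to be nearest-rank percentiles of the training intervals. This derivation is illustrative and rests on that assumption; c_(1) <= ... <= c_(n) denote the ordered interval message contribution values of the n training intervals, and v_99.5 and v_100 denote the bucket values.

```latex
v_{100} = c_{(n)}, \qquad v_{99.5} = c_{(\lceil 0.995\,n \rceil)}

\lceil 0.995\,n \rceil = n \iff 0.995\,n > n - 1 \iff n < 200

n \ge 200 \implies \lceil 0.995\,n \rceil \le n - 1
```

Under this assumption, with fewer than 200 training intervals v_99.5 equals v_100 no matter what the traffic looks like, which is the insufficient-data case. With 200 or more intervals, the two buckets coincide only when the values from rank ceil(0.995 n) through rank n are all equal, which is the case of message traffic too similar to separate.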

For an ADE to work, it must identify which data sources do not have sufficient data to create a useful model. Currently, no form of classification can be run, because there is no way to determine whether a model is good.
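A data source could be classified by which of the two failure modes, if either, applies to its training data. The sketch below is a hypothetical illustration, not the disclosed method: `classify_source`, `classify_sources`, `MIN_INTERVALS`, the label strings, and the source names in the example are invented, and the 200-interval floor follows only from an assumed nearest-rank percentile definition.

```python
from typing import Dict, Sequence

# Assumed: with nearest-rank percentiles, fewer than 200 intervals force the
# 99.5 bucket to hold the same value as the 100 bucket.
MIN_INTERVALS = 200


def classify_source(contributions: Sequence[float], min_gap: float = 0.0) -> str:
    """Label one data source's training data (hypothetical labels)."""
    n = len(contributions)
    if n < MIN_INTERVALS:
        return "insufficient data"
    ordered = sorted(contributions)
    value_995 = ordered[(199 * n + 199) // 200 - 1]  # rank ceil(0.995 * n)
    if ordered[-1] - value_995 <= min_gap:
        return "indistinguishable traffic"
    return "usable"


def classify_sources(training: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Classify every data source by its interval message contribution values."""
    return {source: classify_source(values) for source, values in training.items()}


labels = classify_sources({
    "web01": [float(i) for i in range(1, 401)],
    "db01": [3.0] * 500,
    "new01": [1.0, 2.0, 3.0],
})
print(labels)  # {'web01': 'usable', 'db01': 'indistinguishable traffic', 'new01': 'insufficient data'}
```

Checking the interval count first matters: a source with too few intervals would also fail the gap test, and reporting it as having similar traffic would hide the fact that more training data could fix it.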

The current state of the art depe...