Anomaly Detection on Multiple, Interdependent Data Sources
Publication Date: 2016-Oct-13
The IP.com Prior Art Database
Monitoring and analyzing interdependent time series can be used to detect anomalies. In addition to analyzing each time series individually and detecting anomalies per series, the interdependency of the time series is first determined, and an anomaly measure is then derived from that interdependency.
Monitoring and analyzing time-ordered samples of multiple variables can be used to detect anomalies. Anomaly detection spans a large set of applications, from market data to electroencephalograms, sensor data, and server load data.
In some setups, sensors are interdependent and may behave in a similar fashion. Moreover, some sensors may be redundant, leading to a great degree of similarity between the data recorded at two different endpoints.
There is large potential in exploiting the interdependency of time series: in addition to considering the shape of a single time series for detecting anomalies, one can determine whether the similarities within a set of time series remain stable or show unusual behavior. This can increase the accuracy of anomaly detection systems.
A major challenge, however, is the considerable additional computational cost of finding interdependencies.
The novelty of this invention is a time-series analysis scheme that uses a suitable, compressed intermediate representation of the time series for detecting similarities between time series and for detecting deviations from known similarities.
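As an illustrative sketch only (the specific compression scheme is not mandated by this disclosure), a compressed intermediate representation could be obtained with a piecewise aggregate approximation: each time window is reduced to a small, fixed number of segment means. The function name `paa` and the parameters below are illustrative assumptions.

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: compress a time series to
    n_segments segment means, one possible compressed intermediate
    representation for downstream similarity analysis."""
    segments = np.array_split(np.asarray(series, dtype=float), n_segments)
    return np.array([seg.mean() for seg in segments])

# 1000 raw samples reduced to a 20-value sketch (50x compression)
raw = np.sin(np.linspace(0.0, 4.0 * np.pi, 1000))
sketch = paa(raw, 20)
```

The compression factor (here 50x) is the tunable "compression rate" discussed below: a coarser sketch shifts more work to the central analytics instance with less data transfer, and vice versa.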
The advantages of this invention are better scalability for large sets of sensors as well as enabling flexible load spreading between on-premise and off-premise analysis.
In a multi-sensor network as deployed, e.g., in industrial control systems (ICS), it can be assumed that a subset of sensors monitors the same device. Therefore, there is a certain level of interdependency between some sensors. The interdependency can be analyzed on the raw data; however, this incurs a large computing overhead. It is therefore more practical to first derive a similarity measure from the raw data per time window and then analyze the series of similarity measures. This reduces the amount of data to be analyzed.
This approach is also motivated by the fact that the recording locations of the time series may be geographically far from each other. For instance, there may be several kilometers between two sensors in an electrical grid. Preprocessing can be done locally, and only the metadata has to be sent to a central analytics instance, thereby reducing the amount of transferred data significantly.
This invention provides a novel anomaly detection approach on multiple time series by exploiting potential sensor interdependency and by analyzing the interdependency not on raw data but on metadata. This increases the accuracy of the anomaly detection approach without creating much overhead. The metadata representation can convey a certain compression rate with respect to the raw data. The choice of the compression rate facilitates the distribution of load between local analysis (i.e., edge, on-premise) and remote analysis (i.e., cloud, off-premise). Especially in resource-constrained environments, one can take advantage of this flexible computational load allocation, in particular adaptive load allocation based on the current resource…