Browse Prior Art Database

A fast incremental approximation method for local outlier factor

IP.com Disclosure Number: IPCOM000236518D
Publication Date: 2014-May-01
Document File: 3 page(s) / 64K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to provide a fast approximation for determining outliers based on the Local Outlier Factor (LOF). This method approximates each cluster of low-LOF points by a bounding hull, thus making the recomputation required when a new point is added very efficient.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 49% of the total text.

Page 01 of 3

A fast incremental approximation method for local outlier factor

Relevant Definitions:


• A bounding hull of a set of points is the smallest (by volume) n dimensional bounding box, which is aligned with the coordinate axes that contains all the points in the set or an n dimensional bounding sphere.


• A point is an outlier with respect to a set of bounding hulls and a threshold T , if it is not contained in any of the bounding hulls.


• A point is an inlier with respect to a set of bounding hulls and a threshold T if it is strictly contained in any of the bounding hulls or lies on the boundary of at least two bounding hulls.


• A point is a boundary point with respect to a set of bounding hulls and a threshold T if it is neither an inlier nor an outlier.

Local Outlier Factor (LOF) is a technique that identifies outliers based on local neighborhoods, and has gained widespread use. Finding outliers has applications in many areas, including signal processing, network security, and detecting aberrant behavior of people. The method applies to a set of points in d dimensional space and assigns an LOF value to each point, where a large value (significantly larger than 1) indicates that a point is an outlier. Computationally, the expensive step of LOF is to compute the k nearest neighbors of all points (called kNN computation), where a straightforward computation takes O(N^2) time for a set of N points. Methods that are more sophisticated can work as fast as O(N logN) for low dimensions at the expense of creating elaborate data structures. For high dimensions these methods are much slower, and the complexity reverts to O(N^2).

In many situations, having a fast incremental method for determining outliers is important. One general such case is when the points in the set are added gradually , and a user wants to determine whether the new point is an outlier as soon as a new point is added. The algorithmic and computational challenges with this approach are that when a point is added, it may affect the k neighborhoods of previous points, so the LOF computation must be performed again from the beginning .

One incremental approach from the literature combines points with low LOF into clusters so that the nearest neighbor need not be recomputed for all points when a new point is added. Building upon this concept, the present solution provides an approximate method that is substantially faster, especially when the number of outliers is relatively small with respect to the number of points.

The novel contribution is a method to provide a fast approximation for computing outliers. This method approximates each cluster of low LOF points by a bounding hull, thus making the recomputation required when a new point is added very efficient .

The method quickly and incrementally computes, for each point in a set, whether it is an outlier. This is based on its Local Outlier Factor (LOF) value and a threshold parameter, where the computation at step n...