Efficient Context Sensitive K Nearest Neighbor Search
Publication Date: 2015-Dec-22
The IP.com Prior Art Database
A method to compute mutual K-Nearest Neighbours (K-NN) in linear time. When comparing two elements x and y, the method requires, in addition to the traditional K-NN condition, that the element x also belongs to the K-NN of y. The method is applied to a categorization task.
Morteza Haghir Chehreghani
Xerox Research Centre Europe - XRCE
We develop a context-sensitive, linear-time K-nearest neighbor search in which we require the test object and its neighborhood (in the train dataset) to share a similar structure. Our method is particularly able to deal with two types of irregularity: i) when the (test) objects are outliers, i.e. they do not belong to any of the existing classes or structures in the (train) dataset, and ii) when the structures (e.g. classes) in the dataset have diverse densities. Instead of capturing the correct underlying structure of the whole dataset, we extract the correct structure in the neighborhood of the test object. We investigate the performance of our method on a variety of real-world datasets and show its superior performance.
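The mutual-neighbor condition from the abstract can be illustrated in a short sketch (not the paper's full method, which additionally compares neighborhood structures): x is kept as a neighbor of y only if y is also among the K nearest neighbors of x. The helper names below (`knn_indices`, `mutual_knn`) are illustrative, not from the publication.

```python
import numpy as np

def knn_indices(D, i, k):
    """Indices of the k nearest neighbors of object i under the
    pairwise distance matrix D, excluding i itself."""
    order = np.argsort(D[i])
    return [j for j in order if j != i][:k]

def mutual_knn(D, i, k):
    """Mutual k-NN of object i: those neighbors j of i for which i
    is also among the k nearest neighbors of j."""
    return [j for j in knn_indices(D, i, k) if i in knn_indices(D, j, k)]

# Toy 1-D data: two tight pairs and one far-away outlier at index 4.
X = np.array([[0.0], [0.1], [5.0], [5.1], [100.0]])
D = np.abs(X - X.T)  # pairwise distances

print(mutual_knn(D, 0, 2))  # [1] -- only the genuinely close point survives
print(mutual_knn(D, 4, 2))  # []  -- the outlier has no mutual neighbors
```

Note how the mutual condition empties the neighborhood of the outlier: its nearest neighbors do not reciprocate, which is exactly the kind of irregularity (i) the abstract targets.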
K-nearest neighbor search (K-NN) is a very common method for classification, query processing and retrieval. The goal is to find the K nearest neighbors of a new (test) object within a given (train) dataset. The neighbors might then be processed further to identify the class label of the new object, e.g. according to the majority label among the obtained neighbors. However, basic K-nearest neighbor search fails to capture the underlying geometry and structure of the data. In particular, the nearest neighbors of an object near the boundary between two different classes might include irrelevant train objects that reduce the quality of the search [Kim and Choi, 2007; Kim and Choi, 2013].
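The baseline procedure described above (find the K nearest train objects, then take a majority vote over their labels) can be sketched as follows; the function name and toy data are illustrative only.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_test, k=3):
    """Plain k-NN classification: find the k train objects closest to
    x_test (Euclidean distance) and return the majority label."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Two small clusters with labels 'a' and 'b'.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(['a', 'a', 'b', 'b'])
print(knn_classify(X_train, y_train, np.array([0.2, 0.1]), k=3))  # 'a'
```

Each query costs one pass over the N train objects, which is the linear-time behavior the paper aims to preserve.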
Several metric learning approaches have been proposed to address such issues [Domeniconi et al., 2002; Xing et al., 2003; Weinberger and Saul, 2009]. These methods work only with labeled or partially labeled data and might fail to capture the underlying structure, as demonstrated in [Kim and Choi, 2013]. Manifold learning is another approach for this purpose [Tenenbaum et al., 2000; Roweis and Saul, 2000; Belkin and Niyogi, 2003; Zhang and Zha, 2002]; it computes a possibly low-dimensional embedding which preserves the structure. However, the scalability of these methods is limited, since their computational complexity is O(N^2) or higher, whereas standard K-nearest neighbor search is linear. In addition, they require very careful tuning of the kernel parameters, which is a non-trivial task [Nadler and Galun, 2007; Luxburg, 2007]. Moreover, good performance is attained only within a very small range of kernel parameter values [Wang and Zhang, 2008].
More recently, a category of distance measures, the so-called link-based measures [Fouss et al., 2007; Chebotarev, 2011], takes into account all the paths between the objects represented in a graph in order to compute the effective pairwise distances. These measures are often obtained by inverting the Laplacian of the distance matrix, in the context of regularized Laplacian and Markov d...
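As a concrete instance of such a link-based measure, the regularized Laplacian kernel K = (I + alpha*L)^(-1), with L = D - W the graph Laplacian of a weight matrix W, is a standard form in this literature. The sketch below, with an illustrative function name and toy graph, shows how inverting the regularized Laplacian yields similarities that aggregate all paths between node pairs rather than only the direct edge.

```python
import numpy as np

def regularized_laplacian_kernel(W, alpha=0.5):
    """Regularized Laplacian kernel K = (I + alpha * L)^(-1), where
    L = D - W is the graph Laplacian of the symmetric weight matrix W.
    K[i, j] acts as a link-based similarity between nodes i and j,
    accumulating contributions from all paths between them."""
    Dg = np.diag(W.sum(axis=1))   # degree matrix
    L = Dg - W                    # graph Laplacian
    n = W.shape[0]
    return np.linalg.inv(np.eye(n) + alpha * L)

# Tiny graph: nodes 0-1 strongly connected, node 2 weakly attached to 1.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.2],
              [0.0, 0.2, 0.0]])
K = regularized_laplacian_kernel(W)
# The strongly connected pair (0, 1) comes out more similar than (0, 2).
print(K[0, 1] > K[0, 2])  # True
```

Note that the matrix inversion costs O(N^3), which motivates the search for linear-time alternatives pursued in this work.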