Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

# Optimal one dimensional clustering with limited number of items in a cluster

IP.com Disclosure Number: IPCOM000237178D
Publication Date: 2014-Jun-06
Document File: 3 page(s) / 35K

## Publishing Venue

The IP.com Prior Art Database

## Abstract

We present polynomial method for computing optimal k-means in 1-dimensional space with the upper bounded number of items in a cluster. More precisely, we compute optimal placement of k centers in 1-dimensional space where each center clusters at most m items. The centers are placed so that the sum of distances of all items to their centers is minimized. Each center represents a cluster and can be used to classify new items. Limited capacity of a cluster is used in supervised learning where maximum number of items in a cluster is limited in advance by the learning set.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

Page 01 of 3

Optimal one dimensional clustering with limited number of items in a cluster

We present polynomial time method for k-means clustering (k-median) problem with limited number of items in one dimensional space . The k-means clustering problem can be stated in the following way. For given set of n items located in a d dimensional space the goal is to find k centers (each center represents a cluster) so that the sum of distances of all items to the closest center is minimized . Intuitively, we want to cluster items that are close to each other, so sum of item's distances to the cluster centers is a good measure of the quality of clustering . Once the centers are computed we can use them to solve fundamental problem of item classification in the following way . The new item is placed in a cluster represented by closest center . K-means problem is one of the most fundamental problems in machine learning . This problem is shown to be NP-complete in 2 dimensional space (see [1]) so no polynomial method for computing optimal solution is known. The best known approximation algorithm, that runs in polynomial time, provides approximation factor 3 (see [1]). The most popular heuristic used in practical applications is Lioyd's algorithm (see [2]). There is known polynomial time method of computing optimal clustering in one dimension (see [3]) provided capacity of each cluster is unlimited. Polynomial time method for clustering in one dimensional space where each cluster can contain at most m -items is not known. Limited capacity of a cluster is used in supervised learning where max number of items in a cluster is limited in advance.

[1] V. Arya, N. Garg, R. Khandekar, V. Pandit, A. Meyerson, K. Mungala

Local search heuristics for k -median and facility location problems

Proceedings of the 33rd Annual ACM Symposium on the Theory of Computing (2001) p. 21-29

[2] http://en.wikipedia.org/wiki/Lloyd%27s_algorithm

[3] Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming

H. Wang and M. Song http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf

Our method relies on dynamic programming technique .

Our method requires O(m*n*k) steps where
m - max number of items in a cluster
n - total number of items
K - number of clusters.

Input:

X[n]- one-dimensional array of n items.

X[i] is location of the ith item in 1-dimensional space. The table of items is sorted X[1] <= X[2] .<=.. <= X[n] K - number of centers to be computed.
m - max number of items in a cluster

We assume n <= m*K, since otherwise solution does not exist (that is some cluster would have to store more than m items ).

1

Page 02 of 3

Output:

K centers C[1], ... , C[K] where C[j] is the position of the jth center in 1-dimensional space. Each cluster can contain at most m items.

The goal is to minimize sum S of distances of all items to their centers .

Our method relies on the following lemma:

Lemma 1

Optimal cluster partitioning in 1-...