Browse Prior Art Database

Unbiased Approach to Determine The Initial Conditions for Clustering Techniques

IP.com Disclosure Number: IPCOM000032194D
Original Publication Date: 2004-Oct-26
Included in the Prior Art Database: 2004-Oct-26
Document File: 4 page(s) / 249K

Publishing Venue

IBM

Abstract

The proposed technique is a non-biased approach to select the initial values for the cluster means used for cluster-based techniques such as k-means clustering. Not only the final solution is unique and more representative of what the actual objects (clusters) are but the speed of convergence is noticeably improved by this approach. An application of this technique to a Satellite image of the RDU airport is demonstrated.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 4

Unbiased Approach to Determine The Initial Conditions for Clustering Techniques

Introduction

Cluster-based techniques usually simplifies the interpretation of images by segmenting different objects within the image. This is usually done as a pre-process stage before the input data (e.g. multi-band satellite image) is mined to extract some useful information.

The clustered input data (e.g. an image of a landscape) usually groups different objects within the image as a separate entities (e.g. trees, asphalt, roofing material, grass etc.). k - means clustering is one of the most popular techniques used for this purpose. This is an iterative technique that assigns samples to clusters by minimizing the distances between the samples and their respective cluster centers. Even though this technique is widely used in several fields and in general produces a satisfactory results, the technique have the following disadvantages:

- the initial picks for the cluster centers are randomly picked. As such, the resulted clusters solution may not be unique. - depending on the distribution of the initial picks, large number of iteration may be needed to converge to the final solution.

We propose a non-biased approach to select the initial values for the cluster means. The proposed technique achieves a solution which is unique and more representative of what the actual objects are. Additionally, the speed of convergence is noticeably improved by this approach.

Proposed Approach

Each pixel in the image is represented by several measurement (called frames or features)- e.g. Yellow pixel in an RGB image is represented by the 3 feature values of (255, 255, 0). Other images may have different and/or more number of features. However, the same concept still applies.

To have a better representation of the data and to speed-up the convergence process, we followed the following approach in determining the starting points for the cluster means:

1. Find the maximum and the minimum of each individual d feature (d = 7 frames in the case of the Satellite image).
2. Construct a d- dimensional line with a starting point of the minimum of each feature. The end point is the maximum of each of the individual frames.
3. Find c equidistance- points in this line, where c is the desired number of clusters.
4. The initial means are the closest actual points from each individual points above.

1

Page 2 of 4

Illustrative Experiment:

To illustrate our approach, we considered a captured seven-band image of the RDU airport taken via the LandSat satellite imaging system shown in Fig.(1) was used to conduct a clustering experiment via k - means technique. The dataset represents a seven-band image of the RDU airport taken via the LandSat satellite imaging system. The goal is to perform some kind of clustering to group the data into `reasonable' classes. The image consists of 131x231 pixels each of a dimension (frame) of 7. The different frames represent different bands- R, G, B, NIR, M...