Browse Prior Art Database

Fast Clustering Algorithm for Binary Vectors

IP.com Disclosure Number: IPCOM000075895D
Original Publication Date: 1971-Dec-01
Included in the Prior Art Database: 2005-Feb-24
Document File: 2 page(s) / 47K

Publishing Venue

IBM

Related People

Casey, RG: AUTHOR [+2]

Abstract

A well-known method of grouping data into similarity classes (or "clustering") is implemented for the important case of binary data vectors. Use of parallel boolean operations in place of the direct approach using serial arithmetic, yields order-of-magnitude increase in speed. Among the possible applications are character recognition, image processing, information retrieval, and file organization.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 68% of the total text.

Page 1 of 2

Fast Clustering Algorithm for Binary Vectors

A well-known method of grouping data into similarity classes (or "clustering") is implemented for the important case of binary data vectors. Use of parallel boolean operations in place of the direct approach using serial arithmetic, yields order-of-magnitude increase in speed. Among the possible applications are character recognition, image processing, information retrieval, and file organization.

It is assumed that the boolean operations "AND", "COMP" (complement), and "EOR" (Exclusive-OR) are available. These are basic hardware operations in most machines, operating in parallel on the bits of the argument vectors. In addition, it is assumed that the following functions are available.

N BITS (r) = number of "1" bits in vector r.

TRANS (V) = transpose of binary matrix V. That is, if V stored as column

vectors v1, v2, ---, vn then TRANS (V) yields

storage vectors v1*, v2*,---,vm* which are the

columns of the transpose of V.

Object of the algorithm is, given binary data vectors v1, v2, ---,vn to find a set of vectors x1, x2, ---,xk so as to minimize the function:

(Image Omitted)

where m(j) is a clustering function which assigns each vj to the nearest xi, thus forming similarity groups. k is a preset parameter. A general solution to this minimization problem is not known. The following procedure shows how to arrive at a local minimum starting from an arbitrary set of xi. There are two steps. 1) assign each vj to the neare...