Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Cluster Algorithm for Vector Libraries Having Multiple Dimensions

IP.com Disclosure Number: IPCOM000111031D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 4 page(s) / 129K

Publishing Venue

IBM

Related People

Withum, TO: AUTHOR

Abstract

Disclosed is an algorithm that involves the clustering of library vectors with the same identity in order to maintain a smaller library and allow a faster method of searching for the most similar vector. The search method that is used takes advantage of the projections of the cluster spaces that are created. For this reason, the search time is reduced without a loss in accuracy.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

Cluster Algorithm for Vector Libraries Having Multiple Dimensions

      Disclosed is an algorithm that involves the clustering of
library vectors with the same identity in order to maintain a smaller
library and allow a faster method of searching for the most similar
vector.  The search method that is used takes advantage of the
projections of the cluster spaces that are created.  For this reason,
the search time is reduced without a loss in accuracy.

This algorithm is useful to a general system of the following
properties:

1.  Defined by a large library of multidimensional vectors, each of
    which has a non-unique identity.

2.  This library is utilized in a vector by vector comparison with a
    test vector in order to determine the identity of the test
    vector.  (This comparison is most often a type of distance
    calculation such as Euclidean or city block.)  The identity of
    the closest library vector is assigned to the test vector as long
    as other criteria such as distance to the next closest library
    vector are satisfied.

      The major setback of this type of system is that it takes too
long to search through the entire library for the closest vector to
the test vector.  For this reason, many algorithms use a clustered
form of this library in which one vector can represent many others of
the same identity.

      However, the problems with many clustering algorithms are that
the search is still too slow and that the clustered vectors do not
accurately represent the original library.

      The solution to the problem that was listed above is to cluster
all vectors with the same identity until a vector with a different
identity is encountered.  At this point, a constant radius is
determined that will form a multidimensional shape that will only
contain vectors of one identity.  (In 2 dimensions this shape is a
circle.  In 3 dimensions, a sphere.)  Fig. 1 shows examples of
possible cluster formations.  (Where "A"s and "B"s represent vectors
to be clustered, and "@"s represent the resulting average vector that
will become a member of the clustered library.)  Now the library
contains a new aspect, a radius.  Previous libraries only contained
the cluster vectors.

      Even though the above algorithm will work if this clustering is
randomly done, it is suggested that the process cluster all of one
vector identity ("A"s for instance) before continuing to the next
vector identity.  Furthermore, in order to efficiently cluster the
original library, a program should evaluate the possible "A" clusters
and then only implement the clustering algorithm for the largest
cluster space.  This process would then be repeated for the remaining
"A" vectors until the entire "A" space is sufficiently represented.

      There are two advantages to the algorithm.  First, a high
degree of accuracy is maintained because the cluster vectors can only
represent vectors of the same id...