Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Efficient Database Indexing for Quadratic Form Distance Functions

IP.com Disclosure Number: IPCOM000111246D
Original Publication Date: 1994-Feb-01
Included in the Prior Art Database: 2005-Mar-26
Document File: 2 page(s) / 49K

Publishing Venue

IBM

Related People

Equitz, W: AUTHOR [+5]

Abstract

Disclosed is a method for improving the efficiency of certain kinds of queries to a database. The particular queries made more efficient are those in which the user wants to retrieve all records "similar" to a given specification (as opposed to requiring exact matches), where "similarity" between one record X = (x sub 1 , x sub 2 , ... , x sub N) and another record Y = (y sub 1 , y sub 2 , ... , y sub N) is given by

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Efficient Database Indexing for Quadratic Form Distance Functions

      Disclosed is a method for improving the efficiency of certain
kinds of queries to a database.  The particular queries made more
efficient are those in which the user wants to retrieve all records
"similar" to a given specification (as opposed to requiring exact
matches), where "similarity" between one record X = (x sub 1 , x sub
2 , ...  , x sub N) and another record Y = (y sub 1 , y sub 2 , ...
, y sub N) is given by

  d sub 1 (X,Y) = (X - Y) sup T S (X - Y).

Here  X  and  Y  are histograms (discrete distributions) over
possible values of something (such as colors), and
 S  is an  N * N  matrix denoting the similarity between every pair
of attributes (such as colors) defined by the coordinates of the
records
 X  and  Y .  For example, in a color application, each entry in
 S  might be defined as being in the range from 0 to 1, with, for
example,
 s sub <red,orange> similar 1 , and  s sub <red,green> similar 0 .

      The specific queries that are made more efficient are queries
of the form, "Given  X , find all records  Y  for which  d sub 1
(X,Y) lt epsilon " .  The way that these queries are made more
efficient is by doing the query in two stages, first performing a
cheap filtering operation which selects a subset of the records in
question, including all the records for which
 d sub 1 (X,Y) lt epsilon  .  A second pass is then made over the
results of filtering operation to extract the exact records desired....