Browse Prior Art Database

Encoding Data into Irrational Magic Numbers for Fast Searching and Comparing

IP.com Disclosure Number: IPCOM000104454D
Original Publication Date: 1993-Apr-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 140K

Publishing Venue

IBM

Related People

Georgiou, CJ: AUTHOR [+3]

Abstract

Disclosed is a simple encoding method yielding a unique representation of data objects for fast and efficient processing of lookup, modification, insertion, and deletion operations. This encoding can also be used to capture hierarchical relations between data objects (e.g., tree structures, or DAGs - Directed Acyclic Graphs).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 40% of the total text.

Encoding Data into Irrational Magic Numbers for Fast Searching and Comparing

      Disclosed is a simple encoding method yielding a unique
representation  of data objects for fast and efficient processing of
lookup, modification, insertion, and deletion operations.  This
encoding  can also be used to capture hierarchical relations between
data objects (e.g., tree structures, or DAGs - Directed Acyclic
Graphs).

      The disclosed encoding method can be used for large data-bases,
where simple (i.e., non-relational) information searches are
performed most of the time.  It is also useful for applications in
which there is a need to reference a specific data item among many
similar items.  The encoding can yield a unique object "signature,"
or magic number.  Traditional data base operations can be done on the
magic numbers of the respective data items instead of the information
itself.  For large data records, this can be faster because of less
main memory space needed.  The objects to be encoded are assumed to
be text strings defined by an alphabet S, in which each letter s sub
i in S is a positive integer (e.g., ASCII code).  Hierarchical
(tree-like)  structures  of  various  other data objects can also be
encoded, as  shown  later.  The  disclosed  encoding method  has
three  parts: 1) A pre-computed, fixed constant coefficients table;
2) the evaluation  of  a  polynomial  in specific  points;  and  3)
the approximate representation of irrational numbers.  These parts
are explained below.

      The coefficient table T has a maximum of N entries, where N is
some upper bound on the object  length  (i.e.,  maximum number of
characters in a string).  Each table entry, alpha sub i in  T,  is
the value of the natural logarithm of a unique prime number (i.e., no
two different entries  in  the  table are  equal) divided by the
integer part of the resulting natural logarithm (called the
normalization procedure).

      The evaluation of the polynomial is done as follows:  Denoted
by s sub i is the i-th character in a text string of length n, where
n <=  N.  The polynomial yielding a unique representation of the
object is called a magic number MN, and has the form:

      alpha sub 1 times s sub 1 + alpha sub 2 times s sub 2 + ...  +
alpha sub n times s sub n.

      The evaluation of this polynomial for specific values of S sub
i yields a unique irrational number, i.e., there is a one-to-one
correspondence  between  a specific object (string) and the magic
number  it  represents.  This  is  based  on  the Schanuel
Conjecture  [1]  regarding algebraical independence between numbers
with certain properties.

      Irrational numbers require an infinite  memory  space  to
display  their value.  Therefore, they must be represented in a
computing system using an  approximation.  If  a  floating point
representation  is  used, the degree of approximation needed (i....