Browse Prior Art Database

System and Method for Optimizing Alternate Search Engine Query Criteria within Heterogeneous Queries Disclosure Number: IPCOM000018751D
Original Publication Date: 2003-Aug-06
Included in the Prior Art Database: 2003-Aug-06
Document File: 3 page(s) / 69K

Publishing Venue



We disclose the following: a method described provides an optimization for postfix generated queries, generated by a method defined in US Patent 6,128,612 or similar, wherein the query contains adjacent clauses that reference search engines which can manage their own boolean logic.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 3

  System and Method for Optimizing Alternate Search Engine Query Criteria within Heterogeneous Queries

Problem description:

1. There exists a general user query language that allows users of an ad-hoc query system to define their searches in a boolean fashion such as "kw=cats and ft=dogs" and have them submitted to a backend database.

2. Input in the form of this language is translated into SQL in a manner described by US Patent #6,128,612 "Method and System for Translating an Ad-Hoc Query Language Using Common Table Expressions" or another means that results in a postfix representation of the boolean logic. In either case, the end result is the user-input query is described in a postfix form.

3. Certain of the Common Table Expressions or subqueries, if formed using the process described in the patent above, may, in fact, represent queries against an alternate database search engine, for example, the IBM Text Information Extender.

4. The alternate database search engine has its own boolean syntax and means for internally optimizing its expressions, while it also has the ability to be embedded within standard SQL. For example, IBM's Text Information Extender has an operator CONTAINS. An example of usage of the CONTAINS shows how it has its own internal boolean logic separate from that of DB2 itself.

e.g. db2tx.CONTAINS(myindex, '"hair" & "brush"')=1

5. The overall performance of a single alternate database query managing its own ANDing and ORing is significantly better than allowing the main database system perform the ANDing and ORing between the individual criterion. This is the case with IBM's TIE and may be true of other search engines as well.

6. Given points 4 and 5, it is advantageous to find a method to ensure that the alternate database engine performs as many of its own aggregations (ANDs, ORs) as possible.

The solution:

After transforming the user-entered query in a postfix vector, visit each node of the postfix vector in order looking at the next three items each time. If these three items consist, in order, of two similar alternate database criteria followed by either an AND or OR joiner, then replace all three with the alternate database's representation of the operation indicated by the three terms and maintain the vector index at this location, else move to the next item in the postfix vector.

Representing the description above in Java pseudo-code:

Page 2 of 3

Given QIFVector - a vector of postfix items (either Joiner (AND/OR) or Criterion (e.g. kw=cars)).

1. for (int i =0; i<QIFVector.size() - 2 ; i++)
2. {

3. SrhPostfixStackItem item[3] = // form the window {