Browse Prior Art Database

In memory query processing using cache include columns

IP.com Disclosure Number: IPCOM000033302D
Original Publication Date: 2004-Dec-06
Included in the Prior Art Database: 2004-Dec-06
Document File: 3 page(s) / 14K

Publishing Venue

IBM

Abstract

Object-relational extensions to database systems allow the extension of a standard relational database system with specialized support for new types of objects. Important examples of such new datatypes include text, images, audio, and video. To search on these multimedia data indexes are built that are stored outside of the scope of the database. In a database SQL query a predicate (e.g. CONTAINS) can be used to search on this text index. When such a query is executed a join between the text search results and the database table is necessary. The idea to solve the "Join Problem" described above is to use "Cache Include Columns". That is the data which will be retrieved by the application is cached in memory together with the primary keys of the search table. When issuing the text search query the result data is fetched directly from the cache thus avoiding a join completely.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 49% of the total text.

Page 1 of 3

In memory query processing using cache include columns In memory query processing using cache include columnsIn memory query processing using cache include columns In memory query processing using cache include columns

Introduction

Object-relational extensions to database systems allow the extension of a standard relational database system with specialized support for new types of objects . Important examples of such new datatypes include text, images, audio, and video . To search on these multimedia data indexes are built that are stored outside of the scope of the database, i.e. external search engines are used to build those indexes and store them for example in files.

    Lets assume for the remainder of this article that the external index is of type "TEXT" (However the same approach can also be applied to all other external index types such as audio video and image.). In a database SQL* (Standard Query Language) query a predicate "CONTAINS" can be used to search on this text index (the implementation of this predicate will access the external indexing structure to retrieve the results):

Example:

SELECT Price, ISBN

FROM Books
WHERE CONTAINS(Title, "Patents") AND Price < 10

    When this query is executed a join between the text search results and the database table ("Books") is necessary to retrieve the values for "Price" and "ISBN". This assumes that the CONTAINS predicate returns the appropriate primary keys to join upon. This means that in addition to the join the mapping between the text search engine document identifiers and the primary key has to be done by the CONTAINS predicate, adding even more cost to the overall query time.

From this simple example one can see right away that the join between the text search result and the database table can be very expensive once there is a large number of rows involved.

Idea of Publication / Novel contribution

    The idea to solve the "Join Problem" described above is to use "Cache Include Columns". That is the data which will be retrieved by the application is cached in memory together with the primary keys of the search table (in our initially mentioned example the table would be "Books"). When issuing the text search query the result data is fetched directly from the cache thus avoiding a join completely . Furthermore the mapping of the external document identifiers to the database primary keys can be avoided.

Key benefits

The benefits of this publication are all performance related :

No document name mapping necessary by adding the primary key to the cache.

By using extensibility functions of the database systems such as Table Valued Functions the include columns can be supported and exploit the cache functionality. The cache table results then can then also be used to further join them with other (cached or none cached) tables. Performance improvement even for medium sized result sets .

Evaluation of parametric condition on cache (e.g. WHERE

Page 2 of 3

CacheTable.Date > 2003-01-01).

De...