Browse Prior Art Database

An Efficient Method for Determining the Optimum Join Sequence in a Relational Database System

IP.com Disclosure Number: IPCOM000021755D
Original Publication Date: 2004-Feb-05
Included in the Prior Art Database: 2004-Feb-05
Document File: 2 page(s) / 31K

Publishing Venue

IBM

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 2

An Efficient Method for Determining the Optimum Join Sequence in a Relational Database System

Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Organization (ANSI) and the International Standards Organization (ISO).

Algorithms used to determine the optimum join sequence for relational database queries which involve multiple tables can be very expensive in terms of CPU and storage use. One such algorithm, known as the Dynamic Programming Algorithm, can increase such usage proportional to 2**N, where N is the number of tables in the query. Therefore, it is very important that such algorithms be very efficient in their use of resources.

In the conventional Dynamic Programming Algorithm the goal of the algorithm used in processing a multi-table SQL query is to determine the complete detailed access path plan and to provide a great amount of supporting data, usually recorded in EXPLAIN tables. This is very expensive.

Therefore, there is a need for a simple and optimized method and system able to solve the CPU time and storage usage expense problem by determining the optimum join sequence for relational database multi-table queries.

The following is a method for decreasing the amount of CPU and storage resources required to BIND multi-table SQL queries.

1. For query blocks with a large number of tables, the tables are grouped into three classes:

A. Class 1 contains the one row tables.

B. Class 3 contains Primary Key Node tables, i.e. those tables which

- Have no local predicates

- Have one join predicate

- The join predicate is a Primary Key (in the Class 3 table) to Foreign Key (in a Class 1 or 2 table) predicate.

C. Class 2 contains all other tables.

As a result of these definitions, joining Class 3 tables to a composite does not alter either the number of rows in the composite or the ordering of the composite rows.

2. The pass 1 join-table...