Browse Prior Art Database

An Index-Directed Greedy Algorithm for the Optimization of Star Schema Queries

IP.com Disclosure Number: IPCOM000021775D
Original Publication Date: 2004-Feb-06
Included in the Prior Art Database: 2004-Feb-06
Document File: 3 page(s) / 36K

Publishing Venue

IBM

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Page 1 of 3

An Index-Directed Greedy Algorithm for the Optimization of Star Schema Queries

Introduction

Disclosed is a new query optimization method for star join using a "greedy" algorithm exploiting a multi-column index defined on the fact table. This method generates a quasi-optimal execution plan for a star join query based on the run-time implementation of the fact table index key feedback and repositioning on the dimension tables (referred to as "index skipping technique"). Compared with the existing method, the new algorithm expands the search space for the best plan to maximize the efficiency of the index skipping technique without adding excessive overhead to the bind time.

Star schema is one of the major design principles of data warehousing systems. Many relational database users have been adopting data warehousing approaches to resolve growing business needs on their databases with increasing complexity in the underlying schema designs. Queries to such databases are often in the form of star joins involving large number of tables, and therefore, efficient optimization of such queries is vital to the users of data warehousing.

Brief Summary of the Characteristics of Star Schema Databases and Star Join

Star schema is a natural consequence of the normalization of a large table. Several satellite tables called "dimension tables" surround a large table at the center, called the "fact table". Further normalization of a dimension table forms a "snowflake". A highly normalized star schema can contain many snowflakes and it is sometimes called a snowflake schema. In this paper, the term "dimension tables" is used to represent both base dimension tables and snowflakes, which are materialized into work files.

If a join graph of a query is in a form of star schema, it is called star join. In a star join query, the fact table is joined to many dimension tables. The typical characteristics of a star join are:

  No join predicate exists among the dimension tables. These tables are joined by Cartesian joins. The dimension tables are much smaller than the fact table, in terms of the number of rows. Filtering on the fact table is given via the local predicates on the dimension tables. In addition, the large fact table is often partitioned and has at least one multi-column partitioning index. In many cases, the multi-column partitioning index covers the join key columns on the fact table.

The New Optimization Method - An Index-Directed Greedy Algorithm

The algorithm consists of three parts:

Generate outside-in join sequences. Using a multi-column index on the fact table,

enumerate viable join sequences between some dimension tables and the fact table. This part of the algorithm generates join sequences ending with the fact table, possibly leaving out some dimension tables as unprocessed.


1.

1

Page 2 of 3


2.


3.

Determining Outside-in Join Sequence

For each fact table index Fx (C1, C2, ..., Cm):

For each possible prefix columns P(C1, C2, ..., Cn) where...