Browse Prior Art Database

Efficient star join method using sorted in-memory work file

IP.com Disclosure Number: IPCOM000021837D
Original Publication Date: 2004-Feb-11
Included in the Prior Art Database: 2004-Feb-11
Document File: 2 page(s) / 34K

Publishing Venue

IBM

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 2

Efficient star join method using sorted in -memory work file

Introduction

This disclosure describes a new star join method using in-memory work files, based on a technique using fact table index key feedback and repositioning on the dimension tables (referred to as the "index skipping technique").

Star schema is one of the major design principles of large scale relational databases such as data warehousing systems. Developing an efficient join method for star schema queries is vital to support growing databases in terms of volume and complexity.

The Problems to be Solved

In a star schema, several satellite tables called "dimension tables" surround a large table at the center, called the "fact table". Further normalization of a dimension table forms a "snowflake". Because a highly normalized star schema can contain many dimension tables and snowflakes, joins of many tables are often required to resolve a star schema query. The star join method based on the index skipping technique involves the following phases:

Materialization of snowflakes: Each snowflake is resolved independently and

materialized into a work file. Outside-in join: All or some of the dimension tables or materialized snowflake work

files are joined to the fact table using the index skipping technique. The base dimension tables are sorted and materialized into work files prior to this step. The index skipping technique exploits a multi-column index on the fact table. Inside-out join: The remaining dimension tables and snowflake work files are joined

  to yield the final result set. Note that the outside-in and in-side out join phases are not necessarily independent operations. Therefore, the star join method involves many work files, in general.

The join process involving work files usually incurs certain overhead because work files do not have indexes:

Outside-in join phase: By definition, a star schema query does not have join predicates among the dimension tables and snowflakes. Work files in the outside-in join phase are Cartesian joined, and, even though the index skipping technique reduces the Cartesian iterations, the work files are often scanned many times and it can result in a large number of physical I/Os. Inside-out join phase: When a snowflake work file is involved in this phase, the merge scan join with sorting on the outer composite is often the only viable choice. The merge scan join with sorting the large outer composite can be a performance bottleneck.

New Method


1.


2.


3.

1

Page 2 of 2

The proposed method to overcome the shortcomings described above is to keep the work files in virtual storage, sorted in the join key order. A dedicated vir...