Efficient star join method using sorted in-memory work file
Original Publication Date: 2004-Feb-11
Included in the Prior Art Database: 2004-Feb-11
Efficient star join method using sorted in -memory work file
This disclosure describes a new star join method using in-memory work files, based on a technique using fact table index key feedback and repositioning on the dimension tables (referred to as the "index skipping technique").
Star schema is one of the major design principles of large scale relational databases such as data warehousing systems. Developing an efficient join method for star schema queries is vital to support growing databases in terms of volume and complexity.
The Problems to be Solved
In a star schema, several satellite tables called "dimension tables" surround a large table at the center, called the "fact table". Further normalization of a dimension table forms a "snowflake". Because a highly normalized star schema can contain many dimension tables and snowflakes, joins of many tables are often required to resolve a star schema query. The star join method based on the index skipping technique involves the following phases:
Materialization of snowflakes: Each snowflake is resolved independently and
materialized into a work file. Outside-in join: All or some of the dimension tables or materialized snowflake work
files are joined to the fact table using the index skipping technique. The base dimension tables are sorted and materialized into work files prior to this step. The index skipping technique exploits a multi-column index on the fact table. Inside-out join: The remaining dimension tables and snowflake work files are joined
to yield the final result set. Note that the outside-in and in-side out join phases are not necessarily independent operations. Therefore, the star join method involves many work files, in general.
The join process involving work files usually incurs certain overhead because work files do not have indexes:
Outside-in join phase: By definition, a star schema query does not have join predicates among the dimension tables and snowflakes. Work files in the outside-in join phase are Cartesian joined, and, even though the index skipping technique reduces the Cartesian iterations, the work files are often scanned many times and it can result in a large number of physical I/Os. Inside-out join phase: When a snowflake work file is involved in this phase, the merge scan join with sorting on the outer composite is often the only viable choice. The merge scan join with sorting the large outer composite can be a performance bottleneck.
The proposed method to overcome the shortcomings described above is to keep the work files in virtual storage, sorted in the join key order. A dedicated vir...