
Method for improving query performance in a heterogeneous database system

IP.com Disclosure Number: IPCOM000198309D
Publication Date: 2010-Aug-04
Document File: 7 page(s) / 136K

Publishing Venue

The IP.com Prior Art Database

Abstract

A heterogeneous database system can extract data from multiple remote databases with a single compound query. It breaks the input query down into sub-queries, sends each sub-query to the appropriate data source for execution, and then merges all result data locally. This invention proposes a solution on the heterogeneous database side that parallelizes data fetching from the data source and thereby considerably improves query performance.

This is the abbreviated version, containing approximately 41% of the total text.

A heterogeneous database system can extract data from multiple remote databases with a single compound query. It breaks the input query down into sub-queries, sends each sub-query to the appropriate data source for execution, and then merges all result data locally. Take the following scenario as an example.

Suppose a user wants to join t1 and t2. The heterogeneous database works in the following sequence:
1) Send "select * from t1" and "select * from t2" to datasource1 and datasource2, respectively.
2) Wait for the data from datasource1 and datasource2, and then perform the join locally.
3) Return the result set to the user, and go back to step 2).
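The baseline flow in steps 1) to 3) can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the in-memory DATASOURCE1/DATASOURCE2 dictionaries, the fetch_all and hash_join helpers, and joining on the first column of each table are all hypothetical, since the original states neither the join condition nor the join algorithm. The batching loop of step 3) is omitted; the point is that each table is read in full over one channel before the local join can run.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two remote tables.
DATASOURCE1 = {"t1": [(1, "a"), (2, "b"), (3, "c")]}
DATASOURCE2 = {"t2": [(1, 10.0), (3, 30.0), (4, 40.0)]}


def fetch_all(datasource, table):
    """Steps 1)-2): 'select * from <table>', read in full over a single channel."""
    return list(datasource[table])


def hash_join(left, right, left_key=0, right_key=0):
    """Local inner join on left[left_key] == right[right_key] (assumed join condition)."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [l_row + r_row for l_row in left for r_row in index.get(l_row[left_key], [])]


t1_rows = fetch_all(DATASOURCE1, "t1")
t2_rows = fetch_all(DATASOURCE2, "t2")
result = hash_join(t1_rows, t2_rows)
print(result)  # [(1, 'a', 1, 10.0), (3, 'c', 3, 30.0)]
```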

The problem is that if the result set returned for t2 is huge, the heterogeneous database system spends most of its time waiting for data to come back over the network (especially when network latency is high), so the user has to wait a long time for the join result. The SQL compiler can optimize the access plan for the input query, but for a given sub-query that has been sent out, access-plan optimization cannot fundamentally speed up the retrieval of data from the remote data source.

Currently, no database vendor supports multi-channel data transfer for a single query request/statement. That is, because of this data source limitation, the data transfer between the heterogeneous database and a data source cannot be parallelized.

This invention proposes a solution on the heterogeneous database side that parallelizes data fetching from the data source and thereby considerably improves query performance.

The core idea of this invention is to split a remote query into pieces based on the remote metadata (and statistics) acquired by the heterogeneous database. For example, suppose we want to retrieve all rows of t2 from datasource2, and we know that:
1) t2 has a column defined as its primary key, say "id",
2) column "id" has data type "integer", and
3) the values of column "id" are evenly distributed.
Then we can split the query into multiple pieces, each of which retrieves a disjoint range of "id" values.
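A minimal sketch of how such pieces might be generated, assuming conditions 1) to 3) hold and that the minimum and maximum of "id" are available from the remote statistics. The helper names is_splittable and split_query are hypothetical, and equal-width ranges are one plausible splitting rule rather than the one disclosed. The first and last pieces are left open-ended so that rows outside the recorded min/max (for example, because of stale statistics) are still retrieved exactly once.

```python
def is_splittable(is_primary_key, data_type, evenly_distributed):
    """Conditions 1)-3): an evenly distributed integer primary-key column (assumed check)."""
    return is_primary_key and data_type.lower() == "integer" and evenly_distributed


def split_query(table, column, min_val, max_val, pieces):
    """Splits 'select * from <table>' into `pieces` non-overlapping range queries."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    span = max_val - min_val + 1
    # Interior boundaries of equal-width ranges; assumes an even value distribution.
    bounds = [min_val + span * i // pieces for i in range(1, pieces)]
    queries = []
    for i in range(pieces):
        conds = []
        if i > 0:  # every piece except the first has a lower bound
            conds.append(f"{column} >= {bounds[i - 1]}")
        if i < pieces - 1:  # every piece except the last has an upper bound
            conds.append(f"{column} < {bounds[i]}")
        where = " where " + " and ".join(conds) if conds else ""
        queries.append(f"select * from {table}{where}")
    return queries


if is_splittable(True, "integer", True):
    pieces = split_query("t2", "id", 1, 1000, 4)
    for q in pieces:
        print(q)
```

With ids from 1 to 1000 and 4 pieces, this yields "id < 251", "id >= 251 and id < 501", "id >= 501 and id < 751", and "id >= 751".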

When the data comes back, a "UNION ALL" operation is performed in the heterogeneous database engine to combine the pieces into a single result set.
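The parallel fetch and the "UNION ALL" merge can be sketched as below. Assumptions: a local SQLite file stands in for datasource2 (it does not model network latency, so this demonstrates the correctness of split-and-merge, not the speedup), each piece runs on its own connection in its own thread to mimic one transfer channel per piece, and the function names are hypothetical. The piece queries are written out by hand as equal ranges over "id" from 1 to 1000.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_demo_source(path, n_rows):
    """Creates a hypothetical stand-in for datasource2: table t2 with ids 1..n_rows."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("create table t2 (id integer primary key, val text)")
        conn.executemany(
            "insert into t2 values (?, ?)",
            ((i, f"row{i}") for i in range(1, n_rows + 1)),
        )
        conn.commit()
    finally:
        conn.close()


def fetch_piece(path, sql):
    """Runs one sub-query on its own connection (one transfer channel per piece)."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def parallel_fetch_union_all(path, piece_queries):
    """Fetches all pieces concurrently, then concatenates them (UNION ALL semantics)."""
    with ThreadPoolExecutor(max_workers=len(piece_queries)) as pool:
        parts = list(pool.map(lambda sql: fetch_piece(path, sql), piece_queries))
    # UNION ALL keeps every row and performs no duplicate elimination.
    return [row for part in parts for row in part]


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "datasource2.db")
    build_demo_source(db_path, 1000)
    pieces = [
        "select * from t2 where id < 251",
        "select * from t2 where id >= 251 and id < 501",
        "select * from t2 where id >= 501 and id < 751",
        "select * from t2 where id >= 751",
    ]
    rows = parallel_fetch_union_all(db_path, pieces)

print(len(rows))  # 1000
```

Because the ranges are disjoint and together cover every "id", a plain concatenation is sufficient; no duplicate removal (as with "UNION") is needed.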

The advantage of this invention is that it can notably improve heterogeneous database system performance in OLAP scenarios. The table below compares the performance of systems with and without this invention:

Method                                                Elapsed time (seconds)
Traditional query                                                        916
This invention applied, query split into 2 pieces                        481
This invention applied, query split into 4 pieces                        251
This invention applied, query split into 8 pieces                        139
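To make the gains in the table concrete, the following sketch computes speedup (traditional elapsed time divided by split elapsed time) and parallel efficiency (speedup divided by the number of pieces). These are the standard definitions, not taken from the original; only the four timings come from the table.

```python
# Elapsed times in seconds from the table, keyed by number of pieces (1 = traditional query).
ELAPSED = {1: 916, 2: 481, 4: 251, 8: 139}


def speedup(pieces):
    """How many times faster the split query is than the traditional query."""
    return ELAPSED[1] / ELAPSED[pieces]


def efficiency(pieces):
    """Speedup per piece; 1.0 would be perfectly linear scaling."""
    return speedup(pieces) / pieces


for n in sorted(ELAPSED):
    print(f"{n} piece(s): {ELAPSED[n]} s, speedup {speedup(n):.2f}x, efficiency {efficiency(n):.0%}")
```

With these figures, the 8-piece split runs about 6.6 times faster than the traditional query at roughly 82% parallel efficiency, so scaling stays fairly close to linear up to 8 pieces.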

This invention assumes that the metadata and statistics of the remote tables/columns are known to the heterogeneous...