Browse Prior Art Database

Parallelizing Federated SQL Queries in Massively Parallel Processing (MPP) Environment Through Dynamic Partition and Asynchronization

IP.com Disclosure Number: IPCOM000031984D
Original Publication Date: 2004-Oct-18
Included in the Prior Art Database: 2004-Oct-18
Document File: 3 page(s) / 197K

Publishing Venue

IBM

Abstract

1) Federated SQL Query: Through using C++ derived class and virtual function technologies, DB2 Information Integrator (II) provided a generic framework for quickly developing wrappers. A wrapper is a dynamically loadable module of code that enables DB2 II to connect and access a certain kind of remote data sources. The kinds of remote data sources can be relational databases such as MSSQL databases, ORACLE databases, etc, or non-relational sources such as flat file sources, XML sources, etc. Applications that connect to DB2 II server can issue federated SQL queries which can select, join, union, group, order data from local tables on DB2 II server or from remote tables (i.e., nicknames) on multiple remote data sources. For example select A.col from local_table A, nickname_for_oracle_table B, nickname_for_mssql_table C where A.col=B.col and B.col=C.col. 2) Massively Parallel Processing (MPP) Environment: In a MPP environment, there are many nodes (or machines) connected through a communication facility such as TCP/IP. Each node has its own processor, memory and disks and there is nothing shared among the nodes. DB2 was extended to DB2 EEE to run in the MPP environment. In the MPP environment, DB2 EEE partitions local tables across the MPP nodes so that local tables can be accessed and processed parallelly on all MPP nodes. This disclosure is to illustrate how to extend DB2 II which is built on top of DB2 to process federated queries efficiently in the MPP environment.This disclosure is to enhance DB2 II performance and scalability in the MPP environment by accessing nicknames on different remote data sources asynchronously and parallelly, by partitioning nickname data dynamically, and by parallelizing the federated SQL queries with both nicknames and local partitioned tables.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 3

Parallelizing Federated SQL Queries in Massively Parallel Processing (MPP) Environment Through Dynamic Partition and Asynchronization

DB2 II has been proved to be very useful in heterogeneous data replication, real time heterogeneous data warehouse areas.

Recently, there are growing strategic trends to entend Websphere portal with DB2 II to easily create portlets that can answer federated queries based on data spanning across multiple enterprise data sources as shown below.

It is estimated that DB2 II will play an important role in this business integration picture by reducing lines of code (by 40 - 65 %), generating savings for complex queries over multiple data sources lacking JDBC interfaces, minimizing SQL skill requirements, simplifying the portal architecture. DB2 II Performance and scalability will become more and more critical for DB2 II's success in the business integration domain.

When an application connects to DB2 II server, a db2agent is assigned as the coordinator agent for the application. More subagents will be generated on demand on the relevant MPP nodes to process the incoming SQL query. The life span of the db2agent is at least for the duration of a transaction, whereas the life span of a subagent is much shorter, usually no longer than the duration of a SQL query. To keep the transaction semantics correct, we can only keep one connection to each remote data source and this connection needs to be persistent for the duration of a transaction.

In DB2 II Release 1 (corresponding to DB2 V8.1), db2agent becomes the one that loads wrappers, setup connections and access nicknames. Figure 1 shows how DB2 II R1 process the following federated SQL query: select * from local_partitioned_table A, nickname_for_oracle_table B, nickname_for_mssql_table C where A.col=B.col and B.col=C.col.

When App1 connects to DB2 II server, db2agent1 is assigned as App1's coordinator agent. It loads Oracle wrapper and Mssql wrapper, connect to Oracle data source and Mssql data source to fetch data. The data will stay in db2agent1 only, so that the local_partitioned_table has to be merged back by subagents on each node into db2agent1 in order to join with nickname data. Similar steps will apply for App2.

1

[This page contains 1 picture or other non-text object]

Page 2 of 3

d b 2 c lie n t d b 2 s e r v e r

O ra c le S o u rc e

M s s q l S o u rc e

d b 2 a g e n t1

A p p 1 d b 2 a g e n t2

A p p 2

s u b a g e n t

s u b a g e n t

s u b a g e n t

s u b a g e n t

s u b a g e n t

s u b a g e n t

Figure 1

There are several problems in this picture:
1) The Oracle and Msqql wrappers loaded in db2agent are not shared by different applications, so we would not be able to scale up in high concurrency workloads in terms of memory usage.
2) DB2 engine is not protected from the third party client codes (Oracle and Mssql client codes).
3) One wrapper (Oracle wrapper) is not protected from another wrapper (Mssql wra...