Parallelizing Federated SQL Queries in Massively Parallel Processing (MPP) Environment Through Dynamic Partition and Asynchronization
Original Publication Date: 2004-Oct-18
Included in the Prior Art Database: 2004-Oct-18
1) Federated SQL Query: DB2 Information Integrator (II) provides a generic framework, built on C++ derived classes and virtual functions, for quickly developing wrappers. A wrapper is a dynamically loadable code module that enables DB2 II to connect to and access a particular kind of remote data source. Remote data sources can be relational databases, such as MSSQL and ORACLE databases, or non-relational sources, such as flat files and XML documents. Applications connected to a DB2 II server can issue federated SQL queries that select, join, union, group, and order data from local tables on the DB2 II server and from remote tables (represented by nicknames) on multiple remote data sources. For example:

    select A.col from local_table A, nickname_for_oracle_table B, nickname_for_mssql_table C where A.col=B.col and B.col=C.col

2) Massively Parallel Processing (MPP) Environment: An MPP environment consists of many nodes (machines) connected through a communication facility such as TCP/IP. Each node has its own processor, memory, and disks, and nothing is shared among the nodes. DB2 was extended as DB2 EEE to run in the MPP environment. DB2 EEE partitions local tables across the MPP nodes so that they can be accessed and processed in parallel on all nodes.

This disclosure illustrates how to extend DB2 II, which is built on top of DB2, to process federated queries efficiently in the MPP environment. It enhances DB2 II performance and scalability in that environment by accessing nicknames on different remote data sources asynchronously and in parallel, by partitioning nickname data dynamically, and by parallelizing federated SQL queries that involve both nicknames and local partitioned tables.
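The approach summarized above (asynchronous access to nicknames on different sources, dynamic partitioning of the fetched nickname data, and a partition-parallel join with a locally partitioned table) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed DB2 II implementation: `Wrapper`, `InMemoryWrapper`, `node_for`, `hash_partition`, `fetch_all`, `join_on_node` and `parallel_federated_join` are hypothetical names; an abstract base class stands in for the C++ derived-class/virtual-function wrapper framework; threads stand in for both asynchronous remote requests and MPP nodes; and the local table is assumed to be already hash-partitioned on the join column with the same hash function used to repartition the nickname rows.

```python
"""Hypothetical sketch of a partition-parallel federated join.

None of these names are DB2 II APIs; they only illustrate the idea.
"""
import zlib
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # assumed number of MPP nodes


class Wrapper(ABC):
    """Stand-in for a data-source wrapper: subclasses override scan()."""

    @abstractmethod
    def scan(self, nickname):
        """Return all rows of the remote table behind `nickname` as dicts."""


class InMemoryWrapper(Wrapper):
    """Fake remote source used instead of a real ORACLE/MSSQL connection."""

    def __init__(self, tables):
        self.tables = tables

    def scan(self, nickname):
        return list(self.tables[nickname])


def node_for(key, num_nodes=NUM_NODES):
    # Deterministic hash: equal join keys always land on the same node.
    return zlib.crc32(repr(key).encode()) % num_nodes


def hash_partition(rows, col, num_nodes=NUM_NODES):
    """Dynamically split rows into one bucket per node on column `col`."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[node_for(row[col], num_nodes)].append(row)
    return parts


def fetch_all(requests):
    """Submit every (nickname, wrapper) scan at once, then collect results."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        futures = {name: pool.submit(w.scan, name) for name, w in requests}
        return {name: f.result() for name, f in futures.items()}


def join_on_node(a_part, b_part, c_part, col):
    """Join one node's partitions: A.col = B.col and B.col = C.col."""
    b_count, c_count = {}, {}
    for row in b_part:
        b_count[row[col]] = b_count.get(row[col], 0) + 1
    for row in c_part:
        c_count[row[col]] = c_count.get(row[col], 0) + 1
    out = []
    for row in a_part:
        key = row[col]
        # Each A row matches (#B rows with key) x (#C rows with key) pairs.
        out.extend([key] * (b_count.get(key, 0) * c_count.get(key, 0)))
    return out


def parallel_federated_join(local_parts, oracle, mssql, col="col"):
    """select A.col from A, B, C where A.col=B.col and B.col=C.col."""
    n = len(local_parts)
    remote = fetch_all([("nickname_for_oracle_table", oracle),
                        ("nickname_for_mssql_table", mssql)])
    b_parts = hash_partition(remote["nickname_for_oracle_table"], col, n)
    c_parts = hash_partition(remote["nickname_for_mssql_table"], col, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        per_node = pool.map(join_on_node, local_parts, b_parts, c_parts,
                            [col] * n)
        merged = [v for part in per_node for v in part]
    return sorted(merged)


if __name__ == "__main__":
    local_parts = hash_partition([{"col": k} for k in (1, 2, 3, 4)], "col")
    oracle = InMemoryWrapper(
        {"nickname_for_oracle_table": [{"col": k} for k in (2, 3, 3, 5)]})
    mssql = InMemoryWrapper(
        {"nickname_for_mssql_table": [{"col": k} for k in (3, 4, 2)]})
    print(parallel_federated_join(local_parts, oracle, mssql))  # → [2, 3, 3]
```

In this sketch, repartitioning the nickname rows on the join column is what lets each node join its own partitions without exchanging rows with other nodes. If the local table were partitioned on a different key, it would also have to be redistributed before the per-node join.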