Method And System For Parallelized Materialization In Parallel Database Systems

IP.com Disclosure Number: IPCOM000201617D
Publication Date: 2010-Nov-16
Document File: 3 page(s) / 27K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method and system for parallelized materialization in parallel database systems is disclosed. The method pushes materialization operations as close to the data as possible, where multiple worker agents materialize the data in parallel.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 46% of the total text.

Disclosed is a method and system for parallelized materialization in parallel database systems. In a parallel database environment, data that is returned to a client is usually carried as a reference to the actual data. Materialization of a reference is the process of converting a reference to any data type into its external form. When materialization happens as part of a final step on a singleton coordinator agent, Remote Procedure Call (RPC) requests are made to materialize the data. RPC is typically a resource-intensive operation because it involves transferring data over the network.

The disclosed method and system push the expensive materialization operation down as close to the data as possible, where multiple worker agents materialize the data in parallel. For example, in the case of XML data, the materialization operation is pushed down as close as possible to the partition where the XML data resides. Since this part of the work is usually done by worker agents running in parallel on multiple nodes, parallelism is achieved automatically, which helps scalability. Furthermore, in cases where the operation can be pushed all the way down to where the XML data resides, costly remote materialization through RPC may be avoided. Thus, the method and system ensure that the XML data is materialized before it is returned to the singleton coordinator agent. The coordinator agent is responsible for collecting all result data, doing some final processing, and binding out the qualifying data to the client.
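As a minimal illustrative sketch, not taken from the disclosure itself, the following Python code contrasts the two strategies described above. The partition layout, the (tag, text) internal format, the function names, and the RPC counter are all hypothetical; threads stand in for worker agents on separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: each partition maps a reference ID to an internal
# representation of an XML value (here a (tag, text) tuple).
PARTITIONS = [
    {0: ("b", "alpha"), 1: ("b", "beta")},
    {2: ("b", "gamma")},
    {3: ("b", "delta"), 4: ("b", "epsilon")},
]

def materialize(internal):
    """Convert the internal form to its external (serialized) form."""
    tag, text = internal
    return f"<{tag}>{text}</{tag}>"

def coordinator_side(partitions):
    """Workers return references; the coordinator materializes each one
    through a simulated RPC to the owning partition."""
    refs = [(p, ref) for p, part in enumerate(partitions) for ref in part]
    rpc_calls = 0
    results = []
    for p, ref in refs:
        rpc_calls += 1  # one simulated network round trip per reference
        results.append(materialize(partitions[p][ref]))
    return results, rpc_calls

def worker_scan(partition):
    """A worker agent materializes its own rows locally, with no RPC."""
    return [materialize(partition[ref]) for ref in sorted(partition)]

def pushed_down(partitions):
    """Workers materialize in parallel; the coordinator only collects."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_worker = list(pool.map(worker_scan, partitions))
    results = [doc for docs in per_worker for doc in docs]
    return results, 0  # no coordinator RPCs are needed in this model

baseline, baseline_rpcs = coordinator_side(PARTITIONS)
parallel, parallel_rpcs = pushed_down(PARTITIONS)
```

In this model both strategies return the same five serialized values, but the coordinator-side variant issues one simulated RPC per reference, while the pushed-down variant issues none.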

In certain cases, the materialization cannot be pushed too far down. For example, materialization cannot be pushed below predicates, because documents that the predicate would later filter out may then be materialized unnecessarily, which adversely affects the performance of the system. In the same manner, the materialization operation cannot be pushed down below an operation that requires the data in its non-materialized, or internal, format.
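The cost of pushing materialization below a predicate can be illustrated with a small sketch. This is an assumption-laden model rather than the disclosed implementation: the row layout, the predicate, and the call counter are hypothetical and exist only to count how many documents are materialized in each ordering.

```python
# Hypothetical rows; "body" stands in for a document's internal form.
ROWS = [{"id": i, "body": f"row{i}"} for i in range(10)]

def predicate(row):
    """Hypothetical filter that keeps only ids 0 and 5."""
    return row["id"] % 5 == 0

def materialize(row, counter):
    """Serialize one document and count the (expensive) call."""
    counter["calls"] += 1
    return f"<doc>{row['body']}</doc>"

def materialize_below_predicate(rows):
    """Materialize every row first, then filter: wasted work."""
    counter = {"calls": 0}
    materialized = [(row, materialize(row, counter)) for row in rows]
    out = [text for row, text in materialized if predicate(row)]
    return out, counter["calls"]

def materialize_above_predicate(rows):
    """Filter first, then materialize only the qualifying rows."""
    counter = {"calls": 0}
    out = [materialize(row, counter) for row in rows if predicate(row)]
    return out, counter["calls"]

below_out, below_calls = materialize_below_predicate(ROWS)
above_out, above_calls = materialize_above_predicate(ROWS)
```

Both orderings produce the same two documents, but materializing below the predicate performs ten materializations, whereas materializing above it performs only two.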

To balance the need to push the materialization as far down as possible against the need to avoid pushing it too far down, the materialization may still be executed at a partition that is remote to the actual data. It may also be executed at a remote partition to ensure that query semantics are not affected. Thus, RPC requests may still be required to materialize the reference. However, the materialization still happens in parallel.

For instance, consider a query "select xmlquery ('$doc/a/b' passing T1.DOC as "doc") from T1;" having a plan graph as shown in fig. 1.

Figure 1

Assuming that the table T1 as shown in fig. 1 is partitioned on 4 nodes, there is at least one agent on each node that scans the table, processes the XPATH, and finally sends the result of its computation to the c...