Browse Prior Art Database

Method for efficient mapping of hierarchical data to relational views

IP.com Disclosure Number: IPCOM000088886D
Original Publication Date: 2005-Mar-04
Included in the Prior Art Database: 2005-Mar-04
Document File: 4 page(s) / 35K

Publishing Venue

IBM

Abstract

Disclosed is an efficient method for mapping hierarchical data, as markup languages, into relational tables using an XPath processor. The XPath language operates over nested hierarchical data with repeating items. It always returns results in a form of a list of nodes or values. As opposed to this, relational engines expect data in tabular format, without nesting and multiple columns. This invention proposes a method for compact description of the mapping that can be used as query plan, and a method to interpret this description during runtime to produce the result of the query. A previous invention described the XML Wrapper that can map hierarchical data to relational tables. However in that invention the unnesting was performed on hierarchies only one level deep. The current invention describes how to unnest hierarchies with arbitrary number of levels. As such, it allows for increased performance and addition of other features, as streaming processing of XML documents. Some background on Federated Systems and XML wrapper is necessary for a better understanding of this invention. For an overview of Federated Systems, please refer to Chapter 1 in the following documentation: ftp://ftp.software.ibm.com/ps/products/db2/info/vr8/pdf/letter/db2fpe80.pdf For information on the XML wrapper, please refer to Chapter 7 of the DB2 Life Sciences Data Connect Guide: ftp://ftp.software.ibm.com/ps/products/db2/info/vr8/pdf/letter/db2lse80.pdf The XML wrapper maps XML data into a set of relational views, which can then be queried using the full power of standard SQL.The XML data is not stored in DB2, but is retrieved and processed on demand from its source (from a local file, a remote file or a DB2 column). The core idea behind the XML wrapper is that a hierarchical document can be thought as a set of relational tables in which hierarchical nesting is replaced by joins between the parent and child tables. Whenever the XML document contains a nested, repeating construct, the corresponding relational model treats these nested elements as rows of a separate table, joined to their parent element by means of an implicit foreign key.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 41% of the total text.

Page 1 of 4

Method for efficient mapping of hierarchical data to relational views

The core idea of this invention is that parent/child predicates can now be pushed down to the XML wrapper level. The DB2 engine is not responsible for processing these joins anymore since the XML wrapper will be handling them and will return the joined data.

In order to process the parent/child predicates, the XML wrapper will now accept plans that contain more than one quantifier/table. That was not the case in the previous version where only plans with one quantifier/table were being accepted. Below is an example that illustrates this change.

This example uses 2 nicknames: a parent nickname called CUSTOMERS and a child nickname called ORDERS. Here are the 2 DDLs for creating those nicknames:

CREATE NICKNAME CUSTOMERS (id VARCHAR(16) OPTIONS(XPATH '@id'), name VARCHAR(15) OPTIONS(XPATH './/name/@last'),

cid VARCHAR(16) OPTIONS(PRIMARY_KEY 'YES'))

FOR SERVER xml_server OPTIONS (FILE_PATH 'C:\xml\Customers.xml', XPATH '//customer');

CREATE NICKNAME ORDERS

(cid VARCHAR(16) OPTIONS(FOREIGN_KEY 'CUSTOMERS'),

amount VARCHAR(20) OPTIONS(XPATH './amount/text()')) FOR SERVER xml_server OPTIONS (XPATH './/order');

Here is a simple query we issue:

SELECT c.name, o.amount FROM CUSTOMERS c, ORDERS o WHERE c.cid = o.cid;

Now this is the access plan that would be generated with the previous version of the XML wrapper:

where NLJN stands for Nested Loop JoiN

Hence, the XML wrapper had to accept and execute 2 plans: one for CUSTOMERS and one for ORDERS.

1. SELECT name, cid 2. SELECT amount, cid FROM CUSTOMERS; FROM ORDERS

WHERE cid = ?;

With this invention incorporated to the XML wrapper, the access plan looks like this:

RETURN

NLJN

CUSTOMERS

ORDERS

1

[This page contains 1 picture or other non-text object]

Page 2 of 4

RETURN

CUSTOMERS/ORDERS

The nested loop join is not part of the access plan anymore since the join is processed by the wrapper. The wrapper returns the joined data to the engine. Here is the plan accepted and executed by the wrapper. It's actually the entire query issued by the user:

SELECT c.name, c.cid, o.amount, o.cid

FROM CUSTOMERS c, ORDERS o
WHERE c.cid = o.cid;

Below is an example that illustrates how this invention works.

Let's assume we have the following structure in an XML document:

doc

|

customer

/ \

order payment

|

item

Here's a sample XML document that has this schema:

<doc>

<customer id='1'>

<payment>

      <date>2002-01-01</date> <order>

<amount>100</amount> <item>

           <reference>123</reference> </item> </order> <order>

<amount>200</amount> <item>

                <reference>1234</reference> </item> </order> </customer> <customer id='2'>

           ... </customer> </doc>

Here are the DDLs for the XML wrapper we would create to represent this structure:

CREATE NICKNAME CUSTOMERS (id VARCHAR(16) OPTIONS(XPATH '@id'),

cid VARCHAR(16) OPTIONS(PRIMARY_KEY 'YES'))

2

Page 3 of 4

FOR SERVER xml_server

OPTIONS (FILE_PATH 'C:\xml\Customers.xml', XPATH '//customer');

CREATE NICKNAME PAYMENTS

(cid V...