Browse Prior Art Database

Method for adaptive data generator based on reverse engineering of query execution plan

IP.com Disclosure Number: IPCOM000234907D
Publication Date: 2014-Feb-13
Document File: 4 page(s) / 36K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article describes a method of generating test data based on reverse engineering of query execution plan. Such a plan contains valuable information including order of execution of particular steps and estimated number of rows as a result of each step.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 37% of the total text.

Page 01 of 4

Method for adaptive data generator based on reverse engineering of query execution plan

Disclosed is a method of generating test data based on reverse engineering of query execution plan.

Data generation is important part of both: software testing and optimization of software performance. Test Data Generation is seen to be a complex problem. The goal of the process is to imitate real data so the software in test environment behaves similar way than in real environment.

In database / data warehouse ecosystems accurate data generation cannot be overestimated. For example quite common way to optimize performance of data warehouse is to rewrite a query or a set of queries. Having mock data that imitates real one makes is much easier to test the correctness of the rewrite and measure a gain in performance. This is particularly important in supporting customer's environments and conducting "Proof of Concept" as a part of deal.

Main goal of this work is to describe one of the approaches that can be taken to generate very accurate test data when having query execution plan from any database or data warehouse system. The idea is to go through a plan and do data generation operations step by step. Query execution plan contains detailed information how the query was executed on real data (especially estimated number of rows as a result of each step and detailed execution order). In each step we generate number of rows indicated by estimation in the plan that satisfies the constraint (also indicated in the plan). After each step merging the result with previous step should be done.

As a result generated data is approximately the same as the real data


In database systems a query submitted to execution is parsed and splitted to a number of atomic tasks. In some systems such a task is called snippet. For example a snippet may read the data from disks, join the tables, aggregate, sort... Each snippet contains several steps (nodes) that must be done to complete the task. For example a snippet that is responsible for reading the data from disks contains following nodes: ScanNode, RestrictNode, ProjectNode.

Adaptive Data Generator needs to parse relevant nodes from the plan and generate the data according to them.

EXAMPLE 1

SELECT * FROM tab1 t1 JOIN tab2 t2 ON t1.a=t2.a

Let's assume that both tables contain exactly 1 record that satisfies join constraint. Execution plan for this particular query looks like this:

1[00]: spu ScanNode table "TS.ADMIN.TAB2" 1764666 memoryMode=no flags=0x0 index=0 cost=1 (o)

-- Cost=0.0..0.0 Rows=1 Width=8 Size=8 Conf=80 {} -- Cardinality T2.A 1 (Adjusted)

1[01]: spu RestrictNode (non-NULL)

     -- (A NOTNULL)
1[02]: spu ProjectNode, 2 cols, projectFlags=0x0

0:A 1:B
-- 0:A 1:B

1[03]: spu ReturnNode
501[00]: dbs DownloadTableNode broadcast into link 2171494605, keepStore=0 flags=0x0
2[00]: spu ScanNode table "TS.ADMIN.TAB1" 1764655 memoryMode=no flags=0x0

1



Page 02 of 4

index=0 cost=1
-- Cost=0.0..0.0 Rows=1 Width=8 Size=8 Conf=8...