Browse Prior Art Database

Database optimizer based on column usage-context

IP.com Disclosure Number: IPCOM000246232D
Publication Date: 2016-May-18
Document File: 3 page(s) / 32K

Publishing Venue

The IP.com Prior Art Database

Abstract

The main idea is to provide additional database processes responsible for analyzing table and column usage and re-organizing table structure. Basing on analyzer output re-organizer process performs table re-organization like column join or split stored into new columns. Such created new columns are used when database engine process the queries.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Page 01 of 3

Database optimizer based on column usage -

When we look on the usage of the Warehouse Database we can observe a common patterns in SQL queries which can generate performance problems:


1) The data are filtered on part of the columns e.g.

select sum(n_price), sum(n_tax) … from t_invoices where datepart( 'MM', d_created), = '05' ;

The filter here requires sequential call for each row of the function extracting part of the data.

Especially hardware acceleration (FPG) cannot be used here.


2) The tables are joined on the parts of the columns

select * from T1 inner join T2 on datepart('YYYY', T1.d_created ) = T2.c_year

The table need to be projected with additional column. The values for that column had to be calculated in fly before merging (it cause sequential processing - expression join)


3) The tables are joined on the more than one columns

select * from T1 left outer join T2 on T1.id = T2.id and T1.created = T2.created

The optimizer cannot easily conclude estimates of join.

Proposed disclosure addresses those problems.

Database-usage analyzer process.

We introduce new additional process (like demon) responsible for collecting statistics for each table and column. Collected information:
how often part of the column was processed (by counting calls for routines extracting part of the data). The routine name and parameters are memorized as

well

how often column (as a whole) is used
how often table join condition was based on more than one column

Example
- lets consider tables

table T1 (c_pk1 int , c_pk2 int , c_pk3 int , c_date timestamp )

table T2 (c_pk1 int , c_pk2 int , c_pk3 int , c_code char(10) )

collected statistics:

- for column partially used (no evidence means no hit)

Table

Column

c_date

-context

context

Function/Parameter Frequency

T1

datepart YY 10

1


Page 02 of 3

T1

c_date

datepart MM 100 000

T1

c_date

datepart DD 10

T2

c_code

substr 0-4 1 000

T2

c_code

substr 5-9 10 000

- for column used (as a whole)

Table
Column Frequency

T1

c_date

20

T2

c_code

10

- for joins (collecting only multiple column joins)

Table 1...