Database optimizer based on column usage-context
Publication Date: 2016-May-18
The IP.com Prior Art Database
The main idea is to provide additional database processes responsible for analyzing table and column usage and re-organizing table structure. Basing on analyzer output re-organizer process performs table re-organization like column join or split stored into new columns. Such created new columns are used when database engine process the queries.
Page 01 of 3
Database optimizer based on column usage -
When we look on the usage of the Warehouse Database we can observe a common patterns in SQL queries which can generate performance problems:
1) The data are filtered on part of the columns e.g.
select sum(n_price), sum(n_tax) … from t_invoices where datepart( 'MM', d_created), = '05' ;
The filter here requires sequential call for each row of the function extracting part of the data.
Especially hardware acceleration (FPG) cannot be used here.
2) The tables are joined on the parts of the columns
select * from T1 inner join T2 on datepart('YYYY', T1.d_created ) = T2.c_year
The table need to be projected with additional column. The values for that column had to be calculated in fly before merging (it cause sequential processing - expression join)
3) The tables are joined on the more than one columns
select * from T1 left outer join T2 on T1.id = T2.id and T1.created = T2.created
The optimizer cannot easily conclude estimates of join.
Proposed disclosure addresses those problems.
Database-usage analyzer process.
We introduce new additional process (like demon) responsible for collecting statistics for each table and column. Collected information:
how often part of the column was processed (by counting calls for routines extracting part of the data). The routine name and parameters are memorized as
how often column (as a whole) is used
how often table join condition was based on more than one column
- lets consider tables
table T1 (c_pk1 int , c_pk2 int , c_pk3 int , c_date timestamp )
table T2 (c_pk1 int , c_pk2 int , c_pk3 int , c_code char(10) )
- for column partially used (no evidence means no hit)
datepart YY 10
Page 02 of 3
datepart MM 100 000
datepart DD 10
substr 0-4 1 000
substr 5-9 10 000
- for column used (as a whole)
- for joins (collecting only multiple column joins)