Browse Prior Art Database

Massive Parallel Processing BI support - Processing Unit hardware accelerated cube aggregation

IP.com Disclosure Number: IPCOM000239073D
Publication Date: 2014-Oct-09

Publishing Venue

The IP.com Prior Art Database

Abstract

The article describes hardware accelerated cube aggregation in the Massive Parallel Processing datawarehouse environment as a support for BI operations.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 29% of the total text.

Page 01 of 17

Massive Parallel Processing BI support - -- Processing Unit hardware accelerated cube Processing Unit hardware accelerated cube aggregation

Problem description

More then often the warehouse databases are used as engine for the Businesses Intelligence tools which leverage the warehouse's ability of aggregation. Such usage causes that operations as minimum/maximum/sum/average can be observed more frequent. Those operations, from their nature, usually are causing massive computation and are time consuming. Usually such aggregations are based on the fact data from huge fact tables grouped by key columns from that tables or related dimensions.

Example:

Table ORDERS
Column Type Meaning
OR_ID INT

Key

OR_TO_ID

INT

Foreign key - type of orders (e.g. available values 1 - foreign, 2 - domestic)

OR_PRICE

NUMBER(10,2)

Price, fact - can be aggregated

OR_TAX

NUMBER(10,2)

Tax, fact - can be aggregated

OR_F_ACTIVE

VARCHAR(1)

Flag Active/No active

OR_CO_ID

INT

Foreign key - date from calendar dimension table

Table CALENDAR (dimension)

Column Type Meaning CO_ID

INT

Key

CO_DT DATE Date
CO_MT DATE Date - the first day of the month CO_QT DATE Date - the first day of the quarter CO_YT DATE Date - the first day of the year

Such table can be used in some businesses reports in few scenarios when user needs minimum/maximum/sum/average prices (or tax) :


- for different type of orders (grouping by OR_TO_ID)


- for active/no active orders (grouping by OR_F_ACTIVE)


- for some days (grouping by CO_DT from related dimension)


- for some months (grouping by CO_MT from related dimension)


- for some quarters (grouping by CO_QT from related dimension )


- for some years (grouping by CO_YT from related dimension )

If request about such information is raised frequently it causes aggregation on the table ORDERS each time for databases like MPP where there is no cache. The main goal of the disclosure is to improve aggregation performance.

The idea address problem for MPP (Massive Parallel Processing) databases only: the

1


Page 02 of 17

idea is to create pre-aggregation for fact tables, maintained with hardware acceleration and stored on each node in parallel structure. The pre-aggregation is created for the tables/columns marked by the user using proposed in disclosure extension to the SQL language. Said aggregates after creation are:


- managed by database engine - each modification on the fact table causes auto-refreshing (in background)


- used automatically by database engine when needed in the background to speed up the queries

The new extension tag "aggregate for" is added to the SQL create table statement. Syntax specification:

create table

(

) aggregate for

;

where

:=

( [

[,] ] )

| dimension

( [

[,] ])

joined on ( [

[,] ] )

| ( [

[,] ] )

, dimension

( [

[,] ])

joined on ( [

[,] ] )

and consequently
alter table

[ ADD | REMOVE ] aggregate for

| [ ADD | REMOVE ] aggregate

(traditional definition)

:= name of the table

:= ( [

[,] ])

:=

(extented definition)

:=

on

|

:=

on

:=...