
MDX Aggregate Cache

IP.com Disclosure Number: IPCOM000193474D
Original Publication Date: 2010-Feb-25
Included in the Prior Art Database: 2010-Feb-25
Document File: 2 page(s) / 33K

Publishing Venue

IBM

Abstract

Disclosed is an approach implemented within an MDX engine to optimize aggregate queries by caching aggregates computed during query execution and reusing them to compute other dependent cells that benefit from the aggregations and may, in fact, be part of the query itself. This optimization, called the aggregate cache, is in effect for the duration of a single query execution and ensures that every dependent cell needed to compute the aggregates in a query is accessed exactly once and that no aggregate is computed more than once.

Decision support applications tend to require execution of complex, aggregate-heavy queries on large data sets. Intelligent query optimization is critical to keep the response time of such queries interactive, so that users can work with such applications at the speed of thought.

Simplistically, a cell in a multidimensional source is the analog of a row in a relational source (simplistically, because a row can have many measures and therefore many cells). Thus, queries against a multidimensional data source request cells. Each cell of a multidimensional source (more commonly referred to as a cube) may optionally be derived by aggregating dependent cells, and the dependencies are predefined through parent-child hierarchical metadata relationships. In general, the values of parent-level cells depend on the values of descendant (children/grandchildren) cells in the cube.

Trivially, a cube consists of two types of cells: leaf-level cells, which are not dependent on anything and are generally loaded into the cube from an external system, and upper-level cells, which depend on their child cells; this relation is recursive. The distance of an upper-level cell from the leaf level is the maximum of the distances of each of the cell's coordinates from the leaf level. When the aggregations are commutative (SUM, MIN, MAX, COUNT), it is possible to derive each upper-level cell either from its immediate children or from its descendants at any lower level.

The straightforward execution of a query simply expands the query into the cells requested. Each cell is then derived independently of the others until all cells have been computed, at which point query execution is complete. However, this method ignores two facts: (1) the cells that some upper-level cells in the query depend on may themselves be part of the query; and (2) more than one upper-level cell (Ui) in the query may depend on the same cell (Li). Knowing this relationship can help optimize the aggregation of each Ui when Li is visited. Understanding the relationships between the cells in a query is key to optimizing query execution.

This is the abbreviated version, containing approximately 51% of the total text.



Disclosed is a method for implementation within a Multidimensional Expressions (MDX) engine to optimize aggregate queries by caching aggregates computed during query execution. The cached aggregates are reused to compute other dependent cells that benefit from the aggregations and may also be part of the query. The disclosed method, referred to as the aggregate cache, is in effect for the duration of one query execution and ensures that every dependent cell needed to compute the aggregates in a query is accessed exactly once and that no aggregate is computed more than once.
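
The idea of a per-query aggregate cache can be illustrated with a minimal Python sketch. It is not the disclosed implementation: it uses plain memoization over a small hypothetical cube, assumes SUM aggregation, and all names (`CHILDREN`, `LEAVES`, `AggregateCache`, `run_query`) are invented for illustration. A cell is a tuple with one member per dimension; an upper-level cell is derived by expanding its first non-leaf coordinate into that member's children.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]  # one member name per dimension

# Hypothetical hierarchy metadata: parent member -> child members.
CHILDREN: Dict[str, List[str]] = {
    "Year": ["H1", "H2"],
    "H1": ["Q1", "Q2"],
    "H2": ["Q3", "Q4"],
    "AllRegions": ["East", "West"],
}

# Hypothetical leaf-level data, as if loaded from an external system.
LEAVES: Dict[Cell, float] = {
    ("Q1", "East"): 1.0, ("Q1", "West"): 2.0,
    ("Q2", "East"): 3.0, ("Q2", "West"): 4.0,
    ("Q3", "East"): 5.0, ("Q3", "West"): 6.0,
    ("Q4", "East"): 7.0, ("Q4", "West"): 8.0,
}


class AggregateCache:
    """Cache that lives for one query: create it per query, then discard it."""

    def __init__(self) -> None:
        self.values: Dict[Cell, float] = {}
        self.leaf_reads = 0     # leaf cells fetched from storage
        self.aggregations = 0   # upper-level cells actually computed

    def value(self, cell: Cell) -> float:
        if cell in self.values:          # already read or aggregated
            return self.values[cell]
        for i, member in enumerate(cell):
            if member in CHILDREN:       # first coordinate above leaf level
                self.aggregations += 1
                result = sum(
                    self.value(cell[:i] + (child,) + cell[i + 1:])
                    for child in CHILDREN[member]
                )
                break
        else:                            # every coordinate is a leaf member
            self.leaf_reads += 1
            result = LEAVES[cell]
        self.values[cell] = result
        return result


def run_query(cells: List[Cell]) -> Tuple[Dict[Cell, float], AggregateCache]:
    """Evaluate all requested cells against one shared, query-scoped cache."""
    cache = AggregateCache()
    return {cell: cache.value(cell) for cell in cells}, cache


query = [("Year", "AllRegions"), ("H1", "AllRegions"), ("Q1", "AllRegions")]
results, cache = run_query(query)
print(results)
print(cache.leaf_reads, cache.aggregations)
```

In this example the second and third requested cells are dependents of the first, so they are served from the cache: the query reads each of the 8 leaf cells once and computes 7 aggregates, whereas evaluating the three cells independently would read 14 leaf cells.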

Decision support applications tend to require execution of complex, aggregate-heavy, compute-intensive queries on large data sets. Intelligent query optimization is critical to keeping the response times of such queries short, enabling users to interact with applications at the speed of thought.

Simplistically, a cell in a multidimensional source is the analog of a row in a relational source (simplistically, because a row can have many measures and therefore many cells). Thus, queries against a multidimensional data source request cells. Each cell of a multidimensional data source, more commonly referred to as a cube, may optionally be derived by aggregating dependent cells, with the dependencies predefined through parent-child hierarchical metadata relationships. In general, the values of parent-level cells depend on the values of descendant (children/grandchildren) cells in the cube.
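
The row-to-cell analogy and the parent-child dependencies can be made concrete with a small Python sketch. The dimensions, member names, and helper functions below (`rows_to_cells`, `leaf_members`, `leaf_dependents`) are hypothetical and chosen only for illustration; the sketch assumes each dimension is a simple tree described by a parent-to-children mapping.

```python
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]  # (time member, region member, measure)

# Hypothetical hierarchy metadata, one mapping per dimension.
TIME: Dict[str, List[str]] = {
    "Year": ["H1", "H2"], "H1": ["Q1", "Q2"], "H2": ["Q3", "Q4"],
}
REGION: Dict[str, List[str]] = {"AllRegions": ["East", "West"]}
MEASURE: Dict[str, List[str]] = {}  # flat dimension: every measure is a leaf
HIERARCHIES = [TIME, REGION, MEASURE]
MEASURES = ("Sales", "Units")


def rows_to_cells(rows: List[dict]) -> Dict[Cell, float]:
    """One relational row with several measures yields one cell per measure."""
    cells: Dict[Cell, float] = {}
    for row in rows:
        for measure in MEASURES:
            cells[(row["quarter"], row["region"], measure)] = row[measure]
    return cells


def leaf_members(member: str, hierarchy: Dict[str, List[str]]) -> List[str]:
    """Return the leaf-level descendants of a member (or the member itself)."""
    children = hierarchy.get(member)
    if not children:
        return [member]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaf_members(child, hierarchy))
    return leaves


def leaf_dependents(cell: Cell) -> List[Cell]:
    """Leaf cells a cell depends on: the cross product of per-dimension leaves."""
    per_dimension = [leaf_members(m, h) for m, h in zip(cell, HIERARCHIES)]
    return list(product(*per_dimension))


rows = [{"quarter": "Q1", "region": "East", "Sales": 100.0, "Units": 4.0}]
print(rows_to_cells(rows))
print(leaf_dependents(("H1", "AllRegions", "Sales")))
```

Here a single row produces two cells, and the parent-level cell `("H1", "AllRegions", "Sales")` depends on four leaf cells, one for each combination of the quarters under `H1` and the regions under `AllRegions`.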

Trivially, a cube consists of two types of cells. The first type, leaf-level cells, are not dependent on anything and are generally loaded from an external system into the cube. A second type of upper level cells is depe...