Browse Prior Art Database

An improved method for nested GROUP BY

IP.com Disclosure Number: IPCOM000248850D
Publication Date: 2017-Jan-18
Document File: 8 page(s) / 319K

Publishing Venue

The IP.com Prior Art Database

Abstract

As the dimension of data has increased, so has the demand for more interactive queries. GROUP BY clause can be used to control how data rows are organized in a result set. It precedes a grouping expression consisting of one or more columns which found in the result data set to be organized. The GROUP BY clause is also used to specify what columns are to be grouped together to provide input to aggregate functions such as SUM() and AVG(). In some situations, maybe need progressive aggregate functions based on the result set of subquery. It result in nested GROUP BY to be included in complex query. The nested GROUP BY means that outer select query includes GROUP BY clause and it must depend on the result set of subquery which also have GROUP BY.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 40% of the total text.

1

An improved method for nested GROUP BY

As the dimension of data has increased, so has the demand for more interactive queries. GROUP BY clause can be used to control how data rows are organized in a result set. It precedes a grouping expression consisting of one or more columns which found in the result data set to be organized. The GROUP BY clause is also used to specify what columns are to be grouped together to provide input to aggregate functions such as SUM() and AVG(). In some situations, maybe need progressive aggregate functions based on the result set of subquery. It result in nested GROUP BY to be included in complex query. The nested GROUP BY means that outer select query includes GROUP BY clause and it must depend on the result set of subquery which also have GROUP BY.

In business situation, nested GROUP BY was referred frequently. For example, a company maybe want to get the total sales of maximum monthly sales for every department in recently three years. We can write the sql as below: SELECT depID ,SUM(maxSALE) FROM (SELECT DEPID, YEAR, MAX(MonthSALES) FROM DEP_Year_Month_SALE WHERE YEAR IN (2013,2014,2015)

GROUP BY DEPID, YEAR) AS TEX(depID, yearNO, maxSALE) GROUP BY depID; The above sql including nested GROUP BY clause and cannot be simplified. Currently, for queries including nested GROUP BY like the example sql, database process it layer by layer and just analysis single GROUP BY clause at one time. In this way, we must assign space to store pre-computed results in a temporal result set such as workfile. And then, database need to scan same data more than one time. Especially, when query including multiple nested GROUP BY, database need to assign huge space to workfile. The normal method can be illustrated as picture 1.

Picture 1

We would like to issue an improved method for nested GROUP BY clauses. The finally result data set can be got by scanning base table only once time. When processing aggregate functions in subquery, we can calculate the aggregate functions of outer select clause in the same time. In this way, queries do not need materializing intermediate

2

result to MQT which avoid scanning same data rows repeatedly. As a result, both IO cost and CPU cost also can be reduced, the improved method offer better response time and performance than the existed method.

Picture 2

During processing query with nested GROUP BY, database could do a merge operation for nested GROUP BY clause, which means database could turn to calculate outer aggregate function after getting single result row in same group by column value of subquery. For all value of GROUP BY columns in subquery, database only need allocating one temporal space to storage the result value of aggregate function in subquery. When scanning to different value of GROUP BY columns, database can convert to calculate outer aggregate function and update the temporal value to save the result value. Then database will turn to process the subquery and repeat the above s...