Method Of Estimating CPU Capacity For Data Warehouse Applications

IP.com Disclosure Number: IPCOM000035301D
Original Publication Date: 2005-Jan-20
Included in the Prior Art Database: 2005-Jan-20
Document File: 2 page(s) / 48K

Publishing Venue

IBM

Abstract

This invention consists of two related methods. The first method measures the CPU scan rates of computer processors, while the second method applies the scan rate data to determine the CPU capacity requirement of a data warehouse application.

Method Of Estimating CPU Capacity For Data Warehouse Applications

Disclosed is a methodology, consisting of two related methods, for estimating CPU capacity requirements for data warehouse applications. The first method measures the CPU scan rates of computer processors, while the second method applies the scan rate data to determine the CPU capacity requirement of a data warehouse application.

     The CPU capacity requirement for data warehouse workloads is driven mainly by the amount of data processed for queries. The core part of this invention is a method of assessing the CPU cycles needed to process a row. A series of measurements is conducted to determine the scan rate, measured in MB/sec, for different types of queries. Since the amount of data processed can be estimated by users, this invention also describes a method of using the scan rate data to determine the CPU capacity needed to support a data warehouse workload.

     This invention emphasizes a metric, the scan rate, which gives the amount of data that a specific CPU model can process for a given type of query predicate. Combining user-supplied answers to a set of questions (database table sizes, average row length, percentage of rows used by queries, and the business nature of the queries) with scan rates measured on a specific CPU model for various query predicate types yields a methodology for sizing data warehouse workloads.
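     The abbreviated text does not include the exact sizing arithmetic, so the sketch below only illustrates how the user-supplied answers and a measured scan rate might be combined into a CPU estimate; the formulas, names, and numbers are assumptions for illustration, not the disclosure's own method.

    # Hypothetical sizing sketch (Python). Assumes CPU cost per query scales
    # with the amount of data scanned, divided by the measured scan rate.

    def cpu_seconds_per_query(rows_scanned, avg_row_length_bytes, scan_rate_mb_per_sec):
        """CPU seconds consumed by one query, assuming cost follows data scanned."""
        data_mb = rows_scanned * avg_row_length_bytes / (1024 * 1024)
        return data_mb / scan_rate_mb_per_sec

    def processors_required(queries_per_hour, cpu_seconds_each):
        """Number of processors kept fully busy by the workload."""
        return queries_per_hour * cpu_seconds_each / 3600.0

    # Example user answers (hypothetical values).
    rows_in_table = 200_000_000     # rows in the table queried
    avg_row_length = 120            # bytes per row
    pct_rows_used = 0.25            # fraction of rows a typical query scans
    scan_rate = 55.0                # MB/sec measured for this CPU model and predicate type
    queries_per_hour = 40

    per_query = cpu_seconds_per_query(rows_in_table * pct_rows_used, avg_row_length, scan_rate)
    print(f"CPU seconds per query: {per_query:.1f}")
    print(f"Processors needed: {processors_required(queries_per_hour, per_query):.2f}")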

     Prior to measurements, decide whether compressed or uncompressed data will be used. Several database products offer compression capabilities; some are hardware based, while others rely on software support. Compared to uncompressed data, compressed data exhibits very different scan rate values because of the overhead of decompressing each row when returning an answer set to an application. Both compressed and uncompressed data measurements should be performed for all scenarios.

     Next, determine the row length of the table to be measured. Again, a range of values should be selected so that the measured data will be representative of most user data. Once the compression indicator and the row length are selected, the appropriate table is used for the scan rate measurements.

     Build a database containing more than one hundred gigabytes of data, with individual tables of tens of gigabytes. With smaller tables, queries complete quickly and measurement fluctuation introduces a larger margin of error.
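     A measurement table of a chosen row length can be populated with a short script. The sketch below uses Python's built-in sqlite3 module only to stay self-contained; an actual measurement would load the target data warehouse product, and the table name, row length, and row count here are illustrative (real tables should reach tens of gigabytes, as noted above).

    import random
    import sqlite3
    import string

    # Hypothetical test-table generator. The row count is kept small for
    # illustration; scale num_rows up until the table reaches the desired size.

    def build_test_table(db_path, row_length, num_rows):
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS scan_test (payload TEXT)")
        payload = "".join(random.choices(string.ascii_letters, k=row_length))
        conn.executemany(
            "INSERT INTO scan_test (payload) VALUES (?)",
            ((payload,) for _ in range(num_rows)),
        )
        conn.commit()
        conn.close()

    build_test_table("scan_test.db", row_length=200, num_rows=100_000)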

     To measure CPU scan rate for I/O intensive queries, run a query with just the count(*) column function. Record the CPU time of the query. Compute the CPU scan rate with the following equation:

Scan rate = (# of rows in table) * (average row length) / CPU time

     The unit of scan rate is expressed in MB/sec. For the above example, the calculated value refers to I/O intensive queries, since it is inexpensive to support the count(*) column function.
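     As an illustration of the calculation, the sketch below runs a count(*) query and applies the equation above. The disclosure takes the query's CPU time from the database system's own accounting; here the Python process CPU time around an in-process sqlite3 query stands in for it so the example remains self-contained, and the table name and row length are assumptions carried over from the previous sketch.

    import sqlite3
    import time

    # Hypothetical scan-rate measurement for an I/O intensive query
    # (count(*) only). Larger tables reduce the measurement error.

    def measure_scan_rate(db_path, table, avg_row_length_bytes):
        conn = sqlite3.connect(db_path)
        start = time.process_time()
        (num_rows,) = conn.execute(f"SELECT count(*) FROM {table}").fetchone()
        cpu_seconds = time.process_time() - start
        conn.close()
        data_mb = num_rows * avg_row_length_bytes / (1024 * 1024)
        return data_mb / cpu_seconds    # scan rate in MB/sec

    rate = measure_scan_rate("scan_test.db", "scan_test", avg_row_length_bytes=200)
    print(f"Measured scan rate: {rate:.1f} MB/sec")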

     To measure CPU scan rate for moderately I/O intensive queries, run a query with three pre...