Browse Prior Art Database

System and Method for Usage Aware Row Storage in Database Systems Disclosure Number: IPCOM000197960D
Publication Date: 2010-Jul-23
Document File: 3 page(s) / 50K

Publishing Venue

The Prior Art Database


Placing data in tables of a relational database system in storage containers with different access frequencies (e.g. solid state disks, hard drives, or tapes) is a well-known approach to level-out storage capacity and costs for the storage. It is a problem today to assign the correct data portions to the different containers. Typically, that's a manual task done by the database administrator. This article presents an approach how the access frequency of rows and columns can be tracked automatically by the database system, and those statistics are finally used to horizontally partition the rows in each table across storage containers and/or vertically partition the columns of the table and apply such a distribution. The effect are reduced costs and more optimal data placement on the storage, depending on the access pattern for the data.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 45% of the total text.

Page 1 of 3

System and Method for Usage Aware Row Storage in Database Systems

1. Motivation

Relational database systems store data in rows in tables. The rows are organized on pages as units for physical storage on disk. All pages for one table are stored in one tablespace, which is the management unit to assign disk storage to a table. The database administrator can distribute different tables to different tablespaces that have different disk characteristics. He can decide to put rarely used tables to tablespaces with cheap and slow disk performance characteristics. More frequently used tables are put to tablespaces with high performance disks. The same concept is used for the buffer management, associated with these table spaces. Tables, which are accessed very frequently, are normally preferred to be kept longer in buffer pools to avoid the I/O at all. Tables where rarely lookups happen, do not need to be cached that efficiently. Rare I/O overheads are the tradeoffs.

However when the content of a tables itself (i.e. its rows) has large variation of usage frequency the database administrator cannot flexibly distribute frequently used data and less frequently used data (of the same table) to disks with different characteristics. He would have to place the entire table to a high performance disk even if only 10% of its data is really used frequently just to ensure that the access to these 10% is optimized.

With range partitioning techniques the database administrator has today already the means to store the content of a table as different physical ranges of data, which in turn can be stored to different disks. However the assignment of data to the different table ranges can only happen based on the content of the table itself. E.g. all rows with values of a specified column being in a specified value range are stored to its own range. Finding these value ranges is very cumbersome in most customer environments. Not only that the data distribution change over time - meaning that an originally chosen range partitioning might not be appropriate after some time - finding the optimal split points between ranges in data with skew is a very difficult task. Deep knowledge of the data, it's skew and the expectation about growth and range changes is required if a value based partitioning is done today.

While range partitioning implies that a value-based partitioning can be applied to the data, it is also possible to manually define 2 or more tables (in different tablespaces if necessary) and to separate the rows from the original table manually over those multiple tables. Naturally, this task is even more cumbersome for administrators than a value-based range partitioning.

2. Introduction to Idea or Solution

The main idea to resolve the above problem is to introduce combination of tracking the usage frequency on row level and clustering of rows according to their usage frequency patterns to different pages that can be stored to different table ranges th...