
Dynamically compress selective columns based on Real Time Statistics

IP.com Disclosure Number: IPCOM000239480D
Publication Date: 2014-Nov-12

Publishing Venue

The IP.com Prior Art Database

Abstract

Compression does not suit OLTP workloads very well because extra CPU cost is needed to process compressed data. We introduce a new framework that uses real-time statistics to monitor and modify the compress attribute of individual columns based on the workload, so that the DBMS can adjust the compression attribute dynamically without user intervention. With this invention, the DBMS gains the benefits of compression without taking on too much overhead: for infrequently updated columns it reduces total tablespace size and I/O cost, while for frequently updated columns it avoids the extra compress/decompress CPU cost.




Data compression technology has been widely used in database products to maximize resource utilization and reduce cost. Most compression algorithms (e.g., LZ77, Huffman) work in a similar way: build a structure (called a compression dictionary or tree) that stores redundant information, and then replace that redundant information in the real data blocks with links to the compression dictionary.
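
As a rough illustration of this idea (not the actual LZ77 or Huffman implementation of any particular product), the following Python sketch builds a value-level dictionary of repeated values and replaces each occurrence in the data with a link to that dictionary; all names in it are illustrative only.

    def build_dictionary(values, min_repeats=2):
        """Collect values that repeat often enough to be worth storing once."""
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [v for v, n in counts.items() if n >= min_repeats]

    def compress(values, dictionary):
        """Replace dictionary hits with (is_link, index) tokens; keep the rest inline."""
        index = {v: i for i, v in enumerate(dictionary)}
        return [(True, index[v]) if v in index else (False, v) for v in values]

    def decompress(tokens, dictionary):
        return [dictionary[x] if is_link else x for is_link, x in tokens]

    data = ["NEW YORK", "BOSTON", "NEW YORK", "NEW YORK", "CHICAGO"]
    d = build_dictionary(data)
    packed = compress(data, d)
    assert decompress(packed, d) == data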

Because of the way these algorithms work, any update causes the entire compressed string to be decompressed and recompressed, even if only one bit changes. Compression and decompression instructions are expensive, so the CPU cost of an update on a compressed string is much higher than on a non-compressed string.
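
To make the cost asymmetry concrete, the sketch below uses Python's zlib purely as a stand-in for the DBMS compression routine: changing a single byte of a compressed value forces a full decompress/recompress cycle, whereas the same change on an uncompressed value is a simple in-place patch.

    import zlib

    row = ("ACTIVE " * 100).encode()           # a highly redundant column value
    stored = zlib.compress(row)                # what would sit in the data page

    def update_compressed(stored_bytes, offset, new_byte):
        plain = bytearray(zlib.decompress(stored_bytes))   # decompress everything
        plain[offset:offset + 1] = new_byte                # the actual one-byte change
        return zlib.compress(bytes(plain))                 # recompress everything

    def update_plain(plain_bytes, offset, new_byte):
        buf = bytearray(plain_bytes)
        buf[offset:offset + 1] = new_byte                  # just patch in place
        return bytes(buf)

    stored = update_compressed(stored, 0, b"X")
    assert zlib.decompress(stored)[:1] == b"X"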

One common approach is to compress each column separately. Users can choose not to compress heavily updated columns and thus avoid the compress/decompress overhead on update. However, users must have basic compression knowledge and know the workload very well to make the right decision; otherwise performance can be even worse if the wrong columns are chosen for compression. Even when the right decision is made initially, any workload change can turn it into a bad one and cause performance degradation, because there is no way to adjust the compress attribute dynamically.

A new framework using the following techniques is introduced to improve DBMS performance:

Real-time statistics to monitor and modify the compress attribute of columns
We introduce a new framework in which real-time update statistics determine the compress attribute. Normally the biggest CPU cost contributor for compression is update, since the CPU cost of selects can usually be avoided through index access. We therefore focus on update statistics, but other statistics may be checked as well. The DBMS collects and stores update statistics for each column of the targeted table in real time. The goal is to quickly identify heavily updated columns and decide how to adjust their compress attribute, based on column type/length and other factors, to improve update performance. The compress attribute can then be adjusted easily from real-time statistics no matter how the workload changes. Since the workload can differ between day time (OLTP) and night time (batch), an internal task is started to check the statistics every H hours, where H can be set by user configuration or by a DBMS default value, for example 24 hours. We suggest setting the value large enough to cover all kinds of workload, so as to avoid ping-pong behavior and choose the best compress attribute for the involved columns. Users can also disable the every-H-hours check and use utilities to make the decision instead. A sketch of this monitoring loop follows below.
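
The following minimal Python sketch illustrates the loop just described; the class, counters, and thresholds are our own illustrative assumptions, not an existing DBMS interface, and a real implementation would persist the statistics rather than keep them in memory.

    import threading
    from collections import defaultdict

    class ColumnCompressionAdvisor:
        """Hypothetical sketch: track per-column update counts in real time and
        periodically re-decide each column's compress attribute from them."""

        def __init__(self, check_interval_hours=24, update_threshold=10_000):
            self.update_counts = defaultdict(int)       # real-time update statistics
            self.compress_attr = {}                     # column name -> True/False
            self.check_interval = check_interval_hours * 3600
            self.update_threshold = update_threshold

        def record_update(self, column):
            """Called for every UPDATE that touches the column."""
            self.update_counts[column] += 1

        def evaluate(self):
            """Periodic task: heavily updated columns lose the compress attribute."""
            for column, count in self.update_counts.items():
                self.compress_attr[column] = count < self.update_threshold
            self.update_counts.clear()                  # start a fresh interval

        def start(self):
            """Re-run the evaluation every H hours (H is user-configurable)."""
            self.evaluate()
            threading.Timer(self.check_interval, self.start).start()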

We propose to check data length and update frequency, though additional checks could also be implemented. We take LZ compression as the example compression method here, but the invention also works with other compression technologies. The methodology is described by the following steps....
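
As an illustration of the two checks named above (data length and update frequency), a per-column decision might look like the following Python sketch; the thresholds and parameter names are assumptions for illustration, not values taken from the disclosure.

    def should_compress(column_length, avg_compressed_length,
                        updates_per_interval,
                        max_updates=10_000, min_saving_ratio=0.3):
        """Compress a column only if it is not heavily updated and the
        estimated space saving justifies the compression CPU cost."""
        if updates_per_interval >= max_updates:
            return False                        # frequent updates: skip compression
        saving = 1.0 - (avg_compressed_length / column_length)
        return saving >= min_saving_ratio       # infrequent updates: compress if it pays off

    # Example: a 200-byte column that LZ shrinks to 80 bytes and sees few updates
    print(should_compress(200, 80, updates_per_interval=150))   # True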