Configurable Primary Indices based on Pluggable Sorting Scheme for Apache HBase to accelerate record retrieval

IP.com Disclosure Number: IPCOM000241612D
Publication Date: 2015-May-18
Document File: 5 page(s) / 127K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article provides a method to accelerate Key-Value record retrieval from HBase by adding a pluggable sorting scheme. Different sorting algorithms can be used for different scenarios to reduce StoreFile block loading from disk, thus improving Key-Value query performance.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Background

Apache HBase is a distributed, column-family-oriented, open-source database for big data applications. HBase is gaining wide attention and adoption in industry. Records (key-value pairs) in this database are organized as a Log-Structured Merge (LSM) tree. This design choice greatly improves the performance of write operations on the Hadoop Distributed File System (HDFS). However, this improvement comes at the cost of a complicated, low-performance merge operation when a record is retrieved.
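
The read-side cost described above can be illustrated with a toy model. The following Python sketch is not HBase code; the class `TinyLSM`, its `flush_threshold` parameter, and the run layout are hypothetical names chosen for illustration. It assumes only the general LSM behaviour: writes land in memory and are flushed as immutable sorted runs, so a read may have to search several runs, newest first.

```python
import bisect


class TinyLSM:
    """Toy LSM store: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}   # newest writes, held in memory
        self.runs = []       # flushed sorted runs, newest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # A write only touches memory, which is why writes are cheap.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            # Flush: persist the buffer as one immutable sorted run.
            self.runs.insert(0, sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Return (value, runs_searched); newer data shadows older data."""
        if key in self.memstore:
            return self.memstore[key], 0
        searched = 0
        for run in self.runs:
            searched += 1
            # Binary search inside one sorted run of (key, value) pairs.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1], searched
        return None, searched


store = TinyLSM(flush_threshold=2)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 9), ("d", 4)]:
    store.put(k, v)

print(store.get("d"))  # found in memory, no run searched
print(store.get("b"))  # only present in the oldest run
```

In this model a write never touches more than the in-memory buffer, while the cost of a read grows with the number of flushed runs, which is the trade-off the paragraph above describes.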

Block cache and flash-based cache were then introduced to improve retrieval performance. But, like any other caching technique, these methods are of little help for random access.

Another method to improve record retrieval from HBase is to build a secondary index for HBase tables. A secondary index can greatly improve retrieval efficiency, but it has at least three limitations in HBase:

1. A secondary index is built for "join" operations, so it is oriented to a specific access pattern. It helps retrieval by the expected search key but does nothing for random access.

2. A secondary index typically takes a long time to build. This long latency makes it unsuitable for real-time applications or online services.

3. Existing schemes for building secondary indices for HBase are generally considered to suffer from high server load, long build times, complex maintenance, or low consistency. These schemes are also only experimental.

Core idea:

In HBase, when a secondary index is absent, varied primary indices can facilitate record retrieval for different access types by reducing StoreFile block loading and by enabling fast seeking and scanning of key-value pairs.
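
As a rough illustration of why the sort order matters for block loading, the Python sketch below stores the same records under two orderings and counts how many fixed-size blocks contain a match for a query. The record layout `(user, timestamp)`, the helper names `build_blocks` and `blocks_loaded`, and the block size are assumptions made for this example, not part of HBase or of this disclosure.

```python
def build_blocks(records, sort_key, block_size):
    """Sort records with the given key function and cut them into blocks."""
    ordered = sorted(records, key=sort_key)
    return [ordered[i:i + block_size]
            for i in range(0, len(ordered), block_size)]


def blocks_loaded(blocks, predicate):
    """Count blocks holding at least one match (a proxy for disk reads)."""
    return sum(1 for block in blocks if any(predicate(r) for r in block))


# 4 users x 4 timestamps = 16 records, stored in blocks of 4.
records = [(f"u{u}", t) for u in range(4) for t in range(4)]

by_user = build_blocks(records, lambda r: (r[0], r[1]), block_size=4)
by_time = build_blocks(records, lambda r: (r[1], r[0]), block_size=4)


def one_user(r):
    return r[0] == "u1"


def one_time(r):
    return r[1] == 2


print(blocks_loaded(by_user, one_user), blocks_loaded(by_time, one_user))
print(blocks_loaded(by_user, one_time), blocks_loaded(by_time, one_time))
```

Neither ordering is better in general: the user-first order serves the per-user query with a single block, and the time-first order does the same for the per-timestamp query. This is the motivation for choosing the ordering per access type.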

Main Idea

The main idea of this disclosure is described below.

1. Pluggable sorting scheme

The sorting scheme can be applied to the application through the following methods:

1) Configuring sorting algorithm(s) through a configuration file (an attribute of the column family, static),

2) or through self-learning (access pattern learning, Access Type Predictor, dynamic);

3) Setting fixed sortin...
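
A minimal sketch of the static option in item 1), assuming a per-column-family attribute that names a sorting algorithm. The registry `SORTERS`, the mapping `FAMILY_CONFIG`, the family names, and the algorithm names are all hypothetical; HBase does not expose these, and the disclosure's actual configuration format is not shown in this abbreviated text.

```python
# Hypothetical registry: algorithm name -> key function applied to a row key.
SORTERS = {
    "lexicographic": lambda row_key: row_key,
    "reversed": lambda row_key: row_key[::-1],
    "numeric_suffix": lambda row_key: int(row_key.rsplit("-", 1)[1]),
}

# Hypothetical static configuration: one sorting algorithm per column family.
FAMILY_CONFIG = {
    "cf_profile": "lexicographic",
    "cf_events": "numeric_suffix",
}


def sorter_for(family, config=FAMILY_CONFIG, default="lexicographic"):
    """Resolve the key function configured for a column family."""
    name = config.get(family, default)
    try:
        return SORTERS[name]
    except KeyError:
        raise ValueError(f"unknown sorting algorithm: {name!r}") from None


def sort_row_keys(family, row_keys):
    """Order row keys with whatever algorithm the family is configured for."""
    return sorted(row_keys, key=sorter_for(family))


keys = ["row-10", "row-2", "row-1"]
print(sort_row_keys("cf_profile", keys))  # ['row-1', 'row-10', 'row-2']
print(sort_row_keys("cf_events", keys))   # ['row-1', 'row-2', 'row-10']
```

Because the algorithm is looked up by name, a new ordering can be plugged in by registering one more entry, without changing the code that sorts or reads the keys.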