
A method of data rebalance system based on consistent hash

IP.com Disclosure Number: IPCOM000248284D
Publication Date: 2016-Nov-15
Document File: 8 page(s) / 188K

Publishing Venue

The IP.com Prior Art Database

Abstract

Consistent hashing is used extensively in distributed file systems, but it cannot guarantee that data is exactly balanced. In particular, removing or adding a node leads to imbalance. For example, when the system scales out, the newly added node takes over part of the workload of its neighboring node only. The same happens when a node is removed from the system. This article presents a data rebalancing method based on enhanced consistent hashing.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 36% of the total text.

Consistent hashing is used extensively in distributed file systems. Its key feature is that removing or adding a physical node affects only the neighboring node: data is migrated to or from the neighboring node, and access to data through hash addressing is not affected.

[Figure-1-1: Node Initiation; Figure-1-2: Remove Node; Figure-1-3: Add Node]

However, consistent hashing cannot guarantee that data is exactly balanced. In particular, removing or adding a node leads to imbalance. For example, when the system scales out, the newly added node takes over part of the workload of its neighboring node only. The same happens when a node is removed from the system.
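
The neighbor-only effect described above can be illustrated with a minimal sketch of a plain consistent-hash ring without virtual nodes. The class name HashRing, the MD5-based ring_hash function, and the node and key names are assumptions made for illustration; the disclosure does not specify them.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Plain consistent-hash ring: one point per physical node, no virtual nodes."""

    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        point = ring_hash(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def remove_node(self, node: str) -> None:
        point = ring_hash(node)
        self._points.remove(point)
        del self._owner[point]

    def lookup(self, key: str) -> str:
        # A key belongs to the first node point at or after its hash, wrapping around.
        idx = bisect.bisect_left(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"object-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.lookup(k) for k in keys}

# Keys that changed owner all moved to the new node, and they all came
# from one old node: the new node's successor on the ring.
moved = [k for k in keys if before[k] != after[k]]
donors = {before[k] for k in moved}
print(len(moved), donors)
```

In this sketch the other nodes keep exactly the keys they had before, which is why a single added node relieves only its neighbor rather than the whole system.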

Core Idea

(1) The number of virtual nodes is defined at system initialization. While the system runs, there are multiple layers of virtual nodes, depending on the number of physical nodes.

When the virtual nodes are first created, the hash ring has only one layer. As the system runs, the number of layers changes with the number of physical nodes.

(2) Virtual nodes are created according to the weight of each physical node.

The weight W indicates the share of a physical node in the whole distributed system. For example, the weight can be based on disk capacity or on CPU capacity. There are several ways to calculate it; for example, the weight can be obtained by dividing the disk size S of the physical node by the unit size M of a virtual node.
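
As a rough illustration of the disk-size example, the following sketch derives the number of virtual nodes per physical node from W = S/M. The function names, the rounding down, the minimum of one virtual node, and the sample disk sizes are assumptions; the disclosure only states the ratio.

```python
def virtual_node_count(disk_size_gb: int, unit_size_gb: int) -> int:
    """Weight W = S / M, rounded down (assumption), with at least one virtual node."""
    return max(1, disk_size_gb // unit_size_gb)


def build_virtual_nodes(disks: dict, unit_size_gb: int) -> dict:
    """Return a mapping {virtual node name: physical node name}."""
    vnodes = {}
    for node, size_gb in disks.items():
        for i in range(virtual_node_count(size_gb, unit_size_gb)):
            vnodes[f"{node}#vn{i}"] = node
    return vnodes


# Hypothetical cluster: disk size S per physical node, in GB.
disks = {"node-a": 1000, "node-b": 2000, "node-c": 4000}
vnodes = build_virtual_nodes(disks, unit_size_gb=100)  # M = 100 GB

counts = {node: sum(1 for owner in vnodes.values() if owner == node) for node in disks}
print(counts)  # {'node-a': 10, 'node-b': 20, 'node-c': 40}
```

Each virtual node would then be hashed onto the ring in place of its physical node, so a disk with four times the capacity owns four times as many ring positions.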

(3) The improved consistent hash system re-balances automatically.

If the system becomes unbalanced, it adapts itself to restore balance: the hash system monitors the balance status and, when it detects imbalance, recalculates the virtual nodes, migrates the data, and controls the migration.
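
The monitoring step can be sketched as follows. This is only one possible reading: the balance metric (per-node utilization compared with cluster-wide utilization), the 10% tolerance, and all function names are assumptions, and the recalculation of virtual nodes and the migration control are not modeled.

```python
def is_balanced(used: dict, capacity: dict, tolerance: float = 0.10) -> bool:
    """True if every node's utilization is within `tolerance` of the cluster-wide value."""
    target = sum(used.values()) / sum(capacity.values())
    return all(abs(used[n] / capacity[n] - target) <= tolerance for n in capacity)


def rebalance_plan(used: dict, capacity: dict) -> dict:
    """Amount of data each node should gain (+) or shed (-) to reach equal utilization."""
    total_used = sum(used.values())
    total_capacity = sum(capacity.values())
    return {n: total_used * capacity[n] / total_capacity - used[n] for n in capacity}


# Hypothetical cluster state, in GB.
capacity = {"node-a": 1000, "node-b": 2000, "node-c": 4000}
used = {"node-a": 900, "node-b": 1000, "node-c": 900}

if not is_balanced(used, capacity):
    plan = rebalance_plan(used, capacity)
    # node-a and node-b shed data, node-c gains it; the amounts sum to zero.
    print(plan)
```

A real system would run such a check periodically and feed the plan into the virtual-node recalculation and migration steps named in the text.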

Advantages:

The probability of an avalanche (a cascading overload across nodes) is reduced, and the system is more balanced.

Data is distributed more evenly, so utilization is higher.

Availability is higher thanks to dynamic adaptation.

(1) Creating the virtual nodes initially

The basic rule for initial creation is that every virtual node has the same capacity. Hashing distributes data evenly, so each unit of capacity receives a similar amount of written data. When a large amount of data is written to the virtual nodes, giving each physical disk a number of virtual nodes that matches its weight ensures that each disk holds a proportional amount of the data.

The number of virtual nodes for each physical disk depends on the weight W, which in turn depends on the weight factor M. Here the weight factor M is the unit value of capacity; M can also be obtained by other methods, and the unit value of capacity is only one of them. Weight W = S/M. The value S is the...