A method of fast comparing massive files between peer nodes in distributed system

IP.com Disclosure Number: IPCOM000237007D
Publication Date: 2014-May-27
Document File: 9 page(s) / 162K

Publishing Venue

The IP.com Prior Art Database

Abstract

This disclosure divides massive files into blocks with a hash algorithm and quickly compares the checksums of the blocks among peer nodes to identify distorted files in a distributed system.

This is the abbreviated version, containing approximately 56% of the total text.

In a distributed system, locating the differences among peer nodes normally requires a full scan of the target directory structure and a one-by-one comparison of the checksum of each file. This method has two disadvantages: 1) Full scanning performs very poorly when there are massive files under the target directory. 2) It is hard to filter out unnecessary file scanning when only a subset of the target directory needs to be compared, e.g. files selected by creation date.

The core idea of this disclosure is divide and conquer: the massive files under the target directory are divided into partitions or blocks through a hash algorithm. By comparing the checksums of the partitions or blocks, all distorted files between peer nodes in a distributed system can be identified easily.
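
The core idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the fixed block count NUM_BLOCKS, the use of SHA-256, and the helper names block_of, block_checksums and differing_files are all hypothetical, and each node is modelled as an in-memory mapping from relative path to file content.

```python
from __future__ import annotations

import hashlib

NUM_BLOCKS = 8  # assumed block count; it must be identical on every peer


def block_of(path: str, num_blocks: int = NUM_BLOCKS) -> int:
    """Map a relative path to a block index with a stable hash."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_blocks


def block_checksums(files: dict[str, bytes], num_blocks: int = NUM_BLOCKS) -> list[str]:
    """Return one checksum per block, covering the paths and contents in it."""
    members: list[list[str]] = [[] for _ in range(num_blocks)]
    for path in files:
        members[block_of(path, num_blocks)].append(path)
    sums = []
    for paths in members:
        h = hashlib.sha256()
        for path in sorted(paths):  # sorted so both peers hash in the same order
            h.update(path.encode("utf-8"))
            h.update(b"\0")
            h.update(hashlib.sha256(files[path]).digest())
        sums.append(h.hexdigest())
    return sums


def differing_files(
    node_a: dict[str, bytes],
    node_b: dict[str, bytes],
    num_blocks: int = NUM_BLOCKS,
) -> set[str]:
    """Compare block checksums first, then inspect files only in mismatching blocks."""
    sums_a = block_checksums(node_a, num_blocks)
    sums_b = block_checksums(node_b, num_blocks)
    suspect = {i for i in range(num_blocks) if sums_a[i] != sums_b[i]}
    diff = set()
    for path in node_a.keys() | node_b.keys():
        if block_of(path, num_blocks) in suspect and node_a.get(path) != node_b.get(path):
            diff.add(path)
    return diff
```

Because the block index depends only on the path, the same file lands in the same block on every peer, and blocks whose checksums agree are skipped without looking at their individual files.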

The challenges of the core idea:

- How can the blocks on the peer nodes of a distributed system be made to contain the same files for comparison?

- A file create/update/delete should impact only the block where the file sits.

- Files are distributed unevenly across directories, so different directories require different numbers of blocks.

- File volume increases or decreases over time, so the number of blocks needs to be controlled dynamically.
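
As an illustration of the second challenge, the sketch below keeps a per-block checksum that is updated incrementally, so that a create, update or delete touches exactly one block. Combining per-file digests with XOR is a technique chosen for this example, not one taken from the disclosure, and it is not meant as a tamper-resistant checksum; the class name BlockIndex and its methods are hypothetical.

```python
from __future__ import annotations

import hashlib


def _file_digest(path: str, content: bytes) -> int:
    """Digest of one file (path and content) as an integer."""
    h = hashlib.sha256()
    h.update(path.encode("utf-8"))
    h.update(b"\0")
    h.update(content)
    return int.from_bytes(h.digest(), "big")


class BlockIndex:
    """Hypothetical per-node index holding one XOR-combined checksum per block."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.sums = [0] * num_blocks
        self._digests: dict[str, int] = {}  # last known digest of each file

    def _block(self, path: str) -> int:
        d = hashlib.sha256(path.encode("utf-8")).digest()
        return int.from_bytes(d[:8], "big") % self.num_blocks

    def put(self, path: str, content: bytes) -> int:
        """Create or update a file; return the index of the only block touched."""
        b = self._block(path)
        old = self._digests.get(path, 0)
        new = _file_digest(path, content)
        self.sums[b] ^= old ^ new  # remove the old digest, add the new one
        self._digests[path] = new
        return b

    def delete(self, path: str) -> int:
        """Delete a file; return the index of the only block touched."""
        b = self._block(path)
        self.sums[b] ^= self._digests.pop(path)
        return b
```

Since XOR is commutative, two peers that hold the same files arrive at the same block checksums regardless of the order in which the files were written.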


1. Construction of Partition & Blocks

- Split the target path into partitions based on pre-defined rules.

- Divide the files under each partition into blocks by a hash ring.
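
The two construction steps above can be sketched as follows. The abbreviated text does not state the partition rules, so the rule used here (the first path component names the partition) is an assumption, as is reading "Hash Ring" as a consistent-hashing ring with several points per block. The names partition_of, HashRing, rings and locate, and the example partitions "logs" and "config", are hypothetical.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def partition_of(path: str) -> str:
    """Assumed rule: the first path component names the partition."""
    return path.split("/", 1)[0]


class HashRing:
    """Consistent-hashing ring that maps file paths to block identifiers."""

    def __init__(self, block_ids: list[str], points_per_block: int = 64) -> None:
        if not block_ids:
            raise ValueError("a ring needs at least one block")
        ring = sorted(
            (_hash(f"{block}#{i}"), block)
            for block in block_ids
            for i in range(points_per_block)
        )
        self._keys = [k for k, _ in ring]
        self._blocks = [b for _, b in ring]

    def block_for(self, path: str) -> str:
        """Return the block owning the first ring point after the path's hash."""
        i = bisect.bisect_right(self._keys, _hash(path)) % len(self._keys)
        return self._blocks[i]


# One ring per partition, so a crowded partition can have more blocks.
rings = {
    "logs": HashRing([f"logs-{i}" for i in range(8)]),
    "config": HashRing([f"config-{i}" for i in range(2)]),
}


def locate(path: str) -> tuple[str, str]:
    """Return the (partition, block) pair that a relative path belongs to."""
    part = partition_of(path)
    return part, rings[part].block_for(path)
```

With a ring of this kind, adding a block moves only the files that now fall on the new block's points; every other file stays in its existing block, which is one way to adjust the block count as file volume changes.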

2. Partition Splitter & Deployment

a) Partition Planning

-...