A method to accelerate calculating/comparing Hash values on an NTFS De-dup enabled disk
Publication Date: 2013-Nov-15
The IP.com Prior Art Database
This disclosure aims to speed up hash calculation in the copy scenario. Generally, the common method for handling duplicate files in an offline de-dup area is to divide each file into small chunks, calculate a hash key for each chunk, compare the hash keys, and delete the duplicate files, which necessarily have the same hash value. This disclosure seeks a different way to calculate hashes for those duplicate files. It detects and records copy relationships when duplication happens by keeping an index that marks each copy relationship. For duplicate files with the same index, the de-duplication program can skip calculating a hash value for each chunk; instead, it only needs to calculate the source file's hash values and then directly copy them to the files that hold the same index. This saves the de-duplication program the time spent on duplicate calculation.
Currently, the host-side de-duplication feature significantly improves the efficiency of storage capacity usage. It saves storage capacity by deleting duplicate content on de-dup enabled disks. Files there are no longer stored as independent streams of data, but are replaced with pointers to data stored within a common chunk store, as shown in Chart 1-1. All of actions (1)-(6) are performed at a scheduled time.
(1) Divide file A into data chunks using a variable chunk size;
(2) Calculate the hash values for the chunks of file A;
(3) Repeat the same actions for file B;
(4) Compare the hash values of the data chunks of file A and file B;
(5) Save the metadata and map the correct pointers;
(6) Delete the data chunks that have identical hash values.
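Steps (1)-(6) can be sketched roughly as follows. This is a minimal illustration, not the NTFS implementation: the names `split_chunks`, `dedup`, and `restore` are hypothetical, SHA-256 is an assumed hash function, and a small fixed chunk size stands in for the variable chunk size mentioned in step (1).

```python
import hashlib

# Assumption: tiny fixed-size chunks for illustration only;
# the method described in the text uses a variable chunk size.
CHUNK_SIZE = 4


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Step (1): divide a file's bytes into chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup(files: dict) -> tuple:
    """Steps (2)-(6): hash every chunk, store each distinct chunk once,
    and keep per-file metadata as a list of chunk hashes (the pointers)."""
    chunk_store = {}  # hash -> chunk bytes (the common chunk store)
    metadata = {}     # file name -> ordered list of chunk hashes
    for name, data in files.items():
        pointers = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk whose hash is already present is not stored again.
            chunk_store.setdefault(digest, chunk)
            pointers.append(digest)
        metadata[name] = pointers
    return chunk_store, metadata


def restore(name: str, chunk_store: dict, metadata: dict) -> bytes:
    """Rebuild a file by following its pointers into the chunk store."""
    return b"".join(chunk_store[d] for d in metadata[name])


store, meta = dedup({"A": b"abcdefgh", "B": b"abcdefgh"})
```

Note that in this baseline every chunk of file B is hashed even though B is a byte-for-byte copy of A; that repeated work is what the disclosure targets.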
This disclosure acts as an enhanced algorithm that improves steps (1)-(4) in special scenarios, such as when a Copy command is issued, as shown in Chart 1-2. It speeds up hash value calculation and comparison by moving steps (1)-(4) into general operating time, rather than a dedicated, busy computing period, which gives a more efficient de-dup method when a copy occurs.
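The copy scenario can be sketched as below: a shared index is recorded when a copy is observed, and chunk hashes computed for one file are reused for every file holding the same index. This is only an illustrative sketch under stated assumptions. The class `CopyIndexHasher`, its methods, the SHA-256 hash, and the fixed chunk size are all hypothetical. It also assumes a file is not modified after being copied, since a real implementation would need to drop the index on modification.

```python
import hashlib


class CopyIndexHasher:
    """Hypothetical helper: reuses chunk hashes across files that share a copy index."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size  # assumption: fixed size for simplicity
        self.copy_index = {}          # file name -> index shared by a source and its copies
        self.hash_cache = {}          # index -> list of chunk hashes
        self.hash_calls = 0           # counts actual hash computations
        self._next_index = 0

    def record_copy(self, source: str, target: str) -> None:
        """Called when a copy is observed; marks the copy relationship."""
        if source not in self.copy_index:
            self.copy_index[source] = self._next_index
            self._next_index += 1
        self.copy_index[target] = self.copy_index[source]

    def chunk_hashes(self, name: str, data: bytes) -> list:
        """Return chunk hashes, copying cached ones when the index matches."""
        index = self.copy_index.get(name)
        if index is not None and index in self.hash_cache:
            # Same index: copy the hashes instead of recalculating them.
            return list(self.hash_cache[index])
        hashes = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            hashes.append(hashlib.sha256(chunk).hexdigest())
            self.hash_calls += 1
        if index is not None:
            self.hash_cache[index] = hashes
        return list(hashes)


hasher = CopyIndexHasher()
hasher.record_copy("A", "B")                     # B is a copy of A
hashes_a = hasher.chunk_hashes("A", b"abcdefgh")
calls_after_a = hasher.hash_calls
hashes_b = hasher.chunk_hashes("B", b"abcdefgh")  # served from the cache
```

In this sketch the second call performs no hashing at all, so the cost of steps (2)-(4) for a copied file reduces to a dictionary lookup.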
Chart 1-1: General de-dup method
Chart 1-2: Improvement in this disclosure. General de-dup vs. this disclosure.
As stated in the background, this idea benefits from the OS file system itself to help save hash value calculation and comparison time. Imagine that 1000 copies happen on a de-dup enabled disk,...