
Optimizing Rolling Hash computation using SIMD Vector registers

IP.com Disclosure Number: IPCOM000226555D
Original Publication Date: 2013-Apr-16
Included in the Prior Art Database: 2013-Apr-16
Document File: 3 page(s) / 264K

Publishing Venue

Linux Defenders

Related People

Moinak Ghosh: AUTHOR

Abstract

Described is a technique for speeding up the rolling hash computation used in content-aware chunking for data deduplication via SIMD vector instructions. This technique can be conditionally used in certain environments depending on the nature of the rolling hash and the CPU capabilities available. A large vector register is used to hold the sliding window bytes and specialized vector instructions are used to shift the register bytes and add and remove bytes in the register. This simulates the behavior of a sliding window entirely in a CPU register.


    Described is a technique for speeding up the rolling hash computation used in content-aware chunking for data deduplication. This technique can be conditionally used in certain environments, depending on the nature of the rolling hash and the CPU capabilities available. Rolling hashes computed for variable-chunk data deduplication typically use a sliding window that keeps track of the bytes in the current hash window. The newest byte is added to the window while the oldest byte is popped out, and both the newest and the oldest byte values are used in the rolling hash computation.

There are two typical ways of keeping track of the bytes (both approaches are sketched in code after this list):


1. The simplest is to use two indexes to have a logical sliding window:

Index1 = current buffer scan position
Index2 = Index1 - window-size

Where Index1 starts at the buffer-start + window-size position, after the rolling hash has been initialized with the first window-size bytes. The byte at Index1 and the byte at Index2 are then used to update the rolling hash. Index1 is incremented and the process repeats.

2. Some implementations maintain a separate window-size buffer and keep a window-index that cycles through that buffer. In addition, a buffer-index scans through the given data buffer.

The byte at the current window-index position and the byte at the current buffer-index position are then used to update the rolling hash.
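
    The following is a minimal sketch of both approaches in C. The additive byte-sum hash, the WINDOW size, and the function names are illustrative assumptions; a real deduplication implementation would typically use a stronger polynomial or Rabin-style rolling hash.

    #include <stdint.h>
    #include <stddef.h>

    #define WINDOW 64   /* illustrative window size (assumption) */

    /* Approach 1: logical sliding window via two indexes into the data buffer.
     * The trailing-edge byte at index2 is re-read from memory on every step. */
    uint32_t roll_two_index(const uint8_t *buf, size_t len)
    {
        uint32_t hash = 0;
        for (size_t i = 0; i < WINDOW && i < len; i++)
            hash += buf[i];                       /* initialize with the first window-size bytes */

        for (size_t index1 = WINDOW; index1 < len; index1++) {
            size_t index2 = index1 - WINDOW;      /* trailing edge of the window */
            hash = hash - buf[index2] + buf[index1];
            /* ... test the hash here for a chunk boundary ... */
        }
        return hash;
    }

    /* Approach 2: explicit window-size buffer with a circular window-index.
     * Every byte scanned causes a memory write into the window buffer. */
    uint32_t roll_circular(const uint8_t *buf, size_t len)
    {
        uint8_t win[WINDOW] = { 0 };
        uint32_t hash = 0;
        size_t wpos = 0;

        for (size_t bpos = 0; bpos < len; bpos++) {
            hash = hash - win[wpos] + buf[bpos];  /* oldest byte leaves, newest enters */
            win[wpos] = buf[bpos];                /* write the new byte into the window */
            wpos = (wpos + 1) % WINDOW;
            /* ... test the hash here for a chunk boundary ... */
        }
        return hash;
    }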

    Both of these approaches have performance-limiting characteristics. In the first approach we read every memory location twice: the first read happens when the leading edge of the sliding window reaches the byte, and the second read happens when the trailing edge of the window reaches it. This is not a huge issue on modern processors, which are endowed with caches; unless an unusual window size is used, the window data will already be in the L1 cache. However, with byte-by-byte scanning it can happen that the trailing-edge byte is in the previous cacheline, which may have to be fetched. The second approach requires a memory write for every byte being accessed. Memory writes, even repeatedly to the same locations, are expensive, so this is worse than the previous approach.
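
    Based on the abstract's description, where the sliding window bytes are held entirely in a large vector register and vector instructions shift the register and add and remove bytes, the following is only an illustrative sketch. It assumes x86 SSE4.1 intrinsics, a 16-byte window, and a simple additive hash; the disclosure's actual window size, hash function, and instruction choices may differ.

    #include <stdint.h>
    #include <stddef.h>
    #include <smmintrin.h>   /* SSE4.1: _mm_insert_epi8 / _mm_extract_epi8 */

    #define VWINDOW 16       /* window size equal to one 128-bit register (assumption) */

    /* Illustrative sketch: the sliding window lives entirely in an XMM register.
     * The oldest byte is extracted from the low lane, the register is shifted down
     * by one byte, and the incoming byte is inserted into the high lane. The
     * additive hash and the boundary mask are placeholders. */
    size_t find_boundary_simd(const uint8_t *buf, size_t len, uint32_t mask)
    {
        if (len <= VWINDOW)
            return len;

        uint32_t hash = 0;
        __m128i win = _mm_loadu_si128((const __m128i *)buf);   /* first VWINDOW bytes */
        for (size_t i = 0; i < VWINDOW; i++)
            hash += buf[i];

        for (size_t i = VWINDOW; i < len; i++) {
            uint8_t oldest = (uint8_t)_mm_extract_epi8(win, 0); /* trailing-edge byte */
            win = _mm_srli_si128(win, 1);                       /* slide the window by one byte */
            win = _mm_insert_epi8(win, buf[i], VWINDOW - 1);    /* append the incoming byte */

            hash = hash - oldest + buf[i];                      /* rolling hash update */
            if ((hash & mask) == 0)                             /* content-defined cut point */
                return i + 1;
        }
        return len;
    }

    The key point of the sketch is that both the trailing-edge read of the first scalar approach and the per-byte window-buffer write of the second are replaced by register-to-register operations.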