De-Duplication by Comparison of Slices

IP.com Disclosure Number: IPCOM000248632D
Publication Date: 2016-Dec-22
Document File: 1 page(s) / 19K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a process to reduce the number of slices that a data store (ds) unit must store by reusing existing slices for new objects.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 86% of the total text.

The data store (ds) unit stores data in slices. A large data source name (DSN) memory can store a large number of slices.

The novel contribution is a process to reduce the number of slices that a ds unit must store by reusing existing slices for new objects.

The general algorithm for writes follows:

1. The ds unit divides the data into slices.
2. The ds unit gives each slice an entry in a data structure (e.g., a dispersed index) using a deterministic, continuous function based on the data content of the slice.
3. For each slice it generates, the ds unit compares the slice's name to those of the existing slices using some data structure, looking for slices that are "close", as defined by a deterministic distance function.

   A. If the ds unit finds an existing slice sufficiently close in content, as determined by the deterministic distance function, it:

      i. creates a "diff" between the existing slice (the first similar slice) and the new slice (the second similar slice)
      ii. stores that diff with a pointer to the first similar slice
      iii. discards the second similar slice

   B. If the ds unit does not find a similar enough slice, it stores the new slice per current processes.
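
The write steps above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not specify the continuous naming function, the distance function, the diff format, or the index. Here a normalized byte histogram stands in for the content-based name, an L1 distance stands in for the distance function, a linear scan over an in-memory dictionary stands in for the dispersed index, and Python's standard `difflib` produces the diff. The names `slice_data`, `signature`, `distance`, `make_diff`, `SliceStore`, and `THRESHOLD` are all hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.2  # hypothetical cutoff for "sufficiently close"


def slice_data(data: bytes, size: int) -> list:
    """Step 1: cut the data into fixed-size slices (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signature(data: bytes) -> tuple:
    """Step 2 (assumed): content-based name, a normalized 16-bucket byte histogram."""
    counts = [0] * 16
    for b in data:
        counts[b >> 4] += 1  # bucket by the high nibble of each byte
    total = max(len(data), 1)
    return tuple(c / total for c in counts)


def distance(a: tuple, b: tuple) -> float:
    """Step 3 (assumed): deterministic L1 distance between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def make_diff(base: bytes, new: bytes) -> list:
    """Step 3.A.i: list of (start, end, replacement) edits that turn base into new."""
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


class SliceStore:
    """Toy in-memory stand-in for a ds unit."""

    def __init__(self):
        self.full = {}   # slice_id -> (signature, content) for slices stored whole
        self.diffs = {}  # slice_id -> (base_id, edits) for slices stored as diffs

    def write(self, slice_id: str, data: bytes) -> str:
        sig = signature(data)
        # Linear scan for brevity; a real index would look up nearby names directly.
        for base_id, (base_sig, base) in self.full.items():
            if distance(sig, base_sig) <= THRESHOLD:
                # 3.A: keep only the diff and a pointer to the existing slice.
                self.diffs[slice_id] = (base_id, make_diff(base, data))
                return "diff"
        # 3.B: nothing close enough, so store the new slice whole.
        self.full[slice_id] = (sig, data)
        return "full"
```

In this sketch, writing two slices that differ in only a few bytes stores the second as a small edit list that points at the first, while an unrelated slice is stored whole. A histogram distance is only a rough proxy for content similarity, so a production design would need a naming function whose closeness actually predicts a small diff.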

On a read of the second similar slice, the ds unit uses the diff and the first similar slice to reconstruct the content of the second similar slice and returns it to the requester. When the ds unit wants to delete an object whose slices are referenced by a diff, the dsnet creates and stores a new slice generated...
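
The read path described above (rebuilding the second similar slice from the first similar slice plus the diff) can be sketched in Python as follows. The diff format is an assumption, since the disclosure does not define one: here a diff is a list of `(start, end, replacement)` edits, sorted by `start`, where `start` and `end` are offsets into the first similar slice. The names `apply_diff`, `read_slice`, `full`, and `diffs` are hypothetical.

```python
def apply_diff(base: bytes, edits: list) -> bytes:
    """Rebuild a slice by replacing base[start:end] with each edit's replacement."""
    out = bytearray()
    pos = 0
    for start, end, replacement in edits:
        out += base[pos:start]  # copy the unchanged bytes before this edit
        out += replacement      # then the bytes that differ
        pos = end
    out += base[pos:]           # copy the unchanged tail
    return bytes(out)


def read_slice(slice_id: str, full: dict, diffs: dict) -> bytes:
    """Return a slice's content whether it was stored whole or as a diff."""
    if slice_id in full:
        return full[slice_id]
    base_id, edits = diffs[slice_id]  # pointer to the first similar slice
    return apply_diff(full[base_id], edits)


# Usage: slice "b" was never stored whole, only as a diff against slice "a".
full = {"a": b"hello world, this is slice one"}
diffs = {"b": ("a", [(27, 30, b"two")])}
restored = read_slice("b", full, diffs)
```

The requester receives the same bytes it would have received had the slice been stored whole; the cost is one extra lookup of the referenced slice on every read of a diffed slice.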