
Container Layout and Corresponding Reference Management for Dedupe Storage

IP.com Disclosure Number: IPCOM000247128D
Publication Date: 2016-Aug-08
Document File: 4 page(s) / 67K

Publishing Venue

The IP.com Prior Art Database

Related People

Cheng Shan: INVENTOR

Abstract

A container is composed of sub-containers, and references are managed at the sub-container level with bits, enabling efficient sub-container reclamation without compromising either write or read performance.



Cheng Shan, Xianbo Zhang, Bin Liu, Dongxu Sun, and Cheng Hai Zhu


Problem Statement

One challenging issue facing a deduplication system is how to manage references and reclaim deleted space efficiently. In MSDP, data segments are stored in containers, and RefDB is used to track container references. Each record in the RefDB represents how the data segments of a container are referenced by backup images. When the references to a container drop to zero, the whole container can be reclaimed and the space it occupies can be used to store segments from new backups; when the deleted space in a container reaches a preset threshold, compaction can be used to reclaim that space while keeping the segments still referenced by some backups. Reclaiming a whole container costs much less than partial reclamation through compaction. The smaller a container is, the higher the probability that it is reclaimed as a whole, since it holds fewer segments and is therefore typically referenced by fewer backups. However, as containers get smaller, two issues arise:


1. The number of containers increases, and the file system may not be able to manage the resulting container files efficiently: a given file system can only efficiently manage a certain number of container files.


2. The number of container reference records grows in inverse proportion to the container size. At some point, the cost of reference updates becomes prohibitive.
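The baseline reclamation policy described in the problem statement (reclaim a whole container once no backup references it, otherwise compact once its deleted space reaches a preset threshold) can be sketched as below. This is a minimal illustration, not MSDP code: the `Container` record, the `reclamation_action` function, and the byte-valued `threshold_bytes` parameter are hypothetical stand-ins for the "preset threshold" in the text.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical bookkeeping for one container (not the RefDB schema)."""
    container_id: int
    ref_count: int       # number of backups still referencing the container
    deleted_bytes: int   # bytes occupied by segments no longer referenced


def reclamation_action(c: Container, threshold_bytes: int) -> str:
    """Choose how to reclaim space from one container."""
    if c.ref_count == 0:
        # Cheapest case: nothing references the container, so its whole
        # space can be released and reused for segments from new backups.
        return "reclaim"
    if c.deleted_bytes >= threshold_bytes:
        # Costlier case: rewrite the container, keeping only the segments
        # that some backup still references.
        return "compact"
    return "keep"


print(reclamation_action(Container(1, 0, 0), threshold_bytes=1024))     # reclaim
print(reclamation_action(Container(2, 3, 2048), threshold_bytes=1024))  # compact
```

The first branch is why small containers are attractive: the fewer segments a container holds, the more often its reference count reaches zero and the cheap path is taken.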

This IDF addresses both issues by creating sub-containers within a container and using bits to manage references at the sub-container level.
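The excerpt here does not spell out the record layout, so the following is only a hypothetical sketch of sub-container reference bits. It assumes one reference record per (container, backup) pair holding an integer bitmask with one bit per sub-container; the class `SubContainerRefs` and its methods are illustrative names, not part of MSDP or RefDB.

```python
from collections import defaultdict


class SubContainerRefs:
    """Hypothetical reference table: one record per (container, backup),
    each holding a bitmask with one bit per sub-container."""

    def __init__(self, subcontainers_per_container: int):
        self.n_sub = subcontainers_per_container
        # (container_id, backup_id) -> bitmask of referenced sub-containers
        self.records: dict[tuple[int, int], int] = defaultdict(int)

    def add_ref(self, container_id: int, sub_index: int, backup_id: int) -> None:
        if not 0 <= sub_index < self.n_sub:
            raise ValueError("sub-container index out of range")
        # Setting a bit updates an existing record instead of adding a new one.
        self.records[(container_id, backup_id)] |= 1 << sub_index

    def remove_backup(self, backup_id: int) -> None:
        # Deleting a backup drops its records for every container.
        for key in [k for k in self.records if k[1] == backup_id]:
            del self.records[key]

    def live_mask(self, container_id: int) -> int:
        # OR of all backups' masks: bit i set means sub-container i is in use.
        mask = 0
        for (cid, _), bits in self.records.items():
            if cid == container_id:
                mask |= bits
        return mask

    def reclaimable_subcontainers(self, container_id: int) -> list[int]:
        live = self.live_mask(container_id)
        return [i for i in range(self.n_sub) if not live & (1 << i)]


refs = SubContainerRefs(subcontainers_per_container=4)
refs.add_ref(container_id=1, sub_index=0, backup_id=1)
refs.add_ref(container_id=1, sub_index=2, backup_id=2)
refs.remove_backup(1)
print(refs.reclaimable_subcontainers(1))  # [0, 1, 3]
```

Under these assumptions the number of reference records stays at container granularity (issue 2), each container remains a single file (issue 1), and a sub-container whose bit is clear in every record could be reclaimed without compacting the rest of the container.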


© 2016 Veritas Technologies LLC. All rights reserved. Veritas and the Veritas Logo are trademarks or registered trademarks of Veritas Technologies LLC or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.



Publication Description

Let's start with an example to see how the proposal works. Suppose there are 6 segments (segment 1, segment 2, segment 3, segment 4, segment 5, and segment 6) and 3 backups (backup 1, backup 2, and backup 3): backup 1 references segments 1, 2, and 3; backup 2 references segments 2 and 4; and backup 3 references segments 5 and 6. The back-reference list is:

segment 1 | backup 1
segment 2 | backup 1
segment 2 | backup 2
segment 3 | backup 1
segment 4 | backup 2
segment 5 | backup 3
segment 6 | backup 3
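The back-reference list above is the inversion of each backup's segment list into (segment, backup) records. A minimal Python sketch using the example's data; the function name `back_references` is illustrative:

```python
def back_references(backups: dict[int, list[int]]) -> list[tuple[int, int]]:
    """Invert a backup -> segments map into (segment, backup) records,
    sorted by segment and then by backup."""
    return sorted((seg, b) for b, segs in backups.items() for seg in segs)


backups = {1: [1, 2, 3], 2: [2, 4], 3: [5, 6]}
for seg, b in back_references(backups):
    print(f"segment {seg} | backup {b}")  # 7 lines, matching the list above
```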

If all 6 segments are stored in one container, the references between the container and the backups are:
container 1 | backup 1
container 1 | backup 2
container 1 | backup 3

There are only 3 reference records.
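Collapsing segment-level references to container granularity keeps only the distinct (container, backup) pairs. The sketch below takes the segment-to-container placement as a parameter; `container_references` is a hypothetical helper, and with all six segments placed in container 1 it reproduces the three records above.

```python
def container_references(
    backups: dict[int, list[int]], segment_to_container: dict[int, int]
) -> list[tuple[int, int]]:
    """Map each referenced segment to its container and keep the distinct
    (container, backup) pairs, sorted."""
    return sorted(
        {(segment_to_container[seg], b) for b, segs in backups.items() for seg in segs}
    )


backups = {1: [1, 2, 3], 2: [2, 4], 3: [5, 6]}
one_container = {seg: 1 for seg in range(1, 7)}  # all 6 segments in container 1
records = container_references(backups, one_container)
print(records)       # [(1, 1), (1, 2), (1, 3)]
print(len(records))  # 3
```

Passing a different placement map shows how the number of records changes as containers get smaller.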

If a small container size is applied, say, each container holds only 2 segments (container 1 stores segment 1 and segment 2, container 2 stores segment 3...