
Method for Reference Count Management in a Deduplicated Storage System

IP.com Disclosure Number: IPCOM000243111D
Publication Date: 2015-Sep-15
Document File: 6 page(s) / 66K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to employ a reference count update table in a deduplicated storage system. This method operates within the framework of having a staging table system that supports the primary deduplication catalog table.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 28% of the total text.



In a deduplicated storage system, a data object is fingerprinted and broken into unique data extents. Because the deduplication catalog table can contain many references to the same data extent, a mechanism is needed to know when a particular data extent is no longer referenced and can be removed from the storage environment. In addition, there may be a need to recognize how common a given data extent is, so that it can be managed in a particular way. For instance, the user might want to store all highly referenced data extents in a retrieval cache because the likelihood of needing that data is high. However, the most critical reason for having a reference count is to know when the data extent is no longer needed and can be deleted from the storage device. Without a reference count, the developer has to act on the deduplication catalog directly (e.g., use a Foreign Key) or rely on some other laborious process, which can have a huge performance impact.

Having a reference count methodology also introduces potential performance problems, because the user must have a system in place that manages the references quickly and efficiently. To maintain reference counts accurately, each of the multiple simultaneous operations must have exclusive access to the count while updating it; otherwise, the count can become invalid. An invalid reference count can give rise to premature extent deletion, or to an extent never being deleted despite no longer being referenced in the system. It can also lead to improper management of the storage system if decisions are made based on the reference count.
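To make the failure mode concrete, the following minimal sketch replays, step by step, two operations that update the same count without exclusive access. The dictionary `ref_counts` and the extent name `extent-A` are hypothetical stand-ins for the extent information table and are not taken from the disclosure.

```python
# Hypothetical in-memory stand-in for the extent information table:
# extent identifier -> reference count.
ref_counts = {"extent-A": 1}

# Two operations each want to add one reference to the same extent.
# Both read the count before either one writes it back.
seen_by_op1 = ref_counts["extent-A"]
seen_by_op2 = ref_counts["extent-A"]

ref_counts["extent-A"] = seen_by_op1 + 1  # operation 1 writes 2
ref_counts["extent-A"] = seen_by_op2 + 1  # operation 2 also writes 2

expected = 3  # one original reference plus two new ones
actual = ref_counts["extent-A"]
lost_updates = expected - actual
```

In this interleaving one increment is lost, so the stored count is one lower than the true number of references. A later series of dereferences could then bring the count to zero while a reference still exists, which is the premature deletion case described above.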

One potential solution to the problem is to give each concurrent operation exclusive access to the data extent information row so that it can read, increment, and update the count. This solution imposes major performance and scale restrictions, as the user must have exclusive access to the catalog table row for each data extent being updated. With potentially multiple updates coming in for a data extent, serializing access to the deduplicated extent information significantly impairs performance: multiple operations must wait in a queue to make updates to the database table, which detracts from the intended purpose of storing data in the system. In current storage environments, high scale and high performance are standard, which means there is a premium on application thread concurrency.
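The exclusive-access approach can be sketched as follows. This is an illustration only: the `ExtentRow` class and the use of `threading.Lock` as a stand-in for a database row lock are assumptions, not part of the disclosure.

```python
import threading


class ExtentRow:
    """Hypothetical catalog row for one data extent."""

    def __init__(self, ref_count=0):
        self.ref_count = ref_count
        # Stands in for an exclusive lock on the catalog table row.
        self.lock = threading.Lock()


def add_reference(row):
    # Read, increment, and update while holding exclusive access.
    # Every other operation on this row waits here until the lock is free.
    with row.lock:
        current = row.ref_count
        row.ref_count = current + 1


row = ExtentRow()
threads = [threading.Thread(target=add_reference, args=(row,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_count = row.ref_count
```

The count stays correct, but only one of the eight operations can touch the row at a time. For a highly referenced extent, this queueing is the serialization cost the paragraph above describes.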

The novel contribution is a method to employ a reference count update table. This method operates within the framework of having a staging table system that supports the primary deduplication catalog table.
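The general idea of a reference count update table might look like the sketch below. Because this abbreviated text does not include the details of the method, everything specific here is an assumption made for illustration: the table names `extent_info` and `ref_count_updates`, the `delta` column, and the batch step `apply_updates` are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE extent_info (
        extent_id TEXT PRIMARY KEY,
        ref_count INTEGER NOT NULL
    );
    CREATE TABLE ref_count_updates (
        extent_id TEXT NOT NULL,
        delta     INTEGER NOT NULL
    );
    INSERT INTO extent_info VALUES ('extent-A', 1), ('extent-B', 1);
""")


def record_update(extent_id, delta):
    # Append-only: the operation never reads or locks the extent_info row.
    conn.execute("INSERT INTO ref_count_updates VALUES (?, ?)", (extent_id, delta))


def apply_updates():
    # Assumed batch step: fold pending deltas into the main table in one
    # transaction, then clear the update table.
    with conn:
        pending = conn.execute(
            "SELECT extent_id, SUM(delta) FROM ref_count_updates GROUP BY extent_id"
        ).fetchall()
        for extent_id, total in pending:
            conn.execute(
                "UPDATE extent_info SET ref_count = ref_count + ? WHERE extent_id = ?",
                (total, extent_id),
            )
        conn.execute("DELETE FROM ref_count_updates")


record_update("extent-A", +1)
record_update("extent-A", +1)
record_update("extent-B", -1)
apply_updates()

counts = dict(conn.execute("SELECT extent_id, ref_count FROM extent_info"))
unreferenced = [extent for extent, count in counts.items() if count == 0]
```

In this sketch, concurrent operations only append rows, so they do not contend for a shared count, and the main table is touched once per batch rather than once per reference change. When and how such a batch should run is one of the open questions the disclosure goes on to list.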

Currently, using a staging (volatile) type of support table can cause problems, including, but not limited to:


• Identifying how and when information should be moved from the reference count updates table to the main data extent information table
• Establishing a method to consistentl...