
Efficient Data Fetching Over Low Bandwidth Network/Cloud

IP.com Disclosure Number: IPCOM000220500D
Publication Date: 2012-Aug-02
Document File: 5 page(s) / 60K

Publishing Venue

The IP.com Prior Art Database

Abstract

The volume of information being generated continues to grow at a fast pace. In many large data centers, data is kept at one or more sites, possibly with a disaster recovery site, yet it can be accessed from anywhere over a WAN; the Internet is a common example. The same is true for the many customers who have opted to keep their data in the cloud. To avoid performance problems over the WAN, many edge devices at the client site implement some form of caching.




This invention is useful in customer environments where network bandwidth is a costly resource. One such use case is a customer whose entire data set is stored in the cloud and who needs frequent access to that data from a local site. By using this technology, the customer saves on cost while increasing performance.

It is also useful in WAN caching devices. A WAN caching device typically keeps a local copy of part or all of a file stored on a remote server; as applications demand more data, it is fetched from the remote server. It is common for two or more files on a file system to contain the same data (for example, in multi-user environments). Both the remote file server and the local file server may each store a single copy of duplicate data through their deduplication engines, but that alone does not save network bandwidth, because neither site has knowledge of the data held at the other: the server does not know what data the client has cached, and the client does not know whether the data it is about to fetch is a duplicate of data already in its local cache. By using this feature, WAN caching appliances can reduce the amount of data they transfer significantly.
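As a rough illustration of the WAN caching behaviour described above, the following sketch (Python; the chunk size and the fetch_from_remote callback are assumptions used only for illustration, not part of this disclosure) keeps chunks of remote files in a local cache and goes to the remote server only for chunks that are not yet cached. On its own, such a cache does not exploit duplicate data across files; that is the gap the pull model described later addresses.

    # Minimal sketch of a chunk-granular WAN edge cache (illustrative only).
    # fetch_from_remote(path, offset, size) -> bytes is a hypothetical callback
    # standing in for the transfer from the remote file server over the WAN.

    CHUNK_SIZE = 256 * 1024  # assumed fetch granularity

    class EdgeCache:
        def __init__(self, fetch_from_remote):
            self.fetch_from_remote = fetch_from_remote
            self.cached = {}                     # (path, chunk index) -> bytes

        def read(self, path, offset, size):
            """Serve reads locally when possible; fetch missing chunks over the WAN."""
            data = bytearray()
            first = offset // CHUNK_SIZE
            last = (offset + size - 1) // CHUNK_SIZE
            for idx in range(first, last + 1):
                key = (path, idx)
                if key not in self.cached:       # cache miss -> WAN fetch
                    self.cached[key] = self.fetch_from_remote(path, idx * CHUNK_SIZE, CHUNK_SIZE)
                data += self.cached[key]
            start = offset - first * CHUNK_SIZE
            return bytes(data[start:start + size])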

Prior art in the related field:


1. http://www.freepatentsonline.com/y2011/0125720.html


The embodiment above uses a segment identifier (hash key) to reduce the flow of data updates over the network to a deduplicated system. This is different from our approach.

How is it different? We define a pull model: when a node wants data from another node, it sends a request, and in response the target node replies with the hash keys of fixed-size chunks.

The requesting node then looks up each hash key in its metadata and sends sparse requests for only those segments whose keys are not already present locally.

This is a pull model because the receiver of the data is the one demanding the deduplicated keys.
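A minimal sketch of this pull model is shown below (Python; the 256 KB chunk size, the use of SHA-1 as the 160-bit hash, and all helper names are assumptions for illustration, not taken verbatim from this disclosure). The target node answers a data request with the hash keys of fixed-size chunks, and the requesting node then asks only for the chunks whose keys it does not already hold:

    # Minimal sketch of the pull model (illustrative only).
    import hashlib

    CHUNK_SIZE = 256 * 1024          # assumed fixed chunk size
    HASH = hashlib.sha1              # assumed 160-bit key

    def split_chunks(data):
        return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

    # --- target node (holds the data) ---
    def reply_with_keys(data):
        """Step 1: answer a pull request with the hash keys of the fixed-size chunks."""
        return [HASH(c).digest() for c in split_chunks(data)]

    def reply_with_chunks(data, wanted):
        """Step 2: send only the chunks the requester asked for (sparse request)."""
        chunks = split_chunks(data)
        return {i: chunks[i] for i in wanted}

    # --- requesting node (receiver of the data) ---
    def pull(keys, local_store, request_missing):
        """Check each key against local metadata; fetch only the missing segments."""
        missing = [i for i, k in enumerate(keys) if k not in local_store]
        fetched = request_missing(missing)             # sparse request over the WAN
        for i, chunk in fetched.items():
            local_store[keys[i]] = chunk               # index new chunks by hash key
        return b"".join(local_store[k] for k in keys)  # reassemble the requested data

    # Example: only the second chunk crosses the WAN, because the first
    # chunk's key is already present in the requester's local store.
    data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    store = {HASH(b"A" * CHUNK_SIZE).digest(): b"A" * CHUNK_SIZE}
    result = pull(reply_with_keys(data), store, lambda idxs: reply_with_chunks(data, idxs))
    assert result == data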

It has been shown that if the 160-bit hash key of, say, a 256 KB chunk matches, it is almost certain that the segment itself matches, so vendors today do not bother to compare the segments byte for byte when the hash keys match.
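As a rough justification of that practice (standard birthday-bound reasoning, not taken from this disclosure): with a well-distributed 160-bit key, the probability that any two of N distinct chunks in a system collide is approximately N^2 / 2^161, which remains negligible even for billions of chunks, so comparing keys is a safe proxy for comparing the segments themselves.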

Following this logic, the idea is to maintain a hash key for every 256 KB chunk present in the system. This is only a one-time metadata dependency, and the hash key is not confined to one set of data (for example, a backup); it can come from anywhere in the system.
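A sketch of maintaining such metadata is given below (Python; SHA-1 is assumed as the 160-bit hash, and the index layout is purely illustrative). A single system-wide index maps the key of every 256 KB chunk to one stored copy, regardless of which dataset the chunk originally came from:

    import hashlib

    CHUNK_SIZE = 256 * 1024

    class ChunkIndex:
        """System-wide index: 160-bit SHA-1 key -> (file, offset) of one stored copy."""

        def __init__(self):
            self.by_key = {}

        def ingest(self, path, data):
            """One-time pass: record the key of every 256 KB chunk of this file."""
            for offset in range(0, len(data), CHUNK_SIZE):
                key = hashlib.sha1(data[offset:offset + CHUNK_SIZE]).digest()
                self.by_key.setdefault(key, (path, offset))   # keep the first copy seen

        def have(self, key):
            """True if a chunk with this hash key exists anywhere in the system."""
            return key in self.by_key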


2. http://www.freepatentsonline.com/y2010/0036887.html


This embodiment describes the transfer of deduplicated data between storage pools within a storage management system. It tracks deduplication information for each deduplicated data chunk in an index wit...