A method for reducing information leakage via cross-user deduplication mechanisms
Publication Date: 2010-Sep-14
The IP.com Prior Art Database
This invention proposes a simple mechanism that enables cross-user deduplication while greatly reducing the risk of data leakage. More specifically, the proposed method is a mechanism stating rules by which deduplication is sometimes artificially turned off.
The term data deduplication refers to techniques that store only a single copy of redundant data and provide links to that copy instead of storing further copies. By storing and transmitting only a single copy of duplicate data, deduplication saves both disk space and network bandwidth (data need not be transferred if it is already at the server). In addition, it offers secondary cost savings in power and cooling by reducing the number of disk spindles [1]. Deduplication can be performed at various granularities (e.g., at the file level or at the block level), and this invention is relevant to all of them.
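As a minimal illustration of the definition above, the following sketch shows file-level deduplication with a content hash used as the key of the single stored copy. The class name DedupStore, its methods, and the choice of SHA-256 are assumptions made for this example only; they are not taken from any particular storage product.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}  # content digest -> the single stored copy
        self.links = {}  # file name -> content digest (the "link")

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.links[name] = digest

    def get(self, name):
        # Follow the link to the single stored copy.
        return self.blobs[self.links[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
payload = b"quarterly report " * 1000
store.put("report.doc", payload)
store.put("backup/report.doc", payload)  # duplicate content, new link only
```

Both names resolve to the same bytes, while the store holds the payload once. Block-level deduplication would apply the same idea to fixed-size or variable-size chunks instead of whole files.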
The invention is relevant in deduplication scenarios with the following two features:
· Cross-user deduplication. A central storage system or service serves multiple users (or clients), and deduplication is also performed across different users. That is, each file or block is compared to the data of other users and is deduplicated if an identical copy is already available at the server. This approach is popular since it saves storage and bandwidth not only when a single user has multiple copies of the same data, but also when different users store copies of the same data (a prominent scenario that generates great savings of storage and bandwidth).
· Source-based deduplication. Deduplication is performed at the client side, before the data is sent to the storage service. This variant saves bandwidth and is therefore commonly used. Its implication is that the client can observe whether a certain file or block was deduplicated, either by examining the amount of data transferred over the network or by inspecting the log of the storage software, if that software provides such a report.
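The two features above can be sketched together as a simple exchange in which the client first sends a digest and uploads the data only if the server does not already hold it. The names Server and client_backup, and the use of SHA-256, are hypothetical; the network is simulated by counting payload bytes.

```python
import hashlib


class Server:
    """Hypothetical storage service shared by all users."""

    def __init__(self):
        self.blobs = {}  # content digest -> stored data

    def has(self, digest):
        return digest in self.blobs

    def upload(self, digest, data):
        self.blobs[digest] = data


def client_backup(server, data):
    """Back up data; return the number of payload bytes actually sent."""
    digest = hashlib.sha256(data).hexdigest()
    if server.has(digest):
        return 0  # deduplicated at the source: nothing is transferred
    server.upload(digest, data)
    return len(data)


server = Server()
document = b"x" * 4096
sent_by_alice = client_backup(server, document)  # first copy: full upload
sent_by_bob = client_backup(server, document)    # same content, other user
```

Here the second user sends no payload at all, which is exactly the observable signal described above: the amount of transferred data tells the client whether an identical copy was already stored by someone else.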
In [2], we demonstrated how deduplication in cloud storage services can be used as a side channel that reveals information about the contents of other users' files. In a different scenario, deduplication can be used as a covert channel by which malicious software communicates with its command and control center, regardless of any firewall settings at the attacked machine.
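To make the side channel concrete, the sketch below assumes an attacker who knows the template of a victim's file and tries candidate values for one unknown field, watching for a backup that transfers no data. The letter template, the salary range, and the names Server, make_letter, and guess_salary are all invented for illustration.

```python
import hashlib


class Server:
    """Hypothetical service with cross-user, source-based deduplication."""

    def __init__(self):
        self.digests = set()

    def backup(self, data):
        """Return the number of payload bytes the client had to send."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.digests:
            return 0  # identical copy already stored: deduplicated
        self.digests.add(digest)
        return len(data)


def make_letter(salary):
    # Template assumed to be known to the attacker; only the number is secret.
    return f"Dear Bob, your annual salary is {salary} USD.".encode()


def guess_salary(server, candidates):
    """Try each candidate; a zero-byte backup reveals the victim's value."""
    for salary in candidates:
        if server.backup(make_letter(salary)) == 0:
            return salary
    return None


server = Server()
server.backup(make_letter(87000))  # the victim stores the real letter
leaked = guess_salary(server, range(50000, 150000, 1000))
```

The attacker never reads the victim's file; the single bit "was this deduplicated?" per attempt is enough to recover the secret field when the space of candidates is small.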
1. D. Russell, Data Deduplication Will Be Even Bigger in 2010, Gartner, February
2. D. Harnik, B. Pinkas, A. Shulman-Peleg, Side Channels in Cloud Services, the Case of Deduplication in Cloud Storage, IEEE Security and Privacy, Special Issue on Cloud Computing, to appear.
This invention proposes a simple mechanism that enables cross-user deduplication while greatly reducing the risk of data leakage. More specifically, the proposed method is a mechanism stating rules by which deduplication is sometimes artificially turned off. In the paper [2], we quantify the guarantees of this simple practice. This gives clients a guarantee that adding their data to the cloud has a very limited effect on what an adversary may lear...
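The text available here does not spell out the rules by which deduplication is turned off. Purely as an illustration, the sketch below assumes one possible rule: the server draws a secret random threshold for each distinct file and reports deduplication to clients only once the number of uploads of that file reaches the threshold. The class name ThresholdDedupServer, the parameter max_threshold, and its default value are assumptions of this example, not details taken from the disclosure.

```python
import hashlib
import random


class ThresholdDedupServer:
    """Hypothetical server that sometimes turns deduplication off."""

    def __init__(self, max_threshold=20, rng=None):
        self.max_threshold = max_threshold
        self.rng = rng or random.Random()
        self.threshold = {}  # content digest -> secret random threshold
        self.copies = {}     # content digest -> number of uploads so far

    def backup(self, data):
        """Return the number of payload bytes the client must send."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.threshold:
            # Assumed rule: threshold drawn uniformly from 2..max_threshold.
            self.threshold[digest] = self.rng.randint(2, self.max_threshold)
        self.copies[digest] = self.copies.get(digest, 0) + 1
        if self.copies[digest] < self.threshold[digest]:
            return len(data)  # deduplication artificially turned off
        return 0              # enough copies seen: deduplicate as usual


server = ThresholdDedupServer(max_threshold=20, rng=random.Random(7))
document = b"confidential memo" * 100
first_upload = server.backup(document)   # always a full upload
second_upload = server.backup(document)  # full upload unless threshold is 2
```

Under this assumed rule, a full upload on a probe no longer proves that the file is absent, because the same outcome occurs whenever the hidden threshold has not yet been reached. The price is some lost bandwidth savings for files with few copies; the server may still keep a single stored copy internally.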