Use de-duplication backup data to detect sensitive data

IP.com Disclosure Number: IPCOM000241157D
Publication Date: 2015-Mar-31
Document File: 3 page(s) / 79K

Publishing Venue

The IP.com Prior Art Database

Abstract

In our invention, the deduplication data on the backup server is used to detect sensitive data on client machines. The backup server scans the deduplication chunk IDs and compares them with a sensitive data library to detect sensitive data. This solution has the following advantages: 1. There is no need to install extra software or keep a sensitive data library up to date on every client machine. 2. It uses the client deduplication data that already exists on the backup server for analysis, so no extra data has to be collected from client machines. 3. More group patterns (such as the spread rate of a virus file within an organization) can be detected when all clients’ data is analyzed from a central server. 4. It is a better fit for cloud-based environments.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.


The traditional method of detecting sensitive data (such as virus files) works as follows: dedicated software is installed on every client machine, a sensitive data library (such as a virus library) is distributed to every client machine, and the software periodically scans all files and compares them with the data library to detect sensitive data.

This traditional method is hard to maintain because every machine must have the software installed and must frequently update to the latest data library. It is not a good fit for today's cloud-based environments.

Many organizations enable deduplication in data backups. Deduplication removes redundant data during a backup operation. The most popular deduplication method divides a file into sub-files (also known as "chunks") by using variable-length-block fingerprinting. Each chunk is assigned an identification string. The key advantage is that, when a small portion of a file is modified, only the affected chunks need to be sent to and kept on the server, and the other, unaffected chunks from the previous backup can still be deduplicated.

Reference:

http://en.wikipedia.org/wiki/Fingerprint_(computing)

http://en.wikipedia.org/wiki/Data_deduplication
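
As an illustration of the variable-length chunking described above, the following is a minimal sketch only. The disclosure does not specify a fingerprinting algorithm, so the polynomial rolling hash, the window and size constants, the use of SHA-256 as the chunk identification string, and the names split_chunks and chunk_id are all assumptions made for this example.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the disclosure.
WINDOW = 16            # bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 32         # never cut before this many bytes
MAX_CHUNK = 256        # always cut at this many bytes
BASE = 257
MOD = (1 << 31) - 1


def split_chunks(data: bytes) -> list[bytes]:
    """Split data into variable-length chunks at content-defined boundaries."""
    chunks = []
    start = 0                               # first byte of the current chunk
    h = 0                                   # rolling hash of the last WINDOW bytes
    top = pow(BASE, WINDOW - 1, MOD)        # weight of the byte leaving the window
    for i, b in enumerate(data):
        pos = i - start
        if pos >= WINDOW:
            # Drop the byte that slides out of the window.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])         # trailing remainder
    return chunks


def chunk_id(chunk: bytes) -> str:
    """Identification string for a chunk (assumed here: SHA-256 hex digest)."""
    return hashlib.sha256(chunk).hexdigest()
```

In this sketch, appending bytes to a file changes only the chunks at the tail: every chunk before the old final chunk keeps the same ID, so a backup client would only need to send the new chunks.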

In our disclosure, the deduplication data on the backup server can be used to detect sensitive data on client machines. The backup server scans the chunk IDs to detect sensitive data. This solution has the following advantages:

1. No need to install extra software or maintain updates of the sensitive data library on every client machine.

2. Utilize the existing client deduplication data on the backup server for data analysis. No need to collect extra data from client machines.

3. More group patterns (such as the spread rate of a virus file in an organization) can be detected when analyzing all clients' data from a central server.

4. It's better fit for cloud-based e...
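
The server-side scan and the group-pattern analysis listed above could look roughly like the following sketch. It assumes the backup server can expose, for each client, the set of chunk IDs that client has backed up, and that the sensitive data library is a mapping from chunk ID to a label. The function names, the data layout, and the sample IDs are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def scan_chunk_index(client_chunks: dict[str, set[str]],
                     sensitive_ids: dict[str, str]) -> dict[str, set[str]]:
    """Map each sensitive item label to the clients holding a matching chunk ID.

    client_chunks: client name -> chunk IDs already stored on the backup server
    sensitive_ids: chunk ID -> label of the sensitive item (assumed library layout)
    """
    hits = defaultdict(set)
    for client, chunk_ids in client_chunks.items():
        # Only IDs present in both the client's backup and the library match.
        for cid in sensitive_ids.keys() & chunk_ids:
            hits[sensitive_ids[cid]].add(client)
    return dict(hits)


def spread_rate(hits: dict[str, set[str]], total_clients: int) -> dict[str, float]:
    """Fraction of all clients affected by each sensitive item."""
    if total_clients <= 0:
        return {}
    return {label: len(clients) / total_clients for label, clients in hits.items()}


# Hypothetical sample data for illustration only.
example_clients = {
    "pc-a": {"id1", "id2"},
    "pc-b": {"id2", "id3"},
    "pc-c": {"id4"},
}
example_library = {"id2": "sample-virus", "id9": "other-item"}

example_hits = scan_chunk_index(example_clients, example_library)
example_rates = spread_rate(example_hits, len(example_clients))
```

Because the comparison runs only on chunk IDs already held by the server, nothing has to be installed on or collected from the clients in this sketch. Matching on a single chunk ID is a simplification; a real library might require several chunks of a file to match before reporting it.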