Finding Similar Files from a Backup Image without Excess Metadata and Selecting Search Metadata during Backup

IP.com Disclosure Number: IPCOM000228751D
Publication Date: 2013-Jul-03
Document File: 5 page(s) / 288K

Publishing Venue

The IP.com Prior Art Database

Related People

Weibao Wu: INVENTOR [+3]

Abstract

This publication describes a method to find similar files from a backup image without having to restore the backup image to a temporary location and without storing excess amounts of metadata for the search. This method may find similar files having more than 80% of the same content. This publication also introduces a more efficient way to select search metadata during the backup stage. It uses simple algorithms to select source data blocks and then calculates fingerprints for the selected data blocks.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 35% of the total text.


Weibao Wu, Karl Li, Shuangmin Zhang

Symantec Corporation


Copyright © 2013 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. For a full list of Symantec trademarks, please visit http://www.symantec.com/about/profile/policies/trademarks/currentlist.jsp

Any Symantec products described in this document are distributed under licenses restricting their use, copying, distribution, and decompilation/reverse engineering. No part of this document may be reproduced in any form by any means without prior written authorization of Symantec Corporation and its licensors, if any.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.

Symantec Corporation 350 Ellis Street Mountain View, CA 94043 United States

http://www.symantec.com


Problem Statement

Often, a document that has been backed up needs to be modified. After the production version of the document is modified, its backup copies may also need to be modified. However, it is sometimes unclear which backup image contains the document. Furthermore, the backup copies are not byte-for-byte identical to the production document, so an exact-match search cannot locate them.

In addition, when fingerprints are used as metadata to search for similar files in a backup image, a fingerprint set may be generated for each file. The general way to generate a fingerprint set is to calculate a fingerprint for each data block of the file and then select some of those fingerprints as the resulting fingerprint set....
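The general approach described above (one fingerprint per data block, then a selected subset kept as the file's fingerprint set) can be illustrated with a minimal sketch. This is not the method of this publication, whose details are not given in the abbreviated text. The block size, the use of SHA-256, the "keep the smallest fingerprints" selection rule, and all function names below are assumptions chosen only for illustration.

```python
import hashlib

BLOCK_SIZE = 4096       # assumed block size; the source does not specify one
MAX_FINGERPRINTS = 8    # assumed cap on fingerprints kept per file


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 fingerprint per fixed-size block of the file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def select_fingerprints(fps: list[str], limit: int = MAX_FINGERPRINTS) -> frozenset[str]:
    """Keep the `limit` smallest distinct fingerprints (an assumed selection rule)."""
    return frozenset(sorted(set(fps))[:limit])


def file_fingerprint_set(data: bytes) -> frozenset[str]:
    """Fingerprint every block, then keep only the selected subset as metadata."""
    return select_fingerprints(block_fingerprints(data))


def overlap(a: frozenset[str], b: frozenset[str]) -> float:
    """Fraction of the fingerprints in `a` that also appear in `b`."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Example: a 20-block file and a copy with one block changed.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(20))
edited = original[:5 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + original[6 * BLOCK_SIZE:]
score = overlap(file_fingerprint_set(original), file_fingerprint_set(edited))
print(f"fingerprint overlap: {score:.3f}")
```

In this sketch a file could be treated as similar when the overlap exceeds a chosen threshold, for example 0.8 to mirror the 80% figure in the abstract. Note that fixed-size blocks shift when bytes are inserted or deleted, so a practical implementation would likely need a different block-selection strategy.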