Stochastic Identification of Duplicate Computer Files

IP.com Disclosure Number: IPCOM000019333D
Publication Date: 2003-Sep-11

Publishing Venue

The IP.com Prior Art Database

Abstract

This invention inserts a stochastic filtering step before any comparison of actual file contents. It calculates a K-bit checksum for each candidate file and discards every file whose checksum is unique, since such a file cannot be a duplicate of any other candidate. Actual file contents are then compared only between files having identical checksums, which reduces the time required to confirm the identification of actual duplicates.
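
The two-stage procedure described in the abstract might be sketched as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not name a checksum function, so the sketch assumes a SHA-256 digest truncated to K bits, and the names `k_bit_checksum` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from filecmp import cmp


def k_bit_checksum(path, k=32):
    """Return a K-bit checksum of a file.

    Assumption: the checksum is the top k bits of a SHA-256 digest
    (1 <= k <= 256). The disclosure does not specify the function.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return int.from_bytes(digest.digest(), "big") >> (256 - k)


def find_duplicates(paths, k=32):
    """Return groups (lists) of paths whose contents are identical."""
    # Stage 1: stochastic filter. Bucket the candidates by checksum.
    buckets = defaultdict(list)
    for path in paths:
        buckets[k_bit_checksum(path, k)].append(path)

    groups = []
    for candidates in buckets.values():
        if len(candidates) < 2:
            continue  # Unique checksum: cannot be a duplicate, so discard.

        # Stage 2: compare actual contents, only within one bucket.
        confirmed = []
        for path in candidates:
            for group in confirmed:
                if cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                confirmed.append([path])
        groups.extend(group for group in confirmed if len(group) > 1)
    return groups
```

A smaller K makes each checksum cheaper to store but lets more non-duplicate files share a checksum and reach the content comparison. The second stage keeps the result exact either way, because a checksum match is treated only as a hint.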