Stochastic Identification of Duplicate Computer Files

IP.com Disclosure Number: IPCOM000033691D
Publication Date: 2004-Dec-23

Publishing Venue

The IP.com Prior Art Database

Abstract

This invention inserts a stochastic filtering step before any comparison of actual file contents. It calculates a K-bit checksum for each candidate file and discards every file whose checksum is unique, since such a file cannot be a duplicate of any other candidate. Actual file contents are then compared only between files that have identical checksums, which further reduces the time required to confirm true duplicates.
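
The two stages described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the disclosure does not name a checksum function, so CRC-32 truncated to K bits stands in for it here, and the names `checksum`, `find_duplicates`, and `k_bits` are hypothetical.

```python
import filecmp
import zlib
from collections import defaultdict


def checksum(path, k_bits=32):
    """Return a K-bit checksum of a file (assumption: CRC-32, masked to k_bits)."""
    crc = 0
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & ((1 << k_bits) - 1)


def find_duplicates(paths, k_bits=32):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Stage 1: stochastic filter. Bucket the candidates by checksum.
    buckets = defaultdict(list)
    for path in paths:
        buckets[checksum(path, k_bits)].append(path)

    duplicates = []
    for candidates in buckets.values():
        if len(candidates) < 2:
            continue  # unique checksum: this file cannot have a duplicate

        # Stage 2: compare actual contents, but only within one bucket.
        clusters = []
        for path in candidates:
            for cluster in clusters:
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        duplicates.extend(c for c in clusters if len(c) > 1)
    return duplicates
```

A smaller K makes checksum collisions between different files more likely, which leaves more work for the content comparison but does not affect correctness, because every reported duplicate is confirmed by comparing contents.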