Stochastic Identification of Duplicate Computer Files
Disclosure Number: IPCOM000019333D
Publication Date: 2003-Sep-11

Publishing Venue

The Prior Art Database


This invention inserts a stochastic filtering step before any comparison of actual file contents. It calculates a K-bit checksum for each candidate file and discards from further consideration any file whose checksum is unique, since a file that shares its checksum with no other file cannot be a duplicate. Actual file contents are then compared only between files having identical checksums, which reduces the time required to confirm actual duplicates.
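
The two-stage procedure described above can be sketched as follows. This is a minimal illustration, not the disclosure's own implementation: the disclosure does not name a checksum algorithm, so the use of CRC-32 truncated to K bits is an assumption, as are the function names `k_bit_checksum` and `find_duplicates`, the default `k=16`, and the chunk size.

```python
import filecmp
import zlib
from collections import defaultdict


def k_bit_checksum(path, k=16, chunk_size=1 << 16):
    """Return a K-bit checksum of a file.

    Assumption: CRC-32 truncated to its low k bits (so k <= 32);
    the disclosure does not specify which checksum is used.
    """
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & ((1 << k) - 1)


def find_duplicates(paths, k=16):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Filtering step: bucket the candidate files by checksum.
    buckets = defaultdict(list)
    for p in paths:
        buckets[k_bit_checksum(p, k)].append(p)

    groups = []
    for bucket in buckets.values():
        if len(bucket) < 2:
            continue  # unique checksum: this file cannot have a duplicate

        # Confirmation step: compare actual contents only within a bucket,
        # because equal checksums do not guarantee equal contents.
        classes = []
        for p in bucket:
            for cls in classes:
                if filecmp.cmp(cls[0], p, shallow=False):
                    cls.append(p)
                    break
            else:
                classes.append([p])
        groups.extend(c for c in classes if len(c) > 1)
    return groups
```

A smaller K makes each checksum cheaper to store but lets more non-identical files share a bucket, so more full content comparisons are needed; a larger K filters more aggressively. In either case the content comparison inside each bucket is what confirms a duplicate.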