Stochastic Identification of Duplicate Computer Files
Publication Date: 2004-Dec-23
The IP.com Prior Art Database
This invention inserts a stochastic filtering step before any comparison of actual file contents. It calculates a K-bit checksum for each candidate file and removes every file whose checksum is unique, since such a file cannot have a duplicate among the candidates. Actual file contents are then compared only between files that share an identical checksum, which further reduces the time required to confirm which files are true duplicates.
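The two-stage procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the original disclosure: the checksum function (CRC-32 from `zlib`, truncated to its low K bits), the hypothetical helper names `k_bit_checksum`, `same_contents`, and `find_duplicates`, and the chunked byte-for-byte comparison are all illustrative choices.

```python
import os
import zlib
from collections import defaultdict
from itertools import combinations


def k_bit_checksum(path, k=32, chunk_size=65536):
    """Return a K-bit checksum of the file's contents.

    Assumption: CRC-32 truncated to its low k bits (1 <= k <= 32); the
    source does not specify which checksum function is used.
    """
    if not 1 <= k <= 32:
        raise ValueError("this sketch supports 1 <= k <= 32")
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    return crc & ((1 << k) - 1)  # keep only the low k bits


def same_contents(a, b, chunk_size=65536):
    """Compare two files byte for byte."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both files exhausted with no difference found
                return True


def find_duplicates(paths, k=32):
    """Return pairs of paths whose contents are identical."""
    # Stage 1 (stochastic filter): bucket files by their K-bit checksum.
    buckets = defaultdict(list)
    for p in paths:
        buckets[k_bit_checksum(p, k)].append(p)

    # A file with a unique checksum cannot have a duplicate, so drop it.
    candidates = [group for group in buckets.values() if len(group) > 1]

    # Stage 2 (exact confirmation): compare contents only within a bucket,
    # because equal checksums may still be a collision.
    duplicates = []
    for group in candidates:
        for a, b in combinations(group, 2):
            if same_contents(a, b):
                duplicates.append((a, b))
    return duplicates
```

Because a K-bit checksum can collide for files with different contents, the second stage is what guarantees a correct result; the checksum stage only prunes candidates. A larger K makes collisions, and therefore wasted content comparisons, less likely.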