Method and System for Preventing Duplicate Files

IP.com Disclosure Number: IPCOM000185129D
Original Publication Date: 2009-Jul-13
Included in the Prior Art Database: 2009-Jul-13
Document File: 5 page(s) / 17K

Publishing Venue

Linux Defenders

Related People

John Cronin: AUTHOR [+2]

Abstract

The present invention is a Duplicate Detection Module within the operating system that identifies duplicate files or content on a computing machine. When a new file is added to the machine, the Duplicate Detection Module filters the full set of files on the machine, keeping only those of the same file type and size range as the new file. The filenames and metadata of the filtered files are then analyzed to identify duplicate files.

This is the abbreviated version, containing approximately 45% of the total text.

Contact Information

Publications@ipcg.com

ipCapital Group, Inc.
400 Cornerstone Drive, Suite 325 Williston, VT 05495
United States of America
(802) 872-3200

1. BACKGROUND

Problem or Opportunity

File systems on personal computers can often become cluttered and bloated. Duplicate files may exist that waste precious disk space or confuse users. Some duplicate files may have different file names or be different versions of the same file. These duplicates are especially difficult to identify. A system is needed to efficiently and accurately identify duplicate files on a computing machine.

Background Publications

Previous publications have attempted to solve the problem of duplicate files on computing machines. However, none of the previous solutions identifies duplicate files by analyzing metadata.

US Patent Number 7401080 describes an invention that identifies duplicate files in a storage system. A hash is calculated for every file in the storage system, and files with the same hash are identified as duplicate files. This method only identifies exact duplicates and does not use metadata to identify duplicates.
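For contrast with the metadata-based approach of the present disclosure, the hash-based technique described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the choice of SHA-256 and the names file_digest and find_exact_duplicates are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group regular files under root by content hash.

    Returns a list of groups; each group holds two or more paths
    whose contents are byte-for-byte identical.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Because the comparison depends only on file contents, two files that differ by a single byte (for example, a re-encoded copy of the same song) land in different groups, which is the limitation noted above.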

US Patent Application Number 20080244199 describes a "Computer system preventing storage of duplicate files." In this invention an "intrinsic value" is calculated for each file on a device. In the preferred embodiment, this intrinsic value is computed with a hash function. The intrinsic values of all files are compared, and files with the same intrinsic value are identified as duplicates. This invention does not involve the use of metadata to identify duplicate files.

US Patent Number 7366718 describes a system for identifying duplicate web pages in order to increase the effectiveness of search engines. Fingerprint values are computed for web pages, and pages with similar fingerprints are identified as duplicates. This invention does not relate to the detection of duplicate files on a single computing device.

2. SUMMARY OF INVENTION

Invention Summary

In the present invention, a Duplicate Detection Module manages the identification of duplicate files. Each time a new file is copied to the user machine, the Duplicate Detection Module is activated.

In the first stage of duplicate detection, the Duplicate Detection Module filters the entire File System for files of the same file type and size range as the newly copied file. The file type could refer to the file extension or to a more general use of the term (e.g., audio, video, text document). The...
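The first-stage filter can be illustrated with a short sketch. The disclosure text available here does not define the size range, so the size_tolerance parameter (a fraction of the new file's size) is a hypothetical choice, as are the function names general_type and first_stage_candidates. The general file type is approximated with the top-level MIME category guessed from the filename, falling back to the extension when no category is known.

```python
import mimetypes
import os
from pathlib import Path


def general_type(path):
    """Return a coarse category such as 'audio' or 'text'.

    Falls back to the lowercased file extension when the MIME type
    cannot be guessed from the filename.
    """
    mime, _ = mimetypes.guess_type(str(path))
    if mime:
        return mime.split("/")[0]
    return Path(path).suffix.lower()


def first_stage_candidates(new_file, root, size_tolerance=0.05):
    """Return files under root with the same general type as new_file
    and a size within +/- size_tolerance of its size.

    size_tolerance is an assumed parameter, not taken from the disclosure.
    """
    new_file = Path(new_file).resolve()
    new_type = general_type(new_file)
    new_size = new_file.stat().st_size
    low = new_size * (1 - size_tolerance)
    high = new_size * (1 + size_tolerance)

    candidates = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.resolve() == new_file:
                continue  # do not compare the new file with itself
            try:
                size = path.stat().st_size
            except OSError:
                continue  # skip files that vanish or cannot be read
            if general_type(path) == new_type and low <= size <= high:
                candidates.append(path)
    return sorted(candidates)
```

Only the files returned by this filter would need to go on to the filename and metadata analysis, so the more expensive comparison runs on a small subset of the file system.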