Publication Date: 2014-Sep-23
We propose a technique that analyzes a user's recent activity across internet applications (for example emails, browsers, downloads, wiki activity, social updates, blogs, and forums) to determine which files in storage may no longer be required and can be archived, deleted, or un-archived. The proposed solution alerts the user to the probability that a file should be archived or deleted when the user's recent internet activities, weighted so that older activities count for less, are not semantically related to the contents of the file.

Managing information effectively in a computer system has always been a challenge, and the need to do so keeps growing.

End users will continue to add information in the form of files. These files take many forms, such as media files, text files, and image files.

Because users can create many forms of data (video, audio, images, notes, tweets, forum updates, etc.), the size of the data grows exponentially and becomes hard to manage.

Some problems:

- Users keep acquiring new discs or cloud storage to store their files.

- There may be files on discs that have not been looked at for over a decade.

- Users sometimes don't even remember the contents of the storage space.

The problem is to determine which files and/or directories need to be archived, deleted, or un-archived based on various criteria, including the user's current social activity and the semantic content of the files. The solution monitors the user's actions on files and directories on the computer system and combines this with the semantic content of each file. This information is then used to create an index that helps estimate the probability that a file should be archived or deleted.
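As a rough illustration of how such a probability could be derived, the sketch below scores a file against recent activities, with each activity's weight decaying over time. It is not the disclosed method: the `Activity` record, the `archive_probability` function, the 30-day half-life, and the use of keyword overlap (Jaccard similarity) as a stand-in for semantic relatedness are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One recent internet activity (hypothetical record type)."""
    keywords: frozenset  # terms extracted from an email, web page, post, etc.
    age_days: float      # how long ago the activity took place


def jaccard(a, b):
    """Keyword overlap in [0, 1]; a simple stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def archive_probability(file_keywords, activities, half_life_days=30.0):
    """Return a score in [0, 1]; higher means the file looks less relevant.

    The half-life is an assumed tuning parameter: an activity that is
    half_life_days old counts half as much as one that happened today.
    """
    relevance = 0.0
    for act in activities:
        # Exponential decay of the activity's weight with its age.
        weight = 0.5 ** (act.age_days / half_life_days)
        similarity = jaccard(file_keywords, act.keywords)
        # Keep the strongest decayed match between the file and any activity.
        relevance = max(relevance, weight * similarity)
    return 1.0 - relevance


if __name__ == "__main__":
    file_kw = frozenset({"tax", "invoice", "2013"})
    recent = [
        Activity(frozenset({"holiday", "flights"}), age_days=2),
        Activity(frozenset({"tax", "invoice", "2013"}), age_days=30),
    ]
    print(archive_probability(file_kw, recent))
```

Taking the strongest decayed match, rather than an average, means a single recent and closely related activity is enough to keep a file out of the archive candidates; other aggregation rules would be equally plausible.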

Below are a few parameters that can be monitored and used for the probability computation:

Semantic content in the files, such as entities, topics, keywords, phrases, hyperlinks, and abbreviations.

File access patterns on local computer system:
- Frequency at which a particular kind of file is accessed by the user.

- Frequency at which the user has been deleting a particular kind of file.

- Last used/modified date for a particular kind of file.

- Rate at which particular types of files have been manually accessed, opened, or edited in the past.

- Rate at which the user has been adding a particular...