Method and System for Providing Dynamic Canonical Collections across a Plurality of Sources

IP.com Disclosure Number: IPCOM000237322D
Publication Date: 2014-Jun-13
Document File: 3 page(s) / 105K

Publishing Venue

The IP.com Prior Art Database

Related People

Bhautik Joshi: INVENTOR [+3]

Abstract

Disclosed is a method and system for providing dynamic canonical collections across a plurality of sources by clustering near-duplicate media from different devices and social media sites into one coherent collection. The method and system assists a user in organizing and managing a collection of media by means of a similarity metric.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 39% of the total text.

Description

Disclosed is a method and system for providing dynamic canonical collections across a plurality of sources by clustering near-duplicate media from different devices and social media sites into one coherent collection. The method and system assists a user in organizing and managing a collection of media by means of a similarity metric. The collection remains coherent even when media undergo transformations while moving between storage and publishing mechanisms, as illustrated in the figure.

Figure

In an embodiment, the method and system is used to index the entire corpus of a canonical image repository. The method and system allows similar images to be retrieved from the corpus given a source image. The method and system is able to detect when an image is copied and re-uploaded by any person other than the image owner. Thereafter, the image is flagged by the abuse (law enforcement) team so that the flagged image can be detected automatically. Additionally, a user who tries to upload denied images using a new account is tracked and flagged for abuse.

The method and system includes a media hashing function, imageHash, which generates a reduced-dimensionality representation of a piece of media, i. The hash h is given by h = imageHash(i), where i denotes any type of media that is hashed to a feature vector. Given a set of media, s = {i0, i1, ..., in}, the media hashing function is applied to each member of the set to generate a set of tuples of images and hashes, s_hashes = {(i0, h0), (i1, h1), ..., (in, hn)}. A given pair of hashes, hi and hk, is compared using a comparison function, distanceFunc(hi, hk). The comparison function uses, but is not limited to, the Hamming distance between hash keys to represent the similarity of the media represented by the hash keys.
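The hashing and comparison steps can be illustrated with a minimal Python sketch. The disclosure does not say which hash imageHash computes, so the average hash used here is an assumption chosen for brevity. The names image_hash, distance_func and build_s_hashes are hypothetical Python counterparts of imageHash, distanceFunc and s_hashes, and each image is assumed to be already reduced to a small grid of grayscale values.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a piece of media: a small grid of grayscale pixels (0-255).
Image = Sequence[Sequence[int]]


def image_hash(image: Image) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean.

    Illustrative choice only; the disclosure does not specify the hash.
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def distance_func(h_i: int, h_k: int) -> int:
    """Hamming distance: number of bit positions in which the two hashes differ."""
    return bin(h_i ^ h_k).count("1")


def build_s_hashes(s: Sequence[Image]) -> List[Tuple[Image, int]]:
    """Pair every media item with its hash, mirroring s_hashes = {(i0, h0), ...}."""
    return [(i, image_hash(i)) for i in s]


original = [[10, 200], [220, 30]]
brightened = [[40, 230], [250, 60]]   # same picture, uniformly brighter
different = [[200, 10], [30, 220]]    # inverted layout

s_hashes = build_s_hashes([original, brightened, different])
near = distance_func(s_hashes[0][1], s_hashes[1][1])   # small for near duplicates
far = distance_func(s_hashes[0][1], s_hashes[2][1])    # large for unrelated media
```

In this sketch a uniform brightness change leaves the hash unchanged, which is the kind of robustness to transformations that the description asks of the similarity metric.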

The s_hashes set and distanceFunc are used to build a Vantage Point (VP) tree, a structure and algorithm that uses distanceFunc to index s_hashes. In one scenario, a distributed VP-tree outperforms a single large VP-tree for searching across huge sets of indexed media, provided the parallelism is correctly orchestrated. Individual VP-trees are implemented for arbitrary subsets of users to give more accurate and faster user-specific results. The system is also used to generate individual media hash trees for different purposes, such as one tree for detecting abuse and another for de-duplication on upload. A search function, vp_search, given the VP-tree vp and an image hash h, returns images similar to h along with the s...
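A VP-tree over (media, hash) tuples and a range search in the spirit of vp_search could be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the source text is cut off before it says what vp_search returns, so the max_distance parameter and the (distance, media) result pairs are assumptions, as are the names VPNode, build_vp_tree and hamming.

```python
import random
from typing import Any, Callable, List, Optional, Tuple

Item = Tuple[Any, int]  # (media, hash), as in s_hashes


def hamming(h_i: int, h_k: int) -> int:
    """Stand-in for distanceFunc: Hamming distance between integer hash keys."""
    return bin(h_i ^ h_k).count("1")


class VPNode:
    def __init__(self, point: Item, radius: int,
                 inside: Optional["VPNode"], outside: Optional["VPNode"]) -> None:
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # items with distance <= radius
        self.outside = outside  # items with distance > radius


def build_vp_tree(items: List[Item],
                  dist: Callable[[int, int], int] = hamming,
                  rng: Optional[random.Random] = None) -> Optional[VPNode]:
    """Recursively split the items around a randomly chosen vantage point."""
    if not items:
        return None
    rng = rng or random.Random(0)
    items = list(items)
    vantage = items.pop(rng.randrange(len(items)))
    if not items:
        return VPNode(vantage, 0, None, None)
    distances = sorted(dist(vantage[1], it[1]) for it in items)
    radius = distances[len(distances) // 2]
    inside = [it for it in items if dist(vantage[1], it[1]) <= radius]
    outside = [it for it in items if dist(vantage[1], it[1]) > radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def vp_search(vp: Optional[VPNode], h: int, max_distance: int,
              dist: Callable[[int, int], int] = hamming) -> List[Tuple[int, Any]]:
    """Return (distance, media) pairs whose hash lies within max_distance of h."""
    results: List[Tuple[int, Any]] = []
    stack = [vp]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        d = dist(h, node.point[1])
        if d <= max_distance:
            results.append((d, node.point[0]))
        # Triangle inequality: visit a subtree only if it could hold a match.
        if d - max_distance <= node.radius:
            stack.append(node.inside)
        if d + max_distance > node.radius:
            stack.append(node.outside)
    return sorted(results, key=lambda r: r[0])


items = [("a", 0b0000), ("b", 0b0001), ("c", 0b1111), ("d", 0b0011), ("e", 0b1110)]
vp = build_vp_tree(items)
matches = vp_search(vp, 0b0000, max_distance=1)
```

Separate trees for user subsets or for distinct purposes (abuse detection, de-duplication on upload) would, under this sketch, amount to calling build_vp_tree once per subset of tuples and querying each tree independently.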