Method and System for Providing Unsupervised Multipass Document Retrieval

IP.com Disclosure Number: IPCOM000242043D
Publication Date: 2015-Jun-16
Document File: 2 page(s) / 32K

Publishing Venue

The IP.com Prior Art Database

Related People

Sainath Vellal: INVENTOR [+2]

Abstract

A method and system are disclosed for retrieving one or more documents related to a query or a query document by using an unsupervised, set-based, multiple-iteration approach, wherein each iteration deduces its next action from the previous set of documents alone.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 53% of the total text.

Description

Disclosed is a method and system for retrieving one or more documents related to a query or a query document by using an unsupervised, set-based, multiple-iteration approach. Each iteration either expands or narrows the document set according to the content retrieved in the previous iteration. An expansion is usually followed by a contraction, which in turn is followed by another expansion. Each iteration thus deduces its next action from the previous set of documents alone.

In an embodiment, the method and system iterates over subsets and supersets of an ideal (final) set of documents by expanding or contracting the scope of the original query. In addition to the final set, the method and system also provides the clusters of documents that make up the final set. The clusters are identified during the iterations by applying a similarity function to the subsets and supersets, and they are useful for identifying further refinements to the original query.
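As a rough illustration of how clusters might be identified with a similarity function, the following sketch groups documents by Jaccard similarity over their token sets. The disclosure does not name a specific similarity function or clustering procedure, so the Jaccard measure, the greedy grouping, the 0.3 threshold, and the names jaccard, cluster, and candidate_docs are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Greedy grouping: a document joins the first cluster whose seed
    document it resembles; otherwise it starts a new cluster.

    Both the threshold and the greedy strategy are assumptions; the
    disclosure only says that a similarity function is applied.
    """
    clusters = []  # list of (seed_tokens, [doc_ids])
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        for seed_tokens, members in clusters:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append((tokens, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical candidate set produced by an earlier iteration.
candidate_docs = {
    1: "stanford football schedule",
    2: "stanford football roster",
    3: "stanford admissions office",
    4: "stanford admissions deadline",
}
groups = cluster(candidate_docs)
```

In this toy data the football pages and the admissions pages fall into separate groups, which is the kind of signal that could suggest a refinement of the original query.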

Consider a scenario where the method and system receives the query [stanford football]. The results of the query are bounded above and below by identifying supersets and subsets of the final solution. In this scenario, for the query [stanford football], the method and system identifies the superset of documents that match [stanford OR football] and the subset of documents that match [stanford AND football]. However, [stanford] and [football] may have synonyms or related terms/entities tha...
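The superset and subset bounds in the scenario above can be sketched with a small inverted index. This is a minimal illustration only: the whitespace tokenization, the toy corpus, and the names build_index and query_bounds are assumptions for the example and are not taken from the disclosure.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def query_bounds(index, terms):
    """Return (subset, superset) for a list of query terms.

    subset   = documents matching ALL terms (the AND query, lower bound)
    superset = documents matching ANY term  (the OR query, upper bound)
    """
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set(), set()
    subset = set.intersection(*postings)
    superset = set().union(*postings)
    return subset, superset


# Hypothetical toy corpus.
corpus = {
    1: "stanford football schedule",
    2: "stanford admissions office",
    3: "college football rankings",
    4: "cooking recipes",
}
index = build_index(corpus)
subset, superset = query_bounds(index, ["stanford", "football"])
```

Here the AND query yields only document 1, while the OR query yields documents 1, 2, and 3; any final result set would lie between these two bounds.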