
Method for inferring information source relevance based on historical query logs across multiple other information sources Disclosure Number: IPCOM000239758D
Publication Date: 2014-Dec-01
Document File: 2 page(s) / 68K

Publishing Venue

The Prior Art Database


A method is presented that dynamically updates a document collection by scoring new incoming documents against the full history of user queries issued against the existing collection. The rationale is that a new document should be added if it would have been relevant to many of the queries made against the information system. Conversely, documents that are no longer relevant can be weeded out by periodically comparing them against the historical query log. This method makes it possible to build systems that maintain a dynamic collection of relevant documents without specifying exact search parameters for those documents in advance.
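The admission test described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the term-overlap matcher, the thresholds, and the names (`QUERY_LOG`, `relevance`, `matches`) are all assumptions introduced for clarity; a real system would use a proper retrieval engine's scoring function.

```python
# Hypothetical sketch: score a candidate document against a historical
# query log. A document is admitted if it would have "hit" on a large
# enough fraction of past queries. All names and thresholds are assumed.

def tokenize(text):
    """Lower-case whitespace tokenization; a real system would use a proper analyzer."""
    return set(text.lower().split())

def matches(query, document, min_overlap=0.5):
    """A query 'hits' a document if at least min_overlap of its terms appear in it."""
    q = tokenize(query)
    return len(q & tokenize(document)) >= min_overlap * len(q)

def relevance(document, query_log):
    """Fraction of historical queries the document would have contributed to."""
    hits = sum(matches(q, document) for q in query_log)
    return hits / len(query_log)

QUERY_LOG = [
    "medical compound regulations europe",
    "fda approval process compounds",
    "regulations for medical devices",
]

new_doc = ("Overview of regulations governing medical compounds "
           "in Europe and the FDA approval process")

if relevance(new_doc, QUERY_LOG) >= 0.2:  # admission threshold (assumed)
    print("add document to collection")
```

The same `relevance` score, computed periodically over documents already in the collection, supports the pruning step: documents whose score drops below a (lower) threshold are candidates for removal.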

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.



In many cases, a real-world information system needs to pull information from a potentially large set of sources that each offer different types of information. Public sources range from entire government databases to single Wikipedia pages; private sources include Reuters and other third-party information providers. Typically, these sources are then used as content in use-case-specific information retrieval applications. The set of all sources that underlie such a system is referred to as an information collection, where the information collection contains (links to) a large variety of data sources. Information collections are used by information systems to provide their users with relevant results.

    In one concrete example of such an information system, one may need to index and watch a set of web pages around a single, very specific topic, such as world-wide regulations for medical compounds. Although one can initially construct a set of pages that may be relevant for this application, there is no guarantee that this set is complete, nor that these sources will remain relevant throughout the lifetime of the information system. In another use case, a recruiting company might want to maintain a subset of relevant user profiles to monitor more closely. This initial set cannot be decided in advance, as new profiles may appear over time and monitoring the entire set of profiles in the social network database is impossible.

    In practice, all of the data sources in an information collection can change over time, meaning that their content no longer aligns with the needs of the information system. A secondary problem is that relevant new sources may appear online without being added to the information collection. The root cause in both cases is that the specification of the sources that comprise an information collection is usually a static list, often decided by the application developer at construction time or, at best, by a system administrator at runtime. Adding relevant new sources or removing mostly unused ones can be a tedious affair. If one can automatically rate the potential relevance of a source for the information collection, one can both remove sources that have become irrelevant (because they no longer provide accurate information) and add only relevant sources from a collection of potential sources. How this collection of potential sources is computed is not relevant for this disclosure, but it may, for example, involve link analysis, where sources that are linked from multiple sources already in the collection would be good candidates.
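The link-analysis candidate step mentioned above could look like the following sketch. It is illustrative only: the outlink map, the example domain names, and the threshold of two inbound links are assumptions, not part of the disclosure.

```python
# Hypothetical sketch: propose candidate sources by counting how many
# members of the current collection link to each outside source.
# Sources linked from at least `min_inlinks` members become candidates.

from collections import Counter

def candidate_sources(collection, outlinks, min_inlinks=2):
    """Return sources outside `collection` that at least `min_inlinks`
    current members link to. `outlinks` maps a source to the sources
    it links out to."""
    counts = Counter(
        target
        for source in collection
        for target in outlinks.get(source, ())
        if target not in collection
    )
    return {target for target, n in counts.items() if n >= min_inlinks}

# Example data (all names are made up for illustration).
collection = {"regdb.example", "ema.example"}
outlinks = {
    "regdb.example": ["fda.example", "who.example"],
    "ema.example":   ["fda.example", "blog.example"],
}

print(candidate_sources(collection, outlinks))  # fda.example is linked twice
```

Each candidate produced this way would then be scored against the historical query log before actually being admitted to the information collection.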

    An existing solution was found that rates a single data source's relevance to a particular user query [1], but no solutions that co...