A Machine Learned Query Forwarder to Improve Efficiency of Distributed Search Engines

IP.com Disclosure Number: IPCOM000233772D
Publication Date: 2013-Dec-19
Document File: 9 page(s) / 269K

Publishing Venue

The IP.com Prior Art Database

Related People

Xiao Bai: INVENTOR [+4]

Abstract

A machine-learned query forwarder is disclosed to improve the efficiency of distributed search engines. A document replication technique is used with the query forwarder to improve query locality under various replication budget distribution strategies. A machine learning approach is devised to decide query forwarding patterns; it achieves a significantly lower false-positive ratio with little impact on search result quality. Further, three result caching strategies are used with the query forwarder to reduce the number of forwarded queries, and their tradeoffs in storage and network overhead are analyzed. The combination of techniques used with the query forwarder yields high search efficiency, thereby rendering multi-site distributed web search engines an attractive alternative.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 14% of the total text.

Description

Disclosed is a machine-learned query forwarder to improve efficiency of distributed search engines.

The query forwarder exploits query locality: a fraction of queries find their best-matching documents in the index of the site to which they are issued.  Such queries need not be processed on remote sites, which reduces response latency and query processing workload.  Since each search site indexes only a subset of the documents, some queries cannot obtain their best-matching results from the local index.  A query forwarding strategy is therefore needed: every time a query is received at a local site, the strategy determines which search sites may index some of the best-matching documents, so that the query can be forwarded to those sites.  To avoid unnecessary increases in query response time, it is important to forward queries only to sites that index the best-matching results.  Additionally, forwarding a query increases its response time because of the network latency between sites.  If the documents that are frequently fetched from remote sites to serve queries are replicated and indexed locally, more queries benefit from the faster response time that query locality ensures.  To this end, the right documents to replicate on each site must be determined so as to achieve high query locality with low storage overhead.  In addition, a result cache is useful for reducing query response time and query processing workload.
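The notions of query locality and forwarding targets described above can be made concrete with a small sketch.  It is an illustration only: the site names, document identifiers and function names are hypothetical, and it takes an oracle view in which the best-matching documents of each query are already known, which a live forwarder would have to predict instead.

```python
from typing import Dict, List, Set

# Hypothetical toy data: the documents indexed by each search site.
SITE_INDEXES: Dict[str, Set[str]] = {
    "A": {"d1", "d2", "d3"},
    "B": {"d4", "d5"},
    "C": {"d6"},
}

# Hypothetical best-matching documents for queries issued at site "A".
QUERIES_AT_A: Dict[str, List[str]] = {
    "q1": ["d1", "d2"],
    "q2": ["d1", "d4"],
    "q3": ["d6", "d5"],
    "q4": ["d3"],
}


def is_local(best_docs: List[str], local_index: Set[str]) -> bool:
    """True when every best-matching document is in the local index."""
    return all(doc in local_index for doc in best_docs)


def remote_sites_needed(
    best_docs: List[str], site_indexes: Dict[str, Set[str]], local_site: str
) -> List[str]:
    """Remote sites indexing at least one best-matching document missing locally."""
    local_index = site_indexes[local_site]
    missing = {doc for doc in best_docs if doc not in local_index}
    return sorted(
        site
        for site, index in site_indexes.items()
        if site != local_site and missing & index
    )


def query_locality(
    queries: Dict[str, List[str]], site_indexes: Dict[str, Set[str]], local_site: str
) -> float:
    """Fraction of queries answered entirely from the local index."""
    if not queries:
        return 0.0
    local_index = site_indexes[local_site]
    local = sum(1 for best in queries.values() if is_local(best, local_index))
    return local / len(queries)


locality = query_locality(QUERIES_AT_A, SITE_INDEXES, "A")
targets = {
    q: remote_sites_needed(best, SITE_INDEXES, "A")
    for q, best in QUERIES_AT_A.items()
}
```

With this toy data, two of the four queries (q1 and q4) are served entirely by site A, so the locality is 0.5; q2 would need site B only, while q3 would need both B and C.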

New strategies for document replication, query forwarding and multi-site result caching are evaluated, and a first realistic simulation of the full stack of strategies in a multi-site search engine is conducted.  A document replication technique is used that relies on a per-document utility to select the documents to replicate on each site, thereby improving query locality.
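A minimal sketch of utility-driven replication follows.  The abbreviated text does not define the utility or the budget strategies, so the choices here are assumptions made for illustration: the utility of a remote document is taken to be the number of past local queries that had it among their best results, the budget is a plain document count, and all names and data are hypothetical.

```python
from collections import Counter
from typing import List, Set


def document_utilities(query_log: List[List[str]], local_index: Set[str]) -> Counter:
    """Assumed utility: how often a non-local document appears among the
    best results of queries issued at this site."""
    utility: Counter = Counter()
    for best_docs in query_log:
        for doc in best_docs:
            if doc not in local_index:
                utility[doc] += 1
    return utility


def select_replicas(utility: Counter, budget: int) -> List[str]:
    """Pick the `budget` documents with the highest utility (ties broken by id)."""
    ranked = sorted(utility.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in ranked[:budget]]


def locality(query_log: List[List[str]], index: Set[str]) -> float:
    """Fraction of logged queries whose best results are all in `index`."""
    if not query_log:
        return 0.0
    hits = sum(1 for best in query_log if all(doc in index for doc in best))
    return hits / len(query_log)


# Hypothetical local index and query log (best-matching documents per query).
local_index = {"d1", "d2"}
query_log = [["d1", "d4"], ["d4", "d5"], ["d2"], ["d4", "d6"], ["d5"]]

replicas = select_replicas(document_utilities(query_log, local_index), budget=2)
before = locality(query_log, local_index)
after = locality(query_log, local_index | set(replicas))
```

On this toy log, replicating the two highest-utility documents (d4 and d5) raises locality from 0.2 to 0.8 at a storage cost of two extra documents.  A greedy top-k selection is only one possible policy; the disclosure's own budget distribution strategies are not visible in the abbreviated text.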

In addition, a machine learning approach is proposed for query forwarding that exploits the tradeoff between result quality on the one hand and query response time and system workload on the other.  The query...
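The tradeoff between result quality and forwarding cost can be illustrated with a thresholded score.  The sketch below is not the disclosed model: it assumes that some classifier has already produced, for each (query, remote site) pair, a score estimating whether the site holds best-matching results, and it only shows how the forwarding threshold moves the false-positive rate (unnecessary forwards) against the miss rate (needed forwards that were skipped).  The scores, labels and function name are hypothetical.

```python
from typing import List, Tuple


def forwarding_rates(
    scored: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Return (false_positive_rate, miss_rate) when forwarding iff score >= threshold.

    Each item is (score, needed), where `needed` says whether the remote site
    really held best-matching results for the query.
    """
    tp = fp = fn = tn = 0
    for score, needed in scored:
        forwarded = score >= threshold
        if forwarded and needed:
            tp += 1
        elif forwarded:
            fp += 1  # unnecessary forward: extra latency and workload
        elif needed:
            fn += 1  # skipped forward: result quality degrades
        else:
            tn += 1
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    miss_rate = fn / (fn + tp) if fn + tp else 0.0
    return false_positive_rate, miss_rate


# Hypothetical classifier outputs for six (query, site) pairs.
scored = [(0.9, True), (0.8, True), (0.6, False),
          (0.4, True), (0.3, False), (0.1, False)]

for threshold in (0.2, 0.5, 0.7):
    fpr, miss = forwarding_rates(scored, threshold)
    print(f"threshold={threshold:.1f}  false-positive rate={fpr:.2f}  miss rate={miss:.2f}")
```

Lowering the threshold forwards more queries, which protects result quality at the price of more unnecessary forwards; raising it does the opposite.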