Method and System for Temporal Query Log Profiling for Web Search Ranking

IP.com Disclosure Number: IPCOM000198741D
Publication Date: 2010-Aug-13
Document File: 6 page(s) / 81K

Publishing Venue

The IP.com Prior Art Database

Related People

Zhaohui Zheng: INVENTOR [+3]

Abstract

A method and system are disclosed for ranking search results by profiling temporal query logs. The profiling is based on fundamental properties of the temporal behavior of low-quality hosts and spam-prone queries in search logs.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 23% of the total text.

Description

Disclosed is a method and system for ranking search results by profiling temporal query logs.  To build these profiles, fundamental properties of the temporal behavior of low-quality hosts and spam-prone queries in search logs are identified and modeled as quantifiable features.  In particular, two concepts are introduced to estimate these properties: host churn, a measure of changes in host visibility for user queries, and query volatility, a measure of the semantic instability of query results.  Methods for constructing temporal profiles from search query logs are also introduced.

The method and system disclosed herein introduce the concept of host churn.  Four key metrics are used to quantify the churn: (a) the number of queries a host appears in (nQ), (b) the number of impressions for the host (nI), (c) the number of referrals/clicks from search engines (nClk), and (d) the average position of the host in the queries in which it appears (pos).  On each of these metrics, normal hosts show an organic, controlled pattern of growth or decay, unlike low-quality hosts.
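As an illustration of how the four metrics might be gathered for one time-slice, the following is a minimal Python sketch. The `Impression` record layout and the `host_metrics` function are hypothetical and are not part of the disclosure. The sketch also assumes that pos is averaged over impressions, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Impression(NamedTuple):
    """One hypothetical log row: a host shown for a query at a given rank."""
    query: str
    host: str
    position: int   # 1-based rank on the results page (assumed)
    clicked: bool


def host_metrics(rows: Iterable[Impression]) -> Dict[str, Dict[str, float]]:
    """Aggregate nQ, nI, nClk and pos per host for a single time-slice."""
    queries = defaultdict(set)        # distinct queries per host
    impressions = defaultdict(int)    # impression count per host
    clicks = defaultdict(int)         # click count per host
    position_sum = defaultdict(int)   # sum of ranks per host
    for row in rows:
        queries[row.host].add(row.query)
        impressions[row.host] += 1
        clicks[row.host] += int(row.clicked)
        position_sum[row.host] += row.position
    return {
        host: {
            "nQ": len(queries[host]),
            "nI": impressions[host],
            "nClk": clicks[host],
            # Assumption: average position taken over impressions.
            "pos": position_sum[host] / impressions[host],
        }
        for host in impressions
    }
```

Calling `host_metrics` once per time-slice would yield one row of four values per host, which corresponds to the per-slice host profile described next.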

The host churn is quantified over a host profile $H$.  Host profiles are available for each of the n contiguous time-slices.  Each profile $H^{i}$ is an m×4 matrix, with an entry $H^{i}_{jk}$ representing the value of host j on property k at time-slice i.  For any specific host, four temporal attributes ($nQ$, $nI$, $nClk$, $nI \cdot pos$) are computed.  Thereafter, for any host i and temporal attribute j, the host churn is computed as the sum of the values of the pairwise churn metric $\varphi$ over the n−1 adjacent pairs of time-slices, as follows:

\begin{align}
  \phi(H_{ij})=\sum_{k=1}^{n-1} \varphi(H^{k}_{ij},H^{k+1}_{ij})
\end{align}
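The summation over adjacent time-slices can be sketched in Python as follows. The function name `host_churn` and its signature are illustrative assumptions. The pairwise metric is passed in as a parameter, so the sketch does not commit to any particular choice of metric.

```python
from typing import Callable, Sequence


def host_churn(series: Sequence[float],
               pairwise: Callable[[float, float], float]) -> float:
    """Sum a pairwise churn metric over the n-1 adjacent time-slice pairs.

    `series` holds one temporal attribute of one host, ordered by time-slice.
    """
    return sum(pairwise(a, b) for a, b in zip(series, series[1:]))
```

For example, `host_churn([10, 12, 9], lambda a, b: abs(a - b))` adds the two adjacent differences. The absolute difference is only a stand-in used to exercise the function and is not a metric taken from the disclosure.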

The ranking score for a search result is a function of the churn metric ϕ.  Two candidate metrics are used to quantify the host churn.  The first is a logarithmic ratio across two time-slices:

\begin{align}
  \varphi(H^{m}_{ij},H^{n}_{ij}) = \log\frac{H^{m}_{ij}}{H^{n}_{ij}}
\end{align}
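A minimal Python sketch of the logarithmic ratio is given below. The function name `log_ratio` is illustrative. The disclosure does not state the base of the logarithm or how zero-valued attributes are treated, so the natural logarithm and the raised error are both assumptions.

```python
import math


def log_ratio(h_m: float, h_n: float) -> float:
    """Log ratio of one host attribute across two time-slices m and n."""
    # Assumption: non-positive values are rejected, since the ratio
    # (or its logarithm) is undefined for them.
    if h_m <= 0 or h_n <= 0:
        raise ValueError("log ratio is undefined for non-positive values")
    return math.log(h_m / h_n)
```

The value is zero when the attribute is unchanged, positive when the value at time-slice m exceeds the value at time-slice n, and negative in the opposite case.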

The second candidate metric incorporates the size of the host and is used to compare a temporal property of a host across two time-slices.  Using this metric, the churn for a host i on temporal attribute j across two time-slices m and n is computed as follows:

\begin{align}
  \varphi(H^{m}_{ij},H^{n}_{ij}) = 2 \left(H^{m}_{ij}\log\frac{H^{m}_{ij}}{E_m} + H^{n}_{ij}\log\frac{H^{n}_{ij}}{E_n}\right)
\end{align}

whe...