Method and System for Identifying and Labeling Search Tasks via Query-based Hawkes Processes

IP.com Disclosure Number: IPCOM000239850D
Publication Date: 2014-Dec-05
Document File: 8 page(s) / 489K

Publishing Venue

The IP.com Prior Art Database

Related People

Hongbo Deng: INVENTOR [+4]

Abstract

A method and system are disclosed for identifying and labeling search tasks based on a probabilistic model that combines the Latent Dirichlet Allocation (LDA) model with Hawkes processes. In the model, queries issued by one user share the same topic distribution, words in one query belong to the same topic, and the query sequence of each user is modeled as a Hawkes process.

This is the abbreviated version, containing approximately 21% of the total text.

Description

Search engines are indispensable web portals through which people pursue a wide range of searches to satisfy a variety of information needs.  Search task analysis is an important research area for better understanding users’ search activities, where a search task consists of a sequence of queries that serve the same information need.  A single user session may contain queries with multiple intents, or may seek information on one or several topics.  Temporal submission patterns in query sequences carry valuable information for mining search tasks.  However, these patterns are currently used only for splitting a sequence of queries into temporally demarcated sessions or for deriving pairwise features among queries.  Moreover, different users exhibit different search patterns, which indicate how likely a user’s search task is to change within a given time period.  Search tasks should therefore be treated differently according to each user’s search activity.

Disclosed is a method and system for identifying and labeling search tasks based on a probabilistic model that combines the Latent Dirichlet Allocation (LDA) model with Hawkes processes.  In the combined model, queries issued by one user share the same topic distribution, words in one query belong to the same topic, and the query sequence of each user is modeled as a Hawkes process.  The combined model can be interpreted as a balance between a reasonable query clustering based on query co-occurrence and a rational identification of influence among queries.
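For readers unfamiliar with Hawkes processes, the following is a minimal sketch of the conditional intensity of a self-exciting point process over one user's query times.  The exponential decay kernel, the function name hawkes_intensity, and the parameters mu, alpha, and beta are illustrative assumptions; the abbreviated text does not state which kernel or parameterization the disclosed model uses.

```python
import math

def hawkes_intensity(t, history, mu=0.1, alpha=0.5, beta=1.0):
    """Intensity lambda(t) = mu + sum over past events t_i < t of
    alpha * exp(-beta * (t - t_i)).

    mu    : assumed baseline rate of issuing queries
    alpha : assumed jump in intensity caused by each past query
    beta  : assumed decay rate of that influence over time
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in history
        if t_i < t  # only earlier queries can influence time t
    )
    return mu + excitation

# Hypothetical query times for one user (arbitrary time units).
query_times = [0.0, 0.5, 0.7, 10.0]

# Shortly after a burst of queries the intensity is elevated;
# long after the burst it decays back toward the baseline mu.
soon_after = hawkes_intensity(1.0, query_times)
long_after = hawkes_intensity(9.0, query_times)
print(soon_after > long_after)
```

Under these assumptions, a burst of closely spaced queries raises the expected rate of further queries, which is the kind of temporal influence among queries that the combined model is described as identifying.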

In one scenario, M users issue M corresponding query sequences, and the query sequence issued by the m-th user is denoted T_m = {t_{m,n}, n = 1, …, N_m}.  The word set of the n-th query by user m is denoted W_{m,n} = {w_{m,n,1}, …, w_{m,n,C_{m,n}}}.  The combined model predicts whether influence exists between each pair of queries and identifies the search tasks accordingly.  Additionally, the combined model labels each identified search task.
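The step from pairwise influence to search tasks can be illustrated with a small sketch.  It assumes, as a simplification that the abbreviated text does not confirm, that a search task is a connected group of queries linked by predicted influence.  The influence pairs below are hand-written placeholders rather than model output, and the names Query and group_into_tasks are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Query:
    time: float       # stands in for one entry of the query sequence (assumed to be a submission time)
    words: List[str]  # stands in for the word set of that query

def group_into_tasks(num_queries: int,
                     influence_pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Group query indices into tasks: queries joined by a chain of
    influence pairs land in the same task (union-find)."""
    parent = list(range(num_queries))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in influence_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    tasks = {}
    for n in range(num_queries):
        tasks.setdefault(find(n), []).append(n)
    return sorted(tasks.values())

# Hypothetical query sequence for one user.
sequence = [
    Query(0.0, ["cheap", "flights"]),
    Query(30.0, ["flights", "paris"]),
    Query(45.0, ["python", "tutorial"]),
    Query(60.0, ["paris", "hotels"]),
]
# Placeholder predictions: query 0 influences 1, and query 1 influences 3.
influence = [(0, 1), (1, 3)]
tasks = group_into_tasks(len(sequence), influence)
print(tasks)  # [[0, 1, 3], [2]]
```

In this sketch the interleaved query at index 2 forms its own task even though it falls between two related queries in time, which is the case that purely time-based session splitting cannot separate.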

The method and system employ graphical models such as LDA to cluster queries that co-occur in the same user query sequence into topics.  Each query is assigned to one of the topics based on query-topic membership, in order to identify the search tasks and label the query.  The query sequences are segmented into search tasks to determine query co-occurrence instead of word co-occurrence, wherein words in one query belong to the sam...