Method and system for Performing Cross Language Information Retrieval by Query Selection

IP.com Disclosure Number: IPCOM000197949D
Publication Date: 2010-Jul-23
Document File: 3 page(s) / 23K

Publishing Venue

The IP.com Prior Art Database

Related People

Lei Shi: INVENTOR

Abstract

A method and system for performing Cross Language Information Retrieval (CLIR) by query selection are disclosed. The method and system involve estimating the probability of performing CLIR for a search query.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 47% of the total text.

Description

Disclosed is a method and system for performing Cross Language Information Retrieval (CLIR) by query selection.  The method and system use a statistical model to estimate, for a given query and its translations, the probability of performing CLIR, based on the quality of the translations and of the search results.  The parameters of the statistical model are estimated from labeled training examples.

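The statistical model computes this probability using equation 1:

$$P(r \mid q, t) = P(r \mid c, q, t)\,P(c \mid q, t) + P(r \mid \bar{c}, q, t)\,P(\bar{c} \mid q, t) \tag{1}$$

Where,

t is a translation that a query translator generates for a query q;

r denotes performing cross language retrieval;

$P(c \mid q, t)$ is the probability that t is a correct translation of q; and

$P(\bar{c} \mid q, t)$ is the probability that t is an incorrect translation of q, such that

$$P(c \mid q, t) + P(\bar{c} \mid q, t) = 1 \tag{2}$$

Equation 1 computes the probability $P(r \mid q, t)$ as a mixture of two terms: the probability of choosing a cross language search for the query q when the translation t of q is correct, $P(r \mid c, q, t)$, and the probability of choosing a cross language search for the query q when the translation t of q is incorrect, $P(r \mid \bar{c}, q, t)$.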

In equation 1, the probability of choosing a cross language search when t is an incorrect translation, $P(r \mid \bar{c}, q, t)$, is assumed to be zero; that is, the cross language search is not chosen if t is not a correct translation.  Imperfectly translated queries can sometimes still yield relevant results, but the assumption holds to a large extent.  Equation 1 may therefore be rewritten as:

$$P(r \mid q, t) = P(r \mid c, q, t)\,P(c \mid q, t) \tag{3}$$
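The relationship between equations 1, 2 and 3 can be illustrated with a minimal sketch. The function name `clir_probability`, its parameter names, and the numeric inputs are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
def clir_probability(p_r_given_correct, p_correct, p_r_given_incorrect=0.0):
    """Mixture of equation 1.

    With the default p_r_given_incorrect=0.0 (the stated assumption),
    the second term vanishes and the result is the product of equation 3.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    # Equation 2: the two translation outcomes are complementary.
    p_incorrect = 1.0 - p_correct
    return (p_r_given_correct * p_correct
            + p_r_given_incorrect * p_incorrect)


# Illustrative inputs only.
full = clir_probability(0.8, 0.6, p_r_given_incorrect=0.1)  # equation 1
simplified = clir_probability(0.8, 0.6)                     # equation 3
```

Under the zero assumption the result can only under-estimate the full mixture, by at most the probability that the translation is incorrect.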

Here $P(c \mid q, t)$, the probability that t is the correct translation of q, is estimated by a translation model as given in equation 4:

                  (4)      

Where,

m is the length of t (number of words in t);

l is the length of q;

$t_j$ is the word at the jth position of t; and

$q_i$ is the word at the ith position of q.

Further, the query is assumed to be translated word for word, so that the translation probability of the query is the product of the translation probabilities of its individual words.
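The word-for-word assumption can be illustrated with a minimal sketch. It assumes that q and t have the same length and are aligned position by position, which is one simple reading of the assumption and not necessarily the exact form of equation 4. The table `LEXICON`, its probabilities, and the function `translation_probability` are hypothetical.

```python
import math

# Hypothetical word translation table: (query word, translated word) -> probability.
# The values are illustrative only and are not taken from the disclosure.
LEXICON = {
    ("maison", "house"): 0.8,
    ("maison", "home"): 0.2,
    ("blanche", "white"): 0.7,
}


def translation_probability(query_words, translation_words, lexicon=LEXICON):
    """Product of per-word translation probabilities under a
    position-by-position alignment (an assumption of this sketch)."""
    if len(query_words) != len(translation_words):
        return 0.0  # this sketch only scores equal-length, word-for-word pairs
    return math.prod(
        lexicon.get((q_i, t_i), 0.0)  # unseen word pairs get probability 0
        for q_i, t_i in zip(query_words, translation_words)
    )


p = translation_probability(["maison", "blanche"], ["house", "white"])
```

Because the score is a product, one poorly translated word lowers the probability of the whole translated query.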

In equation 3, $P(r \mid c, q, t)$ is the probability of performing cross language retrieval when the translation t is the correct translation of q.  Here the retrieval decision takes a value in $\{r, \bar{r}\}$, where r stands for performing cross language retrieval and $\bar{r}$ denotes not performing cross language retrieval.  This probability is estimated from the retrieval results of the translated query t and the results of the original query q.

The method and system employ a maximum entropy (ME) model to estimate the conditional probability $P(r \mid c, q, t)$.  An ME model estimates the probability distribution of the data by defining a collection of feature functions that describe certain attributes of the data.  The ME model is trained to fit the empirical counts of the feature functions on the training data and at the same time maxi...
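The general shape of a conditional maximum entropy model over the two outcomes (perform or do not perform cross language retrieval) can be sketched as follows. This is a generic illustration, not the disclosed feature set: the feature functions, the input fields `translated_hits` and `original_hits`, and the weights are all hypothetical, and the weights are untrained. Training is not shown.

```python
import math

OUTCOMES = ("r", "not_r")  # perform / do not perform cross language retrieval


def features(x, y):
    """Hypothetical feature functions f_k(x, y); each fires only for outcome 'r'."""
    fire = 1.0 if y == "r" else 0.0
    return [
        fire,                                    # bias feature
        fire * math.log1p(x["translated_hits"]), # results for the translated query t
        fire * math.log1p(x["original_hits"]),   # results for the original query q
    ]


def maxent_probability(x, y, weights):
    """P(y | x) = exp(sum_k w_k f_k(x, y)) / sum_y' exp(sum_k w_k f_k(x, y'))."""
    scores = {
        outcome: sum(w * f for w, f in zip(weights, features(x, outcome)))
        for outcome in OUTCOMES
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    normalizer = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[y] - top) / normalizer


WEIGHTS = [-1.0, 0.5, -0.3]  # illustrative, untrained values
example = {"translated_hits": 120, "original_hits": 3}
p_r = maxent_probability(example, "r", WEIGHTS)
p_not_r = maxent_probability(example, "not_r", WEIGHTS)
```

With two outcomes and features that fire only for `r`, this form is equivalent to a logistic model over the weighted feature sum.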