
A Method for Disambiguation based on Clickstream Connections in a Context

IP.com Disclosure Number: IPCOM000243439D
Publication Date: 2015-Sep-22
Document File: 6 page(s) / 148K

Publishing Venue

The IP.com Prior Art Database

Abstract

Wikification technology aims to find mentions in an article and link them to the corresponding Wikipedia or other knowledge base documents. For each mention detected by state-of-the-art tools, there are usually a few candidates. It is crucial to rank the relevant or correct candidates above the others. We focus on this task of linking a detected mention to its correct candidate wiki page, which is called disambiguation. State-of-the-art tools usually generate and rank the candidates based solely on article properties such as page title similarity, previously attested references, Wikipedia article length, link graphs and textual context similarity. These mainstream tools overlook the ‘wisdom of the crowd’. Aggregated wiki clickstream data directly indicates user interests and preferences when browsing wiki pages and thus reflects that wisdom. We use it to improve the ranking of the mention candidates.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.


The method uses the clickstream data generated by Wikipedia users to re-rank the candidates produced by mention detection. The article's context and the wisdom of the crowd help the system make better choices. For instance, we have an article with the following snippet:

'fire!' is detected as a mention and has the following candidates:

Shouting_fire_in_a_crowded_theater is not ranked high among the candidates. However, wiki users often click between this page and the pages of other detected mentions, which are marked with blue boxes in the article.



These frequent clicks indicate that Shouting_fire_in_a_crowded_theater might be a good candidate. Based on the wiki users' selections, we can create a better ranking of these candidates.

Our invention uses the click-through data generated by users of a knowledge base service to indicate the relatedness of the concepts in an article, which helps the task of disambiguation in mention detection. The novelty lies in exploiting the wisdom of the crowd in this way.
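The idea of re-ranking candidates by their clickstream connections to the other mentions in the article can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed method itself: the function names, the scoring rule (summing clicks in both directions), the candidate list, the context page and all counts are hypothetical.

```python
def clickstream_support(candidate, context_pages, pair_counts):
    """Sum the clicks, in either direction, between a candidate page
    and the pages of the other mentions detected in the article."""
    return sum(
        pair_counts.get((candidate, page), 0) + pair_counts.get((page, candidate), 0)
        for page in context_pages
    )


def rerank(candidates, context_pages, pair_counts):
    """Order candidates by clickstream support, highest first.

    sorted() is stable, so candidates with equal support keep the
    order given by the original ranking.
    """
    return sorted(
        candidates,
        key=lambda c: clickstream_support(c, context_pages, pair_counts),
        reverse=True,
    )


# Hypothetical data: (referrer, article) -> click count. Values are made up.
pair_counts = {
    ("Freedom_of_speech", "Shouting_fire_in_a_crowded_theater"): 40,
    ("Shouting_fire_in_a_crowded_theater", "Freedom_of_speech"): 25,
}
# Hypothetical original ranking for the mention 'fire!'.
candidates = ["Fire", "Shouting_fire_in_a_crowded_theater"]
# Hypothetical page linked from another mention in the same article.
context_pages = ["Freedom_of_speech"]

reranked = rerank(candidates, context_pages, pair_counts)
```

In this toy setting the candidate that users actually navigate to and from the context page moves to the top. A real system would presumably combine such a score with the original ranking signals rather than replace them.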

About click-through data
Click-through data consists of the counts of (referrer, article) pairs aggregated from the HTTP request logs. We use the Wikipedia click-through data released by the Wikimedia Foundation, which aggregates the 4 billion HTTP requests Wikipedia received in Feb 2015 into 22 million (referrer, article) pairs. Table 1 shows the top links that users click through from the wiki page for 'China', and Table 2 shows the sources of traffic to the wiki page for 'China'.
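As a small illustration of what "counts of (referrer, article) pairs" means, the sketch below aggregates a toy request log into pair counts. The log entries and the function name are assumptions made for this example; the real data set is pre-aggregated and is not reproduced here.

```python
from collections import Counter


def aggregate_clicks(requests):
    """Count how often each (referrer, article) pair occurs in a request log."""
    return Counter((referrer, article) for referrer, article in requests)


# Hypothetical request log: each entry is (referrer page, requested article).
log = [
    ("China", "Beijing"),
    ("China", "Beijing"),
    ("China", "Shanghai"),
    ("Asia", "China"),
]

pair_counts = aggregate_clicks(log)
```

Each distinct pair becomes one row with its count, which is how billions of requests can collapse into a much smaller number of pairs.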

Table 1 Outclicks from the 'China' wiki page


Table 2 Inclicks to the 'China' wiki page


These statistics indicate user interests. Users clearly click some links on a given page more often than others. For incoming internal traffic to a page, we can see that some pages drive more traffic to the current page than others. This provides a measure of page relevance.
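Given (referrer, article) pair counts, the outclick and inclick views of a page can be derived as in the sketch below. The function names and all counts are made up for illustration and are not taken from the Wikimedia data.

```python
def outclicks(pair_counts, page):
    """Articles reached from `page`, most-clicked first (ties by name)."""
    rows = [(article, n) for (ref, article), n in pair_counts.items() if ref == page]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def inclicks(pair_counts, page):
    """Referrers that send traffic to `page`, largest source first (ties by name)."""
    rows = [(ref, n) for (ref, article), n in pair_counts.items() if article == page]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical counts: (referrer, article) -> number of clicks.
pair_counts = {
    ("China", "Beijing"): 120,
    ("China", "Shanghai"): 80,
    ("Asia", "China"): 200,
    ("Beijing", "China"): 50,
}

china_out = outclicks(pair_counts, "China")
china_in = inclicks(pair_counts, "China")
```

The two lists correspond to the two directions of traffic discussed above: which links users follow from a page, and which pages send users to it.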

Method logic

Existing t...