Method for Personalizing Search Results with Domain Relevance

IP.com Disclosure Number: IPCOM000124208D
Original Publication Date: 2005-Apr-12
Included in the Prior Art Database: 2005-Apr-12
Document File: 4 page(s) / 189K

Publishing Venue

IBM

Abstract

Disclosed is a method for personalizing search results by automatically inferring a user's domain preference from the user's browse history and bookmark listing. People tend to have an affinity for certain domains, and if they get search results belonging to those domains, those result pages are opened before others. This domain preference is calculated as a domain relevancy metric, which is then used to increase the page rank of returned results based on the domain those pages belong to. The domain relevancy metric is a weighted combination of a bookmark weight and a history weight for a domain. A bookmark profiler is used to parse the bookmark listing of the user and assign a weight to each domain depending on the user's last visit to a bookmarked URL belonging to that domain and the average time spent on a URL of that domain. A history profiler is used to parse the browsing history of the user and assign a weight to each domain depending on the number of times URLs belonging to that domain were visited by the user and the average time spent on them. A domain hierarchy tree (T(d)) of suitable granularity level is created from the bookmarked and browsed URLs, with each node of the tree associated with the relevancy metric. When the user submits a query, the results returned by the search engine are used to create a DNS hierarchy tree (T(s)). The intersection between T(d) and T(s) is used to determine the domain ranking for each of the search results, and the page rank is augmented with the domain ranking to re-order the search results. The domain relevancy metric can be used along with any other personalization metric (such as web-page categorization to incorporate the user's search context) to re-rank the search results.

This is the abbreviated version, containing approximately 30% of the total text.

THIS COPY WAS MADE FROM AN INTERNAL IBM DOCUMENT AND NOT FROM THE PUBLISHED BOOK

IN820050021 Srilakshmi N Chakravarthy/India/IBM@IBMIN Vibha S Sinha, Anuradha Bhamidipaty

Background

Most search engines today return results based on a global page rank, which rates the individual importance of a page based on its content and link structure. The next generation of search engines is looking into ways to personalize results based on user preferences and context. These preferences can be explicitly defined or automatically inferred. "Personalized Google" asks the user to specify categories of interest, which are then used to augment the page rank. There is also considerable work going on at Google, Eurekster, etc. to determine a user's long- and short-term context implicitly from browse history and then personalize search results accordingly. As a middle ground, people have also proposed using pre-defined user information such as age, geographic location, and profile similarity with other people for re-ranking search results.

Another metric that would be useful for personalizing searches is domain preference. People tend to have an affinity for certain domains, and if they get search results belonging to those domains, those result pages are opened before others. Some search engines have tried to incorporate domain preference into the user profile. For example, as an advanced option, Jeeves.com allows the user to specify a domain or web-site preference. The search results then contain only pages from the specified domain. Aktas et al.* have presented a modified page-rank algorithm which ranks pages based on how well they adhere to pre-defined user preferences on geographic (uk, ac ...) and topical (gov, edu ...) domains.

Description

A domain name or URL is simply an internet address with the following generalized format:

protocol://hostname/page (html, php, jsp ...)

The host name can be further generalized to the format a.b.c ..., for example www.yahoo.com, www.google.com, www.stanford.edu, cs.stanford.edu. These host names can be parsed to create a tree-like hierarchy (example shown below).

Root
    com
        yahoo
        google
    edu
        stanford
            cs
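
The parsing step described above can be sketched in Python as follows. This is a minimal illustration and not the disclosure's implementation: the function names `host_labels` and `build_domain_tree`, the nested-dictionary representation, and the choice to drop a leading "www" label (so the result matches the example tree) are all assumptions.

```python
from urllib.parse import urlparse


def host_labels(url, drop_www=True):
    """Return host labels from most general to most specific.

    Example: "http://cs.stanford.edu/" gives ["edu", "stanford", "cs"].
    Dropping a leading "www" is an assumption made to match the example tree.
    """
    host = urlparse(url).hostname or ""
    labels = [label for label in host.lower().split(".") if label]
    if drop_www and labels and labels[0] == "www":
        labels = labels[1:]
    return list(reversed(labels))


def build_domain_tree(urls):
    """Build a nested-dict tree keyed by host label; the outer dict is Root."""
    root = {}
    for url in urls:
        node = root
        for label in host_labels(url):
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(label, {})
    return root


tree = build_domain_tree([
    "http://www.yahoo.com/",
    "http://www.google.com/",
    "http://www.stanford.edu/",
    "http://cs.stanford.edu/",
])
```

With these four URLs, `tree` has the top-level keys `com` and `edu`, with `yahoo` and `google` under `com`, and `cs` under `stanford` under `edu`.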

We can try to infer a user's hostname/domain preference up to any level of granularity supported by the tree hierarchy. If we limit ourselves to the top-level nodes, we can rank the user's relative interest in top-level domains, i.e. edu or com in the example above. If we go one level lower, we can infer the user's interest in google.com, yahoo.com or stanford.edu.
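
To make the notion of granularity concrete, the sketch below truncates each visited host name to a chosen number of labels and reports each domain's share of visits. It is only an illustration: the names `domain_at_level` and `interest_by_level`, the sample history, and the use of a plain visit share as the measure of interest are assumptions (the disclosed history weight also takes the average time spent into account).

```python
from collections import Counter
from urllib.parse import urlparse


def domain_at_level(url, level):
    """Keep the last `level` host labels: level 1 gives "com", level 2 "google.com"."""
    if level < 1:
        raise ValueError("level must be at least 1")
    host = urlparse(url).hostname or ""
    labels = [label for label in host.lower().split(".") if label]
    return ".".join(labels[-level:])


def interest_by_level(visited_urls, level):
    """Share of visits per domain at the given granularity (a simplifying assumption)."""
    counts = Counter(domain_at_level(url, level) for url in visited_urls)
    total = sum(counts.values())
    return {domain: count / total for domain, count in counts.items()}


# Hypothetical browse history, for illustration only.
history = [
    "http://www.google.com/a",
    "http://www.google.com/b",
    "http://www.yahoo.com/",
    "http://cs.stanford.edu/",
]

top_level = interest_by_level(history, 1)     # {"com": 0.75, "edu": 0.25}
second_level = interest_by_level(history, 2)  # google.com 0.5, yahoo.com 0.25, stanford.edu 0.25
```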

We propose increasing the page rank of pages belonging to domains the user has browsed before. This way, pages that have a good enough page rank and belong to a user-preferred domain tend to come up higher in the search results than pages belonging to unseen domains. Below we present some examples where personalizing search results based on domain preference would be useful:

[Figure: example domain hierarchy tree beginning at Root; truncated in this abbreviated version]
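
The re-ranking proposal described above (raising the page rank of results whose domain the user prefers) might be sketched as follows. This abbreviated text does not give the exact formulas, so everything specific here is an assumption: the mixing factor `alpha`, the multiplicative `boost`, the function names, and the sample weights and results are all hypothetical.

```python
from urllib.parse import urlparse


def domain_of(url, level=2):
    """Domain key at a fixed granularity, e.g. "stanford.edu" for level 2."""
    host = urlparse(url).hostname or ""
    labels = [label for label in host.lower().split(".") if label]
    return ".".join(labels[-level:])


def domain_relevancy(bookmark_w, history_w, alpha=0.5):
    """Weighted combination of bookmark and history weights per domain.

    alpha is a hypothetical mixing factor; the source only states that the
    metric is a weighted combination of the two weights.
    """
    domains = set(bookmark_w) | set(history_w)
    return {
        d: alpha * bookmark_w.get(d, 0.0) + (1 - alpha) * history_w.get(d, 0.0)
        for d in domains
    }


def rerank(results, relevancy, boost=1.0, level=2):
    """Sort (url, page_rank) pairs by page rank augmented with domain relevancy.

    The multiplicative form is an assumption; unseen domains get no boost.
    """
    def score(item):
        url, page_rank = item
        return page_rank * (1.0 + boost * relevancy.get(domain_of(url, level), 0.0))

    return sorted(results, key=score, reverse=True)


# Hypothetical weights and search results, for illustration only.
relevancy = domain_relevancy(
    {"stanford.edu": 0.8},
    {"stanford.edu": 0.4, "google.com": 0.2},
)
results = [
    ("http://example.org/page", 0.50),
    ("http://cs.stanford.edu/course", 0.40),
    ("http://www.google.com/x", 0.45),
]
reordered = rerank(results, relevancy)
```

In this example the cs.stanford.edu page moves from second place to first, because its domain carries the highest relevancy, while the page from the unseen domain example.org keeps its original score.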