Density function for query terms

IP.com Disclosure Number: IPCOM000014369D
Original Publication Date: 2000-Mar-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 1 page(s) / 36K

Publishing Venue

IBM

Abstract

Many search engines retrieve the set of documents that satisfy a given query. In general, the query words are submitted to the search engine as a bag of words, and a hit list is returned. Such a hit list needs to be rank-ordered, with the most relevant documents at the top. Various scoring algorithms have been suggested in the past.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 59% of the total text.


In this paper, we propose a new scoring algorithm that, in general, ranks the most relevant documents or fragments of documents at the top. First of all, the elements in the hit list are not necessarily whole documents; they can also be passages from documents. A passage consists of a predetermined (but user-settable) number of sentences in the document.
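As a rough illustration of the passage notion, the following Python sketch cuts a document into passages of a fixed, caller-chosen number of sentences. The function name split_into_passages, the parameter sentences_per_passage, the naive punctuation-based sentence splitter, and the choice of non-overlapping passages are all assumptions made for this example; the disclosure does not say how sentences are detected or whether passages overlap.

```python
import re


def split_into_passages(text, sentences_per_passage=3):
    """Split text into consecutive, non-overlapping passages.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    A real system would likely use a proper sentence segmenter.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text)
        if s.strip()
    ]
    # Group the sentences into chunks of the requested size; the last
    # passage may be shorter than the others.
    return [
        " ".join(sentences[i:i + sentences_per_passage])
        for i in range(0, len(sentences), sentences_per_passage)
    ]


if __name__ == "__main__":
    doc = "One. Two! Three? Four. Five."
    for passage in split_into_passages(doc, sentences_per_passage=2):
        print(passage)
```

Each returned passage can then be placed in the hit list and scored in the same way as a whole document.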

To score a passage or a document, any scoring algorithm deemed relevant to the particular application can be used. The individual words can have weights associated with them, preset by the user (or by another part of the system), or the frequency of occurrence of the terms can be used. With any scoring algorithm, however, several documents or passages may end up with the same score, so a mechanism is needed to resolve the ranking of equal-scoring documents or passages.
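To make the tie problem concrete, here is a minimal Python sketch of one possible base scorer: a weighted sum of query-term frequencies. It is only an example of "any scoring algorithm", not the scorer used in the disclosure. The names tokenize and score, the lowercase alphanumeric tokenizer, and the default weight of 1.0 are assumptions introduced for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text, query_terms, weights=None):
    """Weighted term-frequency score of text for a bag-of-words query.

    weights maps lowercase terms to preset weights; terms without an
    entry default to 1.0, which reduces to plain term frequency.
    """
    counts = Counter(tokenize(text))
    total = 0.0
    for term in query_terms:
        term = term.lower()
        weight = 1.0 if weights is None else weights.get(term, 1.0)
        total += weight * counts[term]
    return total


if __name__ == "__main__":
    query = ["search", "engine", "ranking"]
    compact = "The search engine returns a ranking."
    spread = ("A search over many indexed pages feeds the engine "
              "that computes the ranking.")
    # Both passages contain each query term once, so their scores tie.
    print(score(compact, query), score(spread, query))
```

The two sample passages receive the same score even though the query terms sit much closer together in the first one, which is exactly the situation a tie-breaking mechanism has to resolve.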

We propose to use a density function to break ties in scoring. For each element in the hit list, a density function is computed, which in its simplest form is the reciprocal of the number of terms between the first and...
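The definition above is cut off in this abbreviated text, so the following Python sketch rests on a guess: it assumes the span runs from the first to the last occurrence of any query term in the element, counted inclusively so that the span is never zero. That reading, the names density and rank, and the tokenizer are assumptions for illustration and may differ from the full disclosure.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def density(text, query_terms):
    """Reciprocal of the token span covering all query-term occurrences.

    Assumed reading of the truncated definition: the span is counted
    from the first to the last query-term occurrence, inclusive.
    Returns 0.0 when no query term occurs in the text.
    """
    terms = {t.lower() for t in query_terms}
    positions = [i for i, tok in enumerate(tokenize(text)) if tok in terms]
    if not positions:
        return 0.0
    span = positions[-1] - positions[0] + 1
    return 1.0 / span


def rank(hits, query_terms):
    """Sort (text, score) pairs by score, breaking ties by higher density."""
    return sorted(
        hits,
        key=lambda hit: (-hit[1], -density(hit[0], query_terms)),
    )


if __name__ == "__main__":
    query = ["search", "engine", "ranking"]
    compact = "The search engine returns a ranking."
    spread = ("A search over many indexed pages feeds the engine "
              "that computes the ranking.")
    # Equal base scores: the passage with the tighter term span ranks first.
    for text, base_score in rank([(spread, 3.0), (compact, 3.0)], query):
        print(base_score, round(density(text, query), 3), text)
```

Under this reading, the density only reorders elements whose base scores are equal; an element with a higher base score still ranks above a denser element with a lower one.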