Proximity Evaluation Method for Many Search Terms in Full Text Search Engine

IP.com Disclosure Number: IPCOM000016249D
Original Publication Date: 2002-Oct-10
Included in the Prior Art Database: 2003-Jun-21
Document File: 2 page(s) / 56K

Publishing Venue

IBM

Abstract

Many search engines can rank search results by the proximity of the search terms, a function called Proximity Ranking. In such Proximity Ranking, the proximity among all search terms is calculated from the distance between each pair of terms, so the proximity score tends to be less accurate when there are more than two search terms. This invention provides an effective method for evaluating the proximity of search terms when more than two are specified.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Details

"Common Technologies and their Problems"
1. Evaluation of proximity is less accurate when there are more than two search terms.

In common Proximity Ranking, more than two terms can be specified as search terms, but the proximity among them is calculated from the distance between each pair of terms. The proximity score therefore tends to be less accurate when there are more than two search terms. For example, when you specify 'Tokyo', 'International' and 'Airport' as search terms, and there are two documents, each including one of the following sentences,

(i) The meeting on new international airport establishment held at Tokyo International Conference Center.
(ii) Tokyo (Narita) International Airport will be completed next year.

the document that includes (i) is ranked higher than the one that includes (ii) in the result ranking, because in sentence (i) the distance between 'Tokyo' and 'International' is minimal and so is the distance between 'international' and 'airport', while in sentence (ii) the distance between 'Tokyo' and 'International' is not minimal, since '(Narita)' sits between them.
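The ranking in this example can be reproduced with a minimal sketch. It assumes the proximity cost is the sum of the smallest token distances between consecutive pairs of search terms, with a lower cost ranking higher. The disclosure does not give the exact formula, so this scoring rule and the function names are illustrative assumptions only.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_pair_distance(tokens, a, b):
    """Smallest token distance between any occurrence of a and of b."""
    pos_a = [i for i, tok in enumerate(tokens) if tok == a.lower()]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b.lower()]
    if not pos_a or not pos_b:
        return None  # one of the terms does not occur
    return min(abs(i - j) for i, j in product(pos_a, pos_b))


def pairwise_proximity_cost(text, terms):
    """Assumed pairwise score: sum of distances over consecutive term pairs."""
    tokens = tokenize(text)
    dists = [min_pair_distance(tokens, a, b) for a, b in zip(terms, terms[1:])]
    if any(d is None for d in dists):
        return None
    return sum(dists)


# The two sentences and the search terms are taken from the example above.
SENTENCE_I = ("The meeting on new international airport establishment "
              "held at Tokyo International Conference Center.")
SENTENCE_II = "Tokyo (Narita) International Airport will be completed next year."
TERMS = ["Tokyo", "International", "Airport"]

cost_i = pairwise_proximity_cost(SENTENCE_I, TERMS)
cost_ii = pairwise_proximity_cost(SENTENCE_II, TERMS)
print(cost_i, cost_ii)  # 2 3
```

Under this assumed scoring, sentence (i) gets the lower cost even though its three terms never appear together, because each pair is evaluated independently and may use different occurrences of 'international'.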
2. A document must include all search terms to be hit, even when many search terms are specified.

The more search terms there are, the fewer documents are hit. Consequently, a document that includes almost all of the search terms, and whose content is very close to what the user is looking for, is neither hit nor ranked in the result ranking.
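A small sketch of this all-terms requirement follows. The sample document, the query, and the helper names `is_hit` and `term_coverage` are invented for illustration and do not come from the disclosure.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_hit(text, terms):
    """Strict conjunctive match: every search term must occur in the text."""
    tokens = tokenize(text)
    return all(term.lower() in tokens for term in terms)


def term_coverage(text, terms):
    """Fraction of the search terms that occur in the text."""
    tokens = tokenize(text)
    return sum(term.lower() in tokens for term in terms) / len(terms)


# Hypothetical document and query, made up for this example.
DOC = "Tokyo International Airport opens a new terminal next year."
TERMS = ["Tokyo", "International", "Airport", "terminal", "schedule"]

hit = is_hit(DOC, TERMS)
coverage = term_coverage(DOC, TERMS)
print(hit, coverage)
```

Here the document contains four of the five terms, yet the strict match rejects it, so it never reaches the ranking stage.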
3. The order sensitivity of search terms is not configurable.

The order of search terms should be considered in some cases but not in others, so order sensitivity should be configurable.
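One way to picture a configurable order sensitivity is a flag on the distance calculation, sketched below. The `ordered` parameter and the function name are assumptions made for this example, not the mechanism defined by the disclosure. The text is sentence (i) from problem 1.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_distance(text, first, second, ordered):
    """Smallest token distance between `first` and `second`.

    When `ordered` is True, only occurrences where `second` appears
    after `first` are counted. Returns None if no pair qualifies.
    """
    tokens = tokenize(text)
    first, second = first.lower(), second.lower()
    best = None
    for i, a in enumerate(tokens):
        if a != first:
            continue
        for j, b in enumerate(tokens):
            if b != second:
                continue
            if ordered and j < i:
                continue  # `second` precedes `first`: ignored in ordered mode
            d = abs(j - i)
            if best is None or d < best:
                best = d
    return best


SENTENCE_I = ("The meeting on new international airport establishment "
              "held at Tokyo International Conference Center.")

unordered = min_distance(SENTENCE_I, "International", "Tokyo", ordered=False)
in_order = min_distance(SENTENCE_I, "International", "Tokyo", ordered=True)
print(unordered, in_order)  # 1 5
```

With the flag off, 'Tokyo International' counts as an adjacent match for the query 'International Tokyo'. With the flag on, only the earlier 'international' that precedes 'Tokyo' qualifies, and the distance grows to 5.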
4. The mixing ratio of the proximity factor and other ranking factors is not configurable.

Other ranking factors, such as the frequency of terms in a document, are very important in some cases but not in others. The mixing ratio should be configurable.
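A configurable mixing ratio could look like the following sketch, which assumes a simple linear blend of two normalized scores. The blend formula, the `proximity_weight` parameter, and the sample scores are all hypothetical; the disclosure does not state how the factors are combined.

```python
def combined_score(proximity_score, frequency_score, proximity_weight):
    """Linear blend of two scores, each assumed to lie in [0, 1]."""
    if not 0.0 <= proximity_weight <= 1.0:
        raise ValueError("proximity_weight must be between 0 and 1")
    return (proximity_weight * proximity_score
            + (1.0 - proximity_weight) * frequency_score)


def rank(docs, proximity_weight):
    """Return document names sorted by combined score, best first."""
    return sorted(
        docs,
        key=lambda name: combined_score(*docs[name], proximity_weight),
        reverse=True,
    )


# Hypothetical (proximity_score, frequency_score) pairs for two documents.
DOCS = {
    "A": (0.9, 0.2),  # terms close together, but mentioned rarely
    "B": (0.4, 0.8),  # terms scattered, but mentioned often
}

proximity_heavy = rank(DOCS, proximity_weight=0.8)
frequency_heavy = rank(DOCS, proximity_weight=0.2)
print(proximity_heavy, frequency_heavy)
```

Changing the weight flips the order of the two documents, which is the kind of per-use-case control that a fixed ratio cannot offer.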

"How to Resolve the Problems"

To enhance the...