Ranking corpus documents based on regional affinity

IP.com Disclosure Number: IPCOM000237026D
Publication Date: 2014-May-27
Document File: 3 page(s) / 28K

Publishing Venue

The IP.com Prior Art Database

Abstract

A method for ranking corpus documents based on regional affinity is disclosed.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.

Disclosed is a method for ranking corpus documents based on regional affinity.

When performing a search for documents related to a certain region, it is often hard to find the most inclusive documents for that region. In order to capture more local stories related to the region, it's important to know the appropriate local papers to search.

For some news sites (even local ones), however, searching for news on a certain topic may return results at an international or national level. When that search is limited by using the region as a keyword, relevant news from neighboring local regions or counties will often be omitted. For example, searching for tax news in the Chicago area using a source like the Chicago Tribune® produces the following result:

Simply searching the keyword "tax" on the Chicago Tribune site brings up every news story containing the word "tax", ranked by relevance. Since there are so many relevant stories, most results appearing in the top 10 are from that day. Say one of the results is about a tax increase in Lombard (a suburb of Chicago) that was published the day before. This is an important Chicago-area tax story to capture, but if the search had been performed a week or even a month later, this result would not appear anywhere near the top. Since the search is specifically for Chicago-region taxes, one might narrow it with a keyword search on "chicago tax", but the Lombard tax increase story would then not break the top 100 results.

To retrieve this news story a week or more after it was published, one would need either to search the site for each suburb together with the keyword "tax" or to search a more local paper. The first option is extremely time-intensive even when looking at only a couple of local news sites, and the second requires a lot of time to find those local sources and then judge their credibility and value.

The disclosed solution is to take a query such as "Chicago tax" and perform natural language processing (NLP) analysis that identifies the appropriate regions relative to the query, returns the best sources relative to that location and topic, and ranks those sources in terms of their relevance and importance to the region.

The following steps are utilized:

1) When presented with a search query, perform NLP analysis on the query to determine whether the query is region-oriented.

2) If yes, determine the location and related regions.

2a) Rank documents and sources based on their regional affinity in the context of the query.
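
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: the gazetteer RELATED_REGIONS, the Document type, the 1.0 and 0.8 affinity weights, and the function names are all hypothetical, and a simple gazetteer lookup stands in for the NLP analysis of step 1.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer mapping a region to nearby localities (illustrative data only).
RELATED_REGIONS = {"chicago": {"lombard", "evanston", "naperville"}}


@dataclass
class Document:
    source: str
    text: str


def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def detect_region(query):
    """Step 1: a gazetteer lookup stands in for real NLP analysis (assumption)."""
    for tok in tokens(query):
        if tok in RELATED_REGIONS:
            return tok
    return None


def regional_affinity(doc_tokens, region):
    """Step 2: full weight for the region itself, reduced weight for related regions.

    The weights 1.0 and 0.8 are arbitrary illustrative values.
    """
    if region in doc_tokens:
        return 1.0
    if RELATED_REGIONS[region] & doc_tokens:
        return 0.8
    return 0.0


def rank(query, docs):
    """Step 2a: order documents by topic match weighted by regional affinity."""
    region = detect_region(query)
    topic = [t for t in tokens(query) if t != region]
    scored = []
    for doc in docs:
        doc_tokens = set(tokens(doc.text))
        topic_score = sum(t in doc_tokens for t in topic)
        # Queries with no detected region fall back to plain topic matching.
        affinity = regional_affinity(doc_tokens, region) if region else 1.0
        score = topic_score * affinity
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

With a query of "Chicago tax", a story that mentions only Lombard and "tax" still receives a nonzero score in this sketch, because Lombard is listed as a related region, while a tax story that mentions no related region is dropped. A real system would presumably replace the gazetteer and fixed weights with learned or curated regional data.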

The disclosed method may greatly improve search quality by presenting highly relevant results first, and may save the user considerable time by identifying and ranking the best regional, local, and topical news sources for them.

Example: User has a search query about toxicity affecting Chicago water and sewer bonds. The user wants to pull o...