A Cluster Search by Using Search Service Snippets

IP.com Disclosure Number: IPCOM000146672D
Publication Date: 2007-Feb-19
Document File: 3 page(s) / 111K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for a clustering engine that takes search results from another search service and clusters them. Benefits include a solution that is simple, fast, and does not add to the storage footprint.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 52% of the total text.

Background

Search engine clustering helps end users find what they are looking for and refine their search results. Typically, the user interface displays the normal search results along the right side of the page and the clusters as a list along the left side. When the user clicks a cluster name, the search results that do not belong to that cluster are filtered out. For example, a search on “zoo” might produce clusters such as “events”, “photos”, “education”, and “animals”. Clicking the “photos” cluster would then show only the sites that contain zoo photos.

A cluster search consists of an algorithm and graphical user interface (GUI) components. The algorithm can cluster results from one or many search engines, as long as the proper information is passed to it.

Many clustering algorithms exist today, and a number of university research papers have been published on clustering. However, these algorithms use complex mathematical formulas to determine which clusters a search result belongs to.

General Description

The disclosed method is a clustering engine that takes search results from another search service and clusters them. The clustering algorithm can be embedded into a locally owned search algorithm, or it can use public web services (such as Google or MSN) to get search results. Search results are clustered using the words in the page titles and in the keyword-in-context snippets returned by the search services. This works well because page titles and snippets are dense with keywords. As a result, the clustering engine does not need to index the full content of every page and can instead do its parsing in real time.
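The parsing step described above might look roughly like the following Python sketch. The result records, their field names (id, title, snippet), and the extract_words helper are hypothetical and chosen only for illustration; the disclosure does not specify a tokenization rule, so a simple lowercase alphanumeric split is assumed here.

```python
import re

# Hypothetical result records; a real search service returns its own field names.
results = [
    {"id": 1, "title": "City Zoo Photos",
     "snippet": "Browse photos of animals at the zoo."},
    {"id": 2, "title": "Zoo Events Calendar",
     "snippet": "Upcoming events and education programs."},
]


def extract_words(result):
    """Return the lowercase words found in a result's title and snippet.

    Assumption: words are runs of ASCII letters and digits. The disclosure
    does not say how words are split.
    """
    text = f"{result['title']} {result['snippet']}"
    return re.findall(r"[a-z0-9]+", text.lower())


# Only the title and snippet are parsed; the full page is never fetched.
words_by_doc = {r["id"]: extract_words(r) for r in results}
```

Because only the title and snippet of each result are read, the work per query grows with the number of results returned rather than with the size of the underlying pages.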

Once the words have been parsed from the titles and snippets, they are stored in an inverted document index. Each value of the inverted index is a data structure that contains the word, its count, the list of documents that contain it (their IDs), and the stem of the word. As ea...
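One possible shape for such an inverted index is sketched below in Python. The names IndexEntry, naive_stem, and build_index are hypothetical. Keying the index by the word itself and counting every occurrence (rather than one per document) are assumptions, and the suffix-stripping stemmer is only a stand-in, since the text available here does not name a stemming method.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One value of the inverted index: word, stem, count, and document IDs."""
    word: str
    stem: str
    count: int = 0
    doc_ids: list = field(default_factory=list)


def naive_stem(word):
    """Placeholder stemmer (assumption): strip a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(words_by_doc):
    """Build a word -> IndexEntry mapping from {doc_id: [words]}."""
    index = {}
    for doc_id, words in words_by_doc.items():
        for word in words:
            entry = index.get(word)
            if entry is None:
                entry = index[word] = IndexEntry(word=word, stem=naive_stem(word))
            entry.count += 1  # assumption: count every occurrence of the word
            if doc_id not in entry.doc_ids:
                entry.doc_ids.append(doc_id)
    return index


index = build_index({1: ["zoo", "photos", "zoo"], 2: ["zoo", "events"]})
```

In this sketch, looking up a word returns the IDs of the results that contain it, which is the information a cluster needs in order to filter the result list.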