Facet Clustering Using Multi-Dimension Facets
Original Publication Date: 2009-Sep-22
Included in the Prior Art Database: 2009-Sep-22
Proposed herein is a method for faceted search that clusters together facet values which are highly correlated in the result set, i.e., values that often appear together in the same documents of the result set. The article proposes an implementation and several UI patterns for displaying these clusters, and describes how the user can benefit from them.
Faceted search is a common technique for adding navigation to a search engine. A "facet" is a single aspect of the indexed items that can be used to classify them. For example, in a collection of books at an online bookstore, some of the facets that a book might have are its price, its author, and its publication date. Faceted search adds a navigation box to the ordinary search results, showing the most useful values for each facet. In the above book-seller example with an "author" facet, the faceted navigation box might show the 10 authors who have written the most books that matched the user's query, and allow the user to narrow the search to a specific author.
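The navigation box described above can be illustrated with a minimal sketch. The toy collection, the field names, and the functions `search`, `facet_counts`, and `narrow` are all hypothetical and chosen only for illustration; a real search engine would compute these counts inside its index rather than over a Python list.

```python
from collections import Counter

# Hypothetical toy collection; field names are assumptions for illustration.
BOOKS = [
    {"title": "Search Basics", "author": "A. Smith", "price": 12.50},
    {"title": "Advanced Search", "author": "A. Smith", "price": 31.00},
    {"title": "Search at Scale", "author": "B. Jones", "price": 18.99},
    {"title": "Cooking for Two", "author": "C. Lee", "price": 15.00},
]

def search(query, docs):
    """Naive matcher: keep documents whose title contains the query term."""
    q = query.lower()
    return [d for d in docs if q in d["title"].lower()]

def facet_counts(results, facet, top_n=10):
    """Count how many results carry each value of `facet`; return the top_n."""
    counts = Counter(d[facet] for d in results if facet in d)
    return counts.most_common(top_n)

def narrow(results, facet, value):
    """Restrict the result set to documents with the chosen facet value."""
    return [d for d in results if d.get(facet) == value]

results = search("search", BOOKS)
author_box = facet_counts(results, "author")   # values shown in the navigation box
only_jones = narrow(results, "author", "B. Jones")
```

Here `author_box` holds the (value, count) pairs that the navigation box would display, and `narrow` corresponds to the user clicking one of those values.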
In recent years, faceted search has become a very common UI feature in search engines, especially in e-commerce sites. Faceted search makes it easy for untrained users to find the specific item they are interested in. Faceted search has also been used as a component in more advanced search features, such as in , which adds a "Related Person" facet to documents as a way to implement expert search.
Designers of faceted-search systems often find that a given facet takes on too many different values. For example, a single search in the above book-seller example might result in 100 different publication dates and 100 different prices, none of them significantly more common than the others.
A common solution is to cluster several values together.
For example, rather than using exact prices, price ranges (e.g., "$10-$20") are often used as values for the price facet. Some systems (such as ) allow a hierarchy of facet values - so for example initially $10 price ranges are shown, but when the user narrows the search to just one of those ranges, $1 ranges inside it are shown.
Another good example is publication dates: initially counts for years are shown, but when the user narrows down to a single year, the separate months inside it are shown.
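The range hierarchy described above could be sketched as follows, using prices as the example. The function names `price_bucket` and `range_counts`, and the sample prices, are assumptions made for illustration; the sketch only shows the idea of counting coarse ranges first and finer ranges after the user narrows the search.

```python
from collections import Counter

def price_bucket(price, width):
    """Return the (low, high) range of size `width` that contains `price`."""
    low = int(price // width) * width
    return (low, low + width)

def range_counts(prices, width, within=None):
    """Count prices per range of size `width`.

    If `within` is a (low, high) range, only prices inside it are counted,
    which models the user having narrowed the search to that range.
    """
    if within is not None:
        lo, hi = within
        prices = [p for p in prices if lo <= p < hi]
    return sorted(Counter(price_bucket(p, width) for p in prices).items())

prices = [12.50, 14.00, 14.99, 18.99, 23.00, 31.00]  # hypothetical result set
top_level = range_counts(prices, 10)                    # $10 ranges
drill_down = range_counts(prices, 1, within=(10, 20))   # $1 ranges inside $10-$20
```

The same two-level pattern would apply to publication dates, with years as the coarse ranges and months as the fine ones.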
Using a hierarchy instead of a long list of values works well for some concepts that naturally admit such a hierarchy - like dates, prices, and geographic locations. But other concepts like book authors, or people in expert search, do not have a natural or relevant hierarchy, so we want to look for different ways to cluster them.
We can divide such clustering attempts into two broad types: offline clustering and ad-hoc clustering. In offline clustering, one would spend some effort analyzing all the available data in advance and define clusters of related facet values. For example, a cluster might be a few authors who often write their books together, or a group of people who seem to be interested in many of the same documents. Such offline clustering can be thought of as introducing a hierarchy in the facet values where such a hierarchy was not previously known.
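As a rough illustration of offline clustering, the sketch below groups authors who have co-written at least a given number of books, using a union-find structure over the co-authorship pairs. This is one simple technique assumed here for illustration, not a method stated in this article; the function name `offline_author_clusters`, the `min_shared` threshold, and the input layout (one list of author names per book) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def offline_author_clusters(books, min_shared=2):
    """Group authors who co-wrote at least `min_shared` books (transitively).

    `books` is an iterable of author-name lists, one list per book.
    """
    # Count how many books each pair of authors wrote together.
    pair_counts = Counter()
    for authors in books:
        for a, b in combinations(sorted(set(authors)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over authors linked by a sufficiently frequent pair.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        if n >= min_shared:
            parent[find(a)] = find(b)

    # Collect the members of each cluster by their root.
    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)
```

Because the analysis runs over the whole collection ahead of time, the resulting clusters are fixed and do not depend on any particular query.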
Ad-hoc clustering, on the other hand, is about finding clusters in the...