Browse Prior Art Database

Method and System for Organizing Potential Search Suggestion Strings in a Database by Using Signature Clustering

IP.com Disclosure Number: IPCOM000242949D
Publication Date: 2015-Sep-02
Document File: 3 page(s) / 35K

Publishing Venue

The IP.com Prior Art Database

Related People

Zhongqiang Chen: INVENTOR [+6]

Abstract

A method and system for organizing potential search suggestion strings in a database by using signature clustering is disclosed. The potential search suggestion strings are organized such that lexically similar strings are discovered efficiently for any given query. A string in a database is represented with a fixed sized signature derived from shingles of the string. Thereafter, all strings are clustered into a large number of groups with help of corresponding signatures such that strings within each group are lexically similar. The strings in a group are ranked according to similarity with a given query, and strings with highest ranks are selected as the final suggestions.

This text was extracted from a Microsoft Word document.
This is the abbreviated version, containing approximately 49% of the total text.

Method and System for Organizing Potential Search Suggestion Strings in a Database by Using Signature Clustering

Abstract

A method and system for organizing potential search suggestion strings in a database by using signature clustering is disclosed.  The potential search suggestion strings are organized such that lexically similar strings are discovered efficiently for any given query.  A string in a database is represented with a fixed sized signature derived from shingles of the string.  Thereafter, all strings are clustered into a large number of groups with help of corresponding signatures such that strings within each group are lexically similar.  The strings in a group are ranked according to similarity with a given query, and strings with highest ranks are selected as the final suggestions.

Description

In many services provided by search engines such as suggestion-as-you-type and "related topics", it is quite typical that given a user query, strings with lexical similarity with the sample strings are to be retrieved as fast as possible from an extremely large database.  Due to stringent time constraints, it is a challenging task to satisfy the requirements on response latency, suggestion coverage, and relevance while retrieving sample strings.

Disclosed is a method and system for organizing potential search suggestion strings in a database by using signature clustering.  The potential search suggestion strings are organized such that lexically similar strings are discovered efficiently for any given query.  A string in a database is represented with a fixed sized signature derived from shingles of the string.  Thereafter, all strings are clustered into a large number of groups with help of corresponding signatures such that strings within each group are lexically similar.  The strings in a group are ranked according to similarity with a given query, and strings with highest ranks are selected as the final suggestions.

The method and system consists of two phases namely a clustering phase and a retrieval phase.  In a clustering phase, the method and system constructs k-shingles for each string in a database of potential suggestions and computes signature for shingles.  The signature of each string is partitioned into multiple segments and each segment is mapped into a cluster identifier that is inserted into a corresponding cluster.  After all entries in the database are processed and clustered into groups, the size of each cluster is checked.  If the number of entries falling into a cluster is larger than a specified value, the entries in a group are further clustered until the dimension of every resulting sub-group is within specified limits or numbers of iteration rounds reach a maximum number.

In a retrieval phase, the method and system builds k-shingles of a query and calculates signature based on a set of k-shingles to fetch suggestions that are lexically similar to a given query from a database.  A signature is par...