Method and System for Building a Synonym Dictionary

IP.com Disclosure Number: IPCOM000197950D
Publication Date: 2010-Jul-23
Document File: 2 page(s) / 42K

Publishing Venue

The IP.com Prior Art Database

Related People

Rukmini Iyer: INVENTOR [+3]

Abstract

A method and system for building a synonym dictionary from web search results is disclosed. The synonym dictionary is built using doc-doc pairs selected from web search results in order to increase coverage and achieve a higher-quality synonym dictionary.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 62% of the total text.

Description

Disclosed is a method and system for building a synonym dictionary from web search results. The method involves running a query to retrieve a set of web search results. In one instance, the descriptions of the top documents in the web search results are retrieved and paired. For example, consider two documents "d_{i1}" and "d_{i2}" retrieved from the web search results. The two documents are paired and thereafter used to build the synonym dictionary. Typically, doc-doc pairs are better matched in length and have higher word correspondence than query-doc pairs. Consequently, doc-doc pairs provide higher coverage, and the synonym dictionary built from them is of higher quality. The synonym dictionary is a translation table trained by a machine translation algorithm.
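The pairing and training steps described above can be illustrated with a minimal sketch. The disclosure does not name a specific machine translation algorithm, so the code below assumes a simplified IBM Model 1 style expectation-maximization procedure (without a NULL word) as a stand-in. The function names, the sample descriptions, and the parameters top_k and iterations are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pair_top_descriptions(descriptions, top_k=2):
    """Pair the top_k document descriptions returned for one query (doc-doc pairs)."""
    top = [d.lower().split() for d in descriptions[:top_k]]
    return list(combinations(top, 2))


def train_translation_table(pairs, iterations=10):
    """Estimate t(target | source) with IBM Model 1 style EM.

    Assumption: this stands in for the unspecified machine translation
    algorithm. The returned table is keyed as (target_word, source_word).
    """
    # Neither document in a pair is the "source", so use both directions.
    data = pairs + [(b, a) for a, b in pairs]
    t = defaultdict(lambda: 1.0)  # uniform start; only ratios matter at first
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in data:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm  # expected alignment of w to s
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize so the probabilities for each source word sum to 1.
        t = defaultdict(float, {(w, s): c / total[s] for (w, s), c in count.items()})
    return t


# Hypothetical top descriptions for two queries.
results = {
    "query 1": ["cheap flights", "inexpensive flights"],
    "query 2": ["cheap hotels", "inexpensive hotels"],
}
pairs = [p for descs in results.values() for p in pair_top_descriptions(descs)]
table = train_translation_table(pairs)
```

In this toy data, "cheap" co-occurs with "inexpensive" in both pairs but with "flights" and "hotels" only once each, so the table assigns "inexpensive" the larger probability given "cheap". A real system would train on many more pairs and would likely filter out identity entries such as ("flights", "flights").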

In another instance, the method pairs other documents retrieved from the web search results.  For example, consider two documents "d_{ij}" and "d_{ik}" retrieved from the web search results.  The documents may be paired based on, for example, a relevancy score returned by a search engine, a computed vector similarity between document pairs,...
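As a rough sketch of one pairing criterion mentioned above, a computed vector similarity, the following code scores document descriptions by cosine similarity over term-count vectors and keeps the pairs that reach a threshold. The choice of term-count vectors, the function names, the sample descriptions, and the min_similarity value are all assumptions made for illustration; the disclosure does not specify them.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between the term-count vectors of two descriptions."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_pairs(descriptions, min_similarity=0.3):
    """Return (j, k, similarity) for pairs at or above the threshold, most similar first."""
    scored = [
        (j, k, cosine_similarity(descriptions[j], descriptions[k]))
        for j, k in combinations(range(len(descriptions)), 2)
    ]
    kept = [p for p in scored if p[2] >= min_similarity]
    return sorted(kept, key=lambda p: p[2], reverse=True)


# Hypothetical descriptions d_{i1}, d_{i2}, d_{i3} for one query.
docs = [
    "cheap flights to paris",
    "inexpensive flights to paris",
    "paris museum opening hours",
]
selected = select_pairs(docs)
```

Here only the first two descriptions are paired, because they share three of their four terms, while the third shares only "paris" with each of the others. A relevancy score returned by a search engine could be used in place of, or combined with, the similarity score in the same selection step.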