Method of Information Retrieval Based on Collocation

IP.com Disclosure Number: IPCOM000117942D
Original Publication Date: 1996-Jul-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 66K

Publishing Venue

IBM

Related People

Nasukawa, T: AUTHOR

Abstract

Disclosed is a device for information retrieval based on collocation patterns in sentences. Each pattern consists of a modifier word, a modifiee word, and the relationship between the two.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      This device extracts collocation patterns from a query phrase or
sentence entered by a user, searches for identical or similar
collocation patterns stored as an index of text items in a database,
evaluates the preference of each text item that contains some of the
collocation patterns in the query, and presents information on the
preferred text items along with the collocation information in each
text item that is relevant to the query.

      The device consists of a sentence parser and dictionaries for
identifying similar words, such as synonyms.  In advance, it parses
each sentence of the texts in the databases and extracts collocation
patterns to build an index file.  For example, it extracts the
collocation pattern program --(OBJ)--> develop from a sentence in
which the word "program" modifies "develop" as an object, and stores
this pattern with the identification code of the text item to which
the sentence belongs.

      When a user enters a query phrase or sentence, this device:
  1.  parses the input phrase or sentence
  2.  extracts collocation patterns from the parse result
  3.  searches for texts in the database that contain some of the
       collocation patterns in the query phrase or sentence
  4.  evaluates the preference value of each text according to the
       number of collocation patterns in the text that are common to
       the query phrase...
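
The indexing stage and the query steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the sentence parser is not modeled, so each text and the query are assumed to arrive already parsed into (modifier, relation, modifiee) triples. The names SYNONYMS, normalize, build_index, and search, the sample texts, and the tie-breaking by text identifier are all hypothetical. The preference value is taken to be the plain count of shared patterns, which is as far as the abbreviated text describes it.

```python
from collections import defaultdict

# Hypothetical synonym dictionary: maps a word to a canonical form so that
# "similar" collocation patterns collapse into identical index keys.
SYNONYMS = {"create": "develop", "build": "develop", "software": "program"}


def normalize(pattern):
    """Replace both words of a (modifier, relation, modifiee) triple with canonical forms."""
    modifier, relation, modifiee = pattern
    return (SYNONYMS.get(modifier, modifier), relation, SYNONYMS.get(modifiee, modifiee))


def build_index(parsed_texts):
    """Build the index file: collocation pattern -> set of text identification codes."""
    index = defaultdict(set)
    for text_id, patterns in parsed_texts.items():
        for pattern in patterns:
            index[normalize(pattern)].add(text_id)
    return index


def search(index, query_patterns):
    """Rank texts by how many distinct query patterns they share (steps 3 and 4)."""
    matches = defaultdict(set)
    for pattern in {normalize(p) for p in query_patterns}:
        for text_id in index.get(pattern, ()):
            matches[text_id].add(pattern)
    # Highest count first; ties are broken by text id (an assumption of this sketch).
    return sorted(matches.items(), key=lambda item: (-len(item[1]), item[0]))


# Assumed parser output for three stored texts (steps 1 and 2 are taken as given).
parsed_texts = {
    "doc1": [("program", "OBJ", "develop"), ("user", "SUBJ", "develop")],
    "doc2": [("software", "OBJ", "build")],
    "doc3": [("report", "OBJ", "write")],
}
index = build_index(parsed_texts)

# Assumed parser output for a query such as "a user creates a program".
query = [("program", "OBJ", "create"), ("user", "SUBJ", "create")]
ranked = search(index, query)

for text_id, shared in ranked:
    print(text_id, len(shared), sorted(shared))
```

With this sample data, doc1 ranks first because it shares two patterns with the query, doc2 ranks second with one pattern matched through the synonym dictionary, and doc3 is not returned. The shared patterns are kept alongside each text identifier so that the collocation information relevant to the query can be presented with each result.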