Lessening Index File for Full Text Search

IP.com Disclosure Number: IPCOM000116865D
Original Publication Date: 1995-Nov-01
Included in the Prior Art Database: 2005-Mar-31
Document File: 2 page(s) / 23K

Publishing Venue

IBM

Related People

Kubota, R: AUTHOR

Abstract

Disclosed is a method to reduce the size of the index file used for the full text search previously disclosed in (*). In (*), the index file for full text search is made by uniformly extracting N-gram characters.

      Disclosed is a method to reduce the size of the index file used
for the full text search previously disclosed in (*).  In (*), the
index file for full text search is made by uniformly extracting
N-gram characters.  In the newly disclosed method, m-grams (m < N)
are extracted when, and only when, rarely searched characters are
encountered.  The set of rarely searched characters must be defined
in advance.  Special characters such as "," and "." are candidates
for that set.
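
      The extraction rule can be sketched as follows.  This is a
minimal illustration, not the disclosed implementation: the names
extract_grams and RARE_CHARS and the defaults N = 3 and m = 1 are
hypothetical, and the sketch assumes that the character at the start
of each window decides whether an N-gram or an m-gram is extracted,
a detail the disclosure does not specify.

```python
# Assumed set of rarely searched characters; the disclosure only
# names "," and "." as candidates.
RARE_CHARS = frozenset(",.")


def extract_grams(text, n=3, m=1, rare=RARE_CHARS):
    """Return (position, gram) pairs for a variable-length gram index.

    Assumption: a window that starts on a rarely searched character
    yields an m-gram (m < n); every other window yields an n-gram.
    """
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")
    grams = []
    for i, ch in enumerate(text):
        size = m if ch in rare else n
        gram = text[i:i + size]
        # Skip incomplete windows at the end of the text.
        if len(gram) == size:
            grams.append((i, gram))
    return grams


if __name__ == "__main__":
    for position, gram in extract_grams("ab,cd", n=2, m=1):
        print(position, repr(gram))
```

      With n = 2 and m = 1, the window that starts on "," produces the
single-character entry "," instead of ",c", so every occurrence of a
rarely searched character falls under one short key.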

      Generally, in an index method based on N-gram characters, as
the value of N gets smaller, the index file gets smaller but the
search gets slower.  This disclosure argues that a good trade-off
between index file size and search speed can be reached by making
the value of N variable, using a smaller value for characters that
are rarely contained in search terms.
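
      The general trade-off for a uniform N-gram index can be
illustrated with a small sketch.  The names build_uniform_index and
index_stats and the sample text are hypothetical.  The number of
distinct keys is used here as a rough proxy for index file size, and
the longest position list as a rough proxy for search cost; real
index formats and search costs will differ.

```python
from collections import defaultdict


def build_uniform_index(text, n):
    """Map every n-gram of text to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return dict(index)


def index_stats(text, n):
    """Return (distinct keys, longest position list) for a uniform index."""
    index = build_uniform_index(text, n)
    longest = max((len(p) for p in index.values()), default=0)
    return len(index), longest


sample = "abracadabra, abracadabra."

if __name__ == "__main__":
    for n in (1, 2, 3):
        keys, longest = index_stats(sample, n)
        print(f"N={n}: {keys} distinct keys, longest position list {longest}")
```

      On this sample, a smaller N gives fewer distinct keys but longer
position lists, so each lookup returns more candidate positions to
check.
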
  (*) Japanese Patent HEI06-287642