
Flexible English Text Matching

IP.com Disclosure Number: IPCOM000120652D
Original Publication Date: 1991-May-01
Included in the Prior Art Database: 2005-Apr-02
Document File: 1 page(s) / 34K

Publishing Venue

IBM

Related People

Harada, M: AUTHOR [+2]

Abstract

This article describes a way to rewrite English sentences to be used as keys to search an English sentence data base.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 93% of the total text.


      This article describes a way to rewrite English sentences
to be used as keys to search an English sentence data base.

      With this method, users can easily search the data base for
English sentences that are similar to an input sentence.
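The lookup side of the idea can be sketched as a table that maps each rewritten key to the original sentences stored under it, so that an input sentence matches every stored sentence that rewrites to the same key. This is a minimal sketch, not the disclosed program: the class name SentenceIndex and the function simple_key are hypothetical, and simple_key is a stand-in that applies only a few of the rewriting rules (lowercasing, deleting ",", ".", "?" and "!", and collapsing blanks).

```python
import re
from collections import defaultdict


def simple_key(sentence: str) -> str:
    # Hypothetical stand-in key function: lowercase, delete a few symbols,
    # and collapse runs of blanks. It is NOT the full rule set.
    text = re.sub(r"[,.?!]", " ", sentence.lower())
    return " ".join(text.split())


class SentenceIndex:
    """Hypothetical data base keyed by rewritten sentences."""

    def __init__(self, normalize=simple_key):
        self.normalize = normalize
        self.table = defaultdict(list)  # rewritten key -> original sentences

    def add(self, sentence: str) -> None:
        # Store the original sentence under its rewritten key.
        self.table[self.normalize(sentence)].append(sentence)

    def search(self, query: str) -> list:
        # Rewrite the input sentence the same way and look up its key.
        # dict.get does not create an empty entry for a missing key.
        return list(self.table.get(self.normalize(query), []))
```

For example, after idx.add("Save the file."), the query "save   the file!" finds the stored sentence because both rewrite to the key "save the file". Passing a fuller key function as normalize would make the matching as flexible as the rules allow.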

      The program rewrites English sentences according to the
following rules, and the data base uses the rewritten sentences as
keys.
  1) All characters are converted to lowercase.
  2) Printer control tags are deleted.
  3) Symbols such as ",", ".", "?", "!" are deleted.
  4) Numbers, including number words such as "one" and "two", are
     deleted.
  5) The determiners "the", "a" and "an" are deleted.
  6) Verbs, adjectives and nouns are replaced with their
     base forms.
  7) Auxiliary verbs such as "can", "may", "must" are
     deleted.
  8) "not" is deleted.
  9) Pronouns are replaced with "it", except "this", "that",
     "these" and "those".
 10) Consecutive blanks are replaced with a single blank.
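Rules 1 through 10 can be sketched as a single key-building function. This is a minimal sketch under stated assumptions: the word lists and the BASE_FORMS table are small hypothetical stand-ins for a real dictionary and morphological analyzer, and printer control tags are assumed to look like GML-style tags such as ":p.", which the disclosure does not specify.

```python
import re

# Hypothetical word lists; a real system would use a full dictionary.
NUMBER_WORDS = {"zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}
DETERMINERS = {"the", "a", "an"}
AUXILIARIES = {"can", "could", "may", "might", "must",
               "shall", "should", "will", "would"}
# "this", "that", "these" and "those" are deliberately absent (rule 9).
PRONOUNS = {"i", "me", "you", "he", "him", "she", "her",
            "we", "us", "they", "them", "it"}
# Hypothetical base-form table standing in for a morphological analyzer.
BASE_FORMS = {"files": "file", "saved": "save", "saves": "save",
              "is": "be", "are": "be", "was": "be", "were": "be",
              "printers": "printer", "larger": "large"}

TAG_RE = re.compile(r":[a-z0-9]+\.")  # assumed GML-style tags, e.g. ":p."
SYMBOL_RE = re.compile(r"[,.?!;:]")
NUMBER_RE = re.compile(r"^\d+$")


def make_key(sentence: str) -> str:
    text = sentence.lower()               # rule 1: lowercase
    text = TAG_RE.sub(" ", text)          # rule 2: delete control tags
    text = SYMBOL_RE.sub(" ", text)       # rule 3: delete symbols
    words = []
    for w in text.split():                # split() also collapses blanks (rule 10)
        if NUMBER_RE.match(w) or w in NUMBER_WORDS:           # rule 4
            continue
        if w in DETERMINERS or w in AUXILIARIES or w == "not":  # rules 5, 7, 8
            continue
        if w in PRONOUNS:                 # rule 9: pronouns become "it"
            w = "it"
        words.append(BASE_FORMS.get(w, w))  # rule 6: base forms
    return " ".join(words)
```

With these tables, "He saved it!" and "They must save it." both rewrite to "it save it", so either sentence would find the other in the data base. Tags are removed before symbols because deleting "." and ":" first would leave tag fragments behind.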

      If an English analysis program is available, the following
rules are added.
 a) Adverbs are deleted.
 b) Adjectives which modify nouns are deleted.
 c) Relative pronouns are replaced with "that".
 d) "this", "that", "these" and "those" are replaced with
    "it".
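Rules a) through d) assume that an English analysis program has already labeled each word. The sketch below applies them to a list of (word, tag) pairs. The tag names ADV, ADJ, NOUN, VERB, REL and DEM are hypothetical and do not come from the disclosure, and an adjective is treated as modifying a noun when the next word, after skipping further adjectives and adverbs, is a noun; predicative adjectives are kept.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); the tag names are hypothetical

DEMONSTRATIVES = {"this", "that", "these", "those"}


def modifies_noun(tokens: List[Token], i: int) -> bool:
    # Skip any following adjectives or adverbs, then check for a noun.
    j = i + 1
    while j < len(tokens) and tokens[j][1] in ("ADJ", "ADV"):
        j += 1
    return j < len(tokens) and tokens[j][1] == "NOUN"


def apply_analysis_rules(tokens: List[Token]) -> List[str]:
    out = []
    for i, (word, tag) in enumerate(tokens):
        if tag == "ADV":                               # rule a: drop adverbs
            continue
        if tag == "ADJ" and modifies_noun(tokens, i):  # rule b: drop noun modifiers
            continue
        if tag == "REL":                               # rule c: relative pronoun -> "that"
            out.append("that")
        elif tag == "DEM" and word in DEMONSTRATIVES:  # rule d: demonstrative -> "it"
            out.append("it")
        else:
            out.append(word)
    return out
```

Rule c is checked before rule d, so a relative "that" stays "that" while a demonstrative "that" becomes "it".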

      If the sentences belong to a specific field, additional rules
can be added. For example, the following rules can be added for the
computer field.
  A) "pf1", "pf2", "pa1" and so on are deleted.
  B) Words w...
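Domain rule A) can be expressed as a pattern over the already lowercased words. The names "pf1" and "pa1" are presumably terminal key names (program function and program attention keys); the pattern below, "pf" or "pa" followed by digits, is an assumption about what "and so on" covers, and drop_key_names is a hypothetical helper.

```python
import re
from typing import List

# Assumed pattern for key names such as "pf1", "pf12" or "pa2".
KEY_NAME_RE = re.compile(r"^(pf|pa)\d+$")


def drop_key_names(words: List[str]) -> List[str]:
    # Rule A: delete key names; all other words are kept in order.
    return [w for w in words if not KEY_NAME_RE.match(w)]
```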