A Flexible Text Matching Method

IP.com Disclosure Number: IPCOM000123116D
Original Publication Date: 1998-May-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 2 page(s) / 57K

Publishing Venue

IBM

Related People

Harada, M: AUTHOR

Abstract

This article describes a method for flexibly checking the similarity between two sentences by referring to a database.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 54% of the total text.

A Flexible Text Matching Method

   This article describes a method for flexibly checking the
similarity between two sentences by referring to a database.

   This method uses a database that contains one or more
groups.  Each group has one or more sub-groups.  Each sub-group
contains terms, each of which consists of one or more words.  For example:
  Group-A : Nouns
    group-A.1 : Hardware
    group-A.2 : Software
    group-A.3 : Other nouns
  Group-B : Adjectives
    group-B.1 : Adjectives to show volume
    group-B.2 : Other adjectives
  Group-C : Adverbs
    group-C.1 : All adverbs
  Group-D : Others
    group-D.1 : All others

   Each term may exist in more than one sub-group or group.
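
   The database layout described above can be sketched in Python as
follows.  This is only an illustration under stated assumptions: the
dictionary layout, the sample terms, and the helper name subgroups_of
are hypothetical and do not come from the original disclosure.

```python
# Hypothetical layout: group letter -> ordered list of sub-groups,
# where each sub-group is a set of terms (one or more words each).
# The sample terms are invented for illustration only.
DATABASE = {
    "A": [                      # Group-A : Nouns
        {"printer", "hard disk"},           # group-A.1 : Hardware
        {"editor", "operating system"},     # group-A.2 : Software
        {"user"},                           # group-A.3 : Other nouns
    ],
    "B": [                      # Group-B : Adjectives
        {"large", "small"},                 # group-B.1 : Adjectives to show volume
        {"fast"},                           # group-B.2 : Other adjectives
    ],
    "C": [{"quickly", "fast"}],  # Group-C : Adverbs (group-C.1)
    "D": [{"the", "is"}],        # Group-D : Others  (group-D.1)
}


def subgroups_of(term):
    """Return every (group, sub-group number) pair that contains the term."""
    term = term.lower()
    return [
        (group, index + 1)
        for group, subgroups in DATABASE.items()
        for index, terms in enumerate(subgroups)
        if term in terms
    ]


print(subgroups_of("editor"))
print(subgroups_of("fast"))
```

   In this sketch "fast" is deliberately listed under both group-B.2 and
group-C.1 to show a term that belongs to more than one group.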

   There are two steps for checking similarity between two
sentences.
  Step-1:
    The program rewrites each sentence in the following way.
      1.1 The program scans the sentence and finds a word
           or a word sequence which matches a term in the database.
      1.2 The program replaces the word or word sequence with a
           string such as the following.
          A010B00C0D0
          Here, A/B/C/D stand for Groups A, B, C, and D.
          The three digits after "A" correspond to the three
           sub-groups of group "A".
            The digits after "B", "C", and "D" have the same meaning.
          "010" after "A" means that the term exists in
           sub-group A.2.
          If the term exists in more than one sub-group, all
           corresponding bits are turned on.
      1.3 If more than one word or word sequence matches
           terms in the database, all such words or word sequences
           are replaced...
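
   Step-1 can be sketched in Python as follows.  This is a minimal
illustration, not the disclosed implementation: the sample database, the
function names encode and rewrite, the greedy longest-match scan, and
the choice to leave unmatched words unchanged are all assumptions, since
the source text is truncated at this point.

```python
# Hypothetical sample database: group letter -> ordered list of sub-groups.
# Sub-group counts (3, 2, 1, 1) follow the example string "A010B00C0D0".
DATABASE = {
    "A": [{"printer", "hard disk"}, {"editor", "operating system"}, {"user"}],
    "B": [{"large", "small"}, {"fast"}],
    "C": [{"quickly", "fast"}],
    "D": [{"the", "is"}],
}

# Longest term, in words, so the scan knows how far ahead to look.
MAX_TERM_WORDS = max(
    len(term.split())
    for subgroups in DATABASE.values()
    for terms in subgroups
    for term in terms
)


def encode(term):
    """Build the group/bit string for a term, e.g. 'A010B00C0D0'."""
    parts = []
    for group, subgroups in DATABASE.items():
        # One digit per sub-group: 1 if the term is listed there, else 0.
        bits = "".join("1" if term in terms else "0" for terms in subgroups)
        parts.append(group + bits)
    return "".join(parts)


def rewrite(sentence):
    """Replace every matching word or word sequence with its bit string.

    Assumption: scan left to right and prefer the longest match;
    words with no match in the database are kept as they are.
    """
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            code = encode(" ".join(words[i:i + n]))
            if "1" in code:          # at least one sub-group matched
                out.append(code)
                i += n
                break
        else:
            out.append(words[i])     # no term starts at this word
            i += 1
    return " ".join(out)


print(encode("editor"))
print(rewrite("the fast editor"))
```

   With this sample data, a term listed only in sub-group A.2 yields the
string A010B00C0D0 from the text, and a term listed in both B.2 and C.1
has both corresponding bits turned on.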