Browse Prior Art Database

Method of Avoiding Multiple Hits in Pattern Matching Search

IP.com Disclosure Number: IPCOM000121135D
Original Publication Date: 1991-Jul-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 3 page(s) / 122K

Publishing Venue

IBM

Related People

Maras, S: AUTHOR

Abstract

An approach is disclosed that enables a source string to be matched against a list of pattern strings, the pattern strings being processed in order from the least to the most general.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

Method of Avoiding Multiple Hits in Pattern Matching Search

      An approach is disclosed that enables a source string to
be matched against a list of pattern strings, the pattern strings
being processed in order from the least to the most general.

      The method described here assumes that user-defined wildcard
characters are allowed. The following wildcard characters are used in
the example:
      '*'  represents none, one or n characters
      '?'  represents any single character

      Let us assume that the source string S should be matched
against the list of pattern strings contained in the table below.
Each of the pattern strings is related to an action that is to be
performed on the source string if a match is found.

      The following example shows the sample source string and the
table containing pattern strings and corresponding actions.
      S   Table
      =============================
      IEE123R    '*'         Action 1
                 '*R'        Action 2
                 'I*R'       Action 3
                 'IE*'       Action 4
                 'IEE???*'   Action 5
                 'IEE???R'   Action 6
                 'IEE*'      Action 7

      The problem is how to implement a wildcard search so that the
least general wildcard pattern is always selected first when the
source string matches more than one pattern string.

      Since a wildcard search can only be performed sequentially, the
first entry in the table is checked first. Since it matches the
source string, the search will not continue.  However, the item
'IEE???R' is the least general value and this is the one that should
be chosen.  The problem is how to implement the search function so
that the least general item is chosen.

      Ordinary sorting in ascending or descending order will not
solve the problem, especially if user-defined wildcard characters are
allowed. The solution is a sort that understands the wildcards.

      If assumed that the entries containing wildcards will be sorted
in ascending order, then the following transformations should be
performed before the sort is performed:
1. Translate wildcard characters corresponding to '*' and '?' to some
other characters that have higher EBCDIC/ASCII code than any
character that is going to be present in the pattern string.  The
character representing the '?' should have lower EBCDIC/ASCII code
than the character representing '*'. For example, characters 'FB'x
and 'FA'x can be used, respectively. This would cause entries
containing wildcard characters to be sorted toward the bottom of the
table. If pattern A = 'a??*   ' and pattern B = 'a*     ', then
pattern B will be sorted before pattern A, if '...