Browse Prior Art Database

A PROCESS FOR EFFICIENT STRING MATCHING

IP.com Disclosure Number: IPCOM000007962D
Original Publication Date: 1996-Nov-01
Included in the Prior Art Database: 2002-May-08
Document File: 3 page(s) / 147K

Publishing Venue

Motorola

Related People

James H. Tolar: AUTHOR

Abstract

Applications in Computer Aided Design ofien require the ability to efficiently sort and search col- lections of character strings. Examples include pla minimization, simulation results analysis, testability tools, and various fault-grading applications. Sorting and searching arbitrarily long character strings is a computationally complex process that generally involves multiple passes over the data-set and thus imposes a heavy burden on the individual CAD appli- cation. This paper presents a single-pass process for identifying matching strings in a data-set consisting ofarbitrarily long character strings.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 48% of the total text.

Page 1 of 3

Technical Developments

A PROCESS FOR EFFICIENT STRING MATCHING

by James H. Tolar

1 .O OVERVIEW

  Applications in Computer Aided Design ofien require the ability to efficiently sort and search col- lections of character strings. Examples include pla minimization, simulation results analysis, testability tools, and various fault-grading applications. Sorting and searching arbitrarily long character strings is a computationally complex process that generally involves multiple passes over the data-set and thus imposes a heavy burden on the individual CAD appli- cation. This paper presents a single-pass process for identifying matching strings in a data-set consisting ofarbitrarily long character strings.

  The next section briefly examines current solu- tions to the problem. The following section presents the proposed solution. The final section summarizes the paper.

tion to matching full strings would be decidedly difficult.

  Another popular approach involves using one of many well-known techniques to first sort the data- set, then use an efficient search algorithm to quickly find any matching strings. This technique does not reduce the excessive run-time, as the order of the algorithm is still N2, and the addition of the extra step adds even more complexity to the implementing program. This will make extension of the program a difftcult matter.


3.0 PROPOSED SOLUTION

  Examples of prior solutions to the problem include exhaustive enumeration and various sort- first, then search techniques. The exhaustive enu- meration approach is much like one might use in a purely manual process. Using this technique, you compare each string to every other, in order, exam- ining each character in each string to determine whether or not a match has occurred. This process ultimately determines the matching strings, but it suffers from excessive run-time. This algorithm is of order Nz, which means that the run-time grows in proportion to the square of the size of the data- set. In addition, exhaustive enumeration is not par- ticularly well-suited to extension. That is, changing the process to select matching sub-strings in addi-

  The challenge is to identify all matching strings in a randomly-ordered set of strings of characters of arbitrary lengths. In the following paragraphs; the set of strings is called the data-set, the number of different characters allowed in a string is known as the order of the set, the maximum length of string is known as the depth.

  Imagine a maze consisting of a series of rooms connected to each other by doors. Each door is either an entrance or an exit, but not both. In addition, each room may have only one entrance, but many exits. There is only one room that acts as entrance to the maze.

  Each exit horn each room is labelled with a member of the character set in the string data-set and each room has one exit for each type of charac- ter. For a maze representing strings of binary digits, each room would have 2 exits, one labeled "0...