Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Rapidly Searching for Character String Matches Using Hash Coding

IP.com Disclosure Number: IPCOM000079277D
Original Publication Date: 1973-Jun-01
Included in the Prior Art Database: 2005-Feb-26
Document File: 1 page(s) / 12K

Publishing Venue

IBM

Related People

Davison, GA: AUTHOR

Abstract

A technique is provided for efficiently searching a large amount of data and locating any occurrences of character strings belonging to a set of sought-after character strings, when the set of sought-after strings is large.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 56% of the total text.

Page 1 of 1

Rapidly Searching for Character String Matches Using Hash Coding

A technique is provided for efficiently searching a large amount of data and locating any occurrences of character strings belonging to a set of sought-after character strings, when the set of sought-after strings is large.

If the data to be searched can be broken into candidate strings, then the set of sought-after strings can be placed in a hash table, and a hash code search of the table can be made for each of the candidate strings. Any candidate string matching an entry in the table would represent a "find", while candidate strings not matching would be rejects. The present technique provides a manner of breaking up the data into candidate strings, where there is no knowledge of the nature of the character strings to be matched.

The solution is to treat all of the character strings in the sought-after set as if they are all of a single length (3 characters long, for example) even though they are of different length. The first three characters of each string are used to represent the string. With this method, the sought-after strings are still entered in the hash table in their entirety as before (not just the first three), but only the first three characters are allowed to influence the hashing location at which the string is entered in the table.

If one and two character strings are present in the set of sought-after strings, they can be treated the same as the longer strings, except that only...