Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Fast Matching Algorithm for Static Groups of Ascii Strings

IP.com Disclosure Number: IPCOM000031130D
Original Publication Date: 2004-Sep-13
Included in the Prior Art Database: 2004-Sep-13
Document File: 3 page(s) / 80K

Publishing Venue

IBM

Abstract

Disclosed is an algorithm designed to increase the performance of matching an ASCII string to a set of statically defined ASCII strings. More specifically, this system was designed to match an unknown set of bytes of characters to a known set of HTTP headers. This system is designed for the unknown string to possibly not match any of the known set of headers. It is also designed to avoid cache miss problems of other tree based matching and search structures.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 55% of the total text.

Page 1 of 3

Fast Matching Algorithm for Static Groups of Ascii Strings

There is mostly a known set of supported http headers in the world and being able to quickly map headers sent by a client is better performing than to consistently create and release new Strings. Previous attempts at algorithms to match a set of known strings was too expensive when the strings were static and the checks must be case insensitive:

For hash table like lookups, computing the hash codes and doing an equal comparison was too expensive in the amount of CPU cycles the search took. This is without even doing the necessary case insensitive lookups.

For tree based lookups, the amount of nodes in the tree when there is 75-100 Strings grows to a point where the performance is severely hurt to 2nd and 3rd level cache misses because the data being accessed in not contiguous.

This disclosure dicscusses a new data structure/algorithm for matching Strings (in String, byte[], or char[] format) to a static set of Strings to get an object assigned to that String. This algorithm makes assumptions that the number of lookups can be significantly reduced by looking at first character, length of String, and last character first. Following that, the algorithm will go backwards through the array provides for the least amount of conflicts.

Overall data structure:

a

a

26 nodes

 Strings configured: { byte[][], char[][] } Corresponding values:

{ Object [] }

z

z

Flow of the macthing algorithm:

1

[This page contains 1 picture or other non-text object]

Pa...