
Freeword Text Searching method for the word has non-alphanumeric characters

IP.com Disclosure Number: IPCOM000015981D
Original Publication Date: 2002-Jul-01
Included in the Prior Art Database: 2003-Jun-21
Document File: 3 page(s) / 63K

Publishing Venue

IBM

Abstract

Disclosed is a method for improving both the performance and the precision of string searches. To improve search performance, the method treats a word that contains non-alphanumeric special characters as a single word, as if it contained no special characters at all. Also disclosed is a method for saving and storing user-specific information about which special characters are to be treated as alphabetic characters. To store this information, the disclosure provides an application programming interface (API) through which a user can specify, in advance, the non-alphanumeric special characters of their choice as a parameter.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 39% of the total text.


Assumptions

First, for the purposes of this disclosure, a non-alphanumeric special character is a character that is neither a letter nor a digit and that can serve as a mark or symbol, such as '@', '%', or '$'.
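This definition can be made concrete with a small classifier. The following Python sketch is only an illustration: the function name `is_special_char` is hypothetical, and it assumes that "mark or symbol" characters can be approximated by the Unicode punctuation (P*) and symbol (S*) general categories, which the disclosure itself does not state.

```python
import unicodedata


def is_special_char(ch: str) -> bool:
    """Return True if ch is a non-alphanumeric mark or symbol character.

    Assumption (not from the disclosure): "mark or symbol" characters are
    approximated by the Unicode punctuation (P*) and symbol (S*) general
    categories, so letters, digits, and whitespace all return False.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return unicodedata.category(ch)[0] in ("P", "S")


for ch in ["@", "%", "$", "a", "7", " "]:
    print(repr(ch), is_special_char(ch))
```

Under this assumption, '@', '%', and '$' are classified as special characters, while 'a', '7', and the space are not.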

This disclosure can be implemented as part of a computer program's logic. It is useful for free-word or full-text search programs, which are generally called search engines.

In particular, this disclosure concerns free-word and full-text search engines that use index files for searching.

Objectives

The purpose of this disclosure has two aspects: one concerns search performance, and the other concerns the search results.

This disclosure aims to:

- Improve search performance.
- Reduce irrelevant search results.

To improve search performance during the search engine's indexing process, this disclosure treats a word that contains non-alphanumeric special characters as a single word, just like a word that contains only alphanumeric characters.

To eliminate irrelevant search results, this disclosure prevents the string tokenizing process from splitting such a word into several tokens.
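A tokenizer that keeps a word containing selected special characters intact can be sketched in Python as follows. The function `tokenize` and its `word_chars` parameter are hypothetical; the disclosure does not describe its tokenizer internals. The sketch assumes that letters, digits, and any explicitly listed special characters form words, and that every other character acts as a delimiter.

```python
def tokenize(text: str, word_chars: frozenset[str] = frozenset()) -> list[str]:
    """Split text into tokens.

    Letters and digits always belong to a word. Special characters listed in
    word_chars (a hypothetical user setting) stay inside the word instead of
    acting as delimiters; any other character ends the current token.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalnum() or ch in word_chars:
            current.append(ch)
        elif current:
            # A delimiter closes the token built so far.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


text = "Mail user@example.com about C++"
print(tokenize(text))
# ['Mail', 'user', 'example', 'com', 'about', 'C']
print(tokenize(text, frozenset("@.+")))
# ['Mail', 'user@example.com', 'about', 'C++']
```

Without the extra word characters, the address and "C++" are broken into fragments such as "user" and "C", which is exactly the splitting the disclosure seeks to avoid.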

Furthermore, to achieve both purposes, this disclosure lets users specify in advance whichever special characters they want. To that end, it provides an application programming interface (API).

Because this interface provides a method for changing character attributes, a search engine can change the attribute of each specified special character from non-alphanumeric to alphanumeric, so that the character is indexed as part of a word rather than as a delimiter.
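The interface and the attribute change described above might be sketched in Python as a small attribute table that can be saved and reloaded. Every name here (`CharAttr`, `CharAttributeTable`, `set_word_characters`) and the JSON storage format are hypothetical illustrations; the disclosure does not specify the API's names, parameters, or storage format.

```python
import json
import tempfile
from enum import Enum
from pathlib import Path


class CharAttr(Enum):
    DELIMITER = "delimiter"  # splits tokens (conventional default for special characters)
    WORD = "word"            # treated like a letter or digit


class CharAttributeTable:
    """Hypothetical API for user-configurable special-character attributes."""

    def __init__(self) -> None:
        self._overrides: dict[str, CharAttr] = {}

    def attribute(self, ch: str) -> CharAttr:
        # Letters and digits are always word characters.
        if ch.isalnum():
            return CharAttr.WORD
        return self._overrides.get(ch, CharAttr.DELIMITER)

    def set_word_characters(self, chars: str) -> None:
        """Change each given special character's attribute to WORD."""
        for ch in chars:
            if ch.isalnum() or ch.isspace():
                raise ValueError(f"{ch!r} is not a non-alphanumeric special character")
            self._overrides[ch] = CharAttr.WORD

    def save(self, path: Path) -> None:
        # Persist only the user's overrides (hypothetical JSON format).
        data = {ch: attr.value for ch, attr in self._overrides.items()}
        path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "CharAttributeTable":
        table = cls()
        data = json.loads(path.read_text(encoding="utf-8"))
        table._overrides = {ch: CharAttr(value) for ch, value in data.items()}
        return table


table = CharAttributeTable()
table.set_word_characters("@+")
with tempfile.TemporaryDirectory() as tmp:
    settings = Path(tmp) / "word_chars.json"
    table.save(settings)
    restored = CharAttributeTable.load(settings)
print(restored.attribute("@").value, restored.attribute("%").value)
# word delimiter
```

Storing only the overrides keeps the saved settings small: any special character the user has not listed keeps its conventional delimiter role.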

As a result, this disclosure improves search performance and also enables each user to define their own indexing rules independently.

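In conventional full-text search technology, the search engine treats a special character as a delimiter during indexing. However, splitting words at special characters in this way tends to return irrelevant search results. For example, if '+' is a delimiter, the word "C++" is indexed simply as "C", so a search for "C++" also matches every document that merely contains "C" as a word.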
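The difference between delimiter-based indexing and keeping special characters inside words can be sketched with a toy in-memory inverted index. This is an illustration under simplifying assumptions (lowercased ASCII text, regular-expression tokenizing, and AND semantics for multi-token queries); the names `build_index` and `search` are hypothetical and do not come from the disclosure.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str], pattern: str) -> dict[str, set[int]]:
    """Map each token matched by pattern to the IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(pattern, text.lower()):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, pattern: str) -> set[int]:
    """Return documents containing every query token (AND semantics)."""
    tokens = re.findall(pattern, query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    1: "Tutorial on C++ templates",
    2: "Grade C in the C exam",
}
DELIMITED = r"[a-z0-9]+"   # conventional: every special character is a delimiter
WORD_PLUS = r"[a-z0-9+]+"  # '+' configured as a word character

conventional = build_index(docs, DELIMITED)
print(sorted(search(conventional, "C++", DELIMITED)))  # [1, 2]
kept = build_index(docs, WORD_PLUS)
print(sorted(search(kept, "C++", WORD_PLUS)))  # [1]
```

With '+' treated as a delimiter, document 2 is returned only because it contains the letter "C"; keeping '+' inside the word removes that irrelevant hit.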

Therefore, this disclosure aims to eliminate such irrelevant search results...