Datamodel for fast access of context of textual data

IP.com Disclosure Number: IPCOM000015323D
Original Publication Date: 2001-Nov-10
Included in the Prior Art Database: 2003-Jun-20

Publishing Venue

IBM

Abstract

Datamodel for fast access of context of textual data


Disclosed is a data model that allows fast access to contextual information about textual data. One application of this data model is search. Searching textual collections of documents is widespread, yet many searches return unsatisfactory results. This is partly because queries are short and therefore provide too little context. It is widely accepted that the relevance feedback method improves results. Briefly, this method examines the first n result documents of the query (where n is system-specified), picks appropriate words from these documents, and adds them to the original query. The expanded query is then submitted against the corpus again, producing a new result list. There are many variations of this algorithm in the art.

The difficult question is which words to select from the original result list to add to the query. Too many words lead to long queries that take longer to execute; too many words can also introduce considerable noise into the results. Furthermore, different applications will plausibly have different requirements for which additional words to select. It would therefore be prudent to annotate (a subset of) the words in the collection with sufficient metadata, so that the words used to expand a query can be selected quickly based on the metadata values. Clearly, the metadata would be precomputed.
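As a minimal sketch of the relevance feedback loop just described, the following Python code retrieves the top-n documents for a query, selects frequently co-occurring words from them, and re-runs the expanded query. The toy corpus, the term-count scorer, and the frequency-based word-selection heuristic are all illustrative assumptions, not the selection strategy the disclosure itself proposes.

```python
from collections import Counter

def search(corpus, query_terms):
    """Rank documents by total query-term occurrences (toy scorer)."""
    scored = []
    for doc_id, text in corpus.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

def expand_query(corpus, query_terms, n=3, k=2):
    """Add the k most frequent new words from the top-n results to the query."""
    top_docs = search(corpus, query_terms)[:n]
    counts = Counter()
    for doc_id in top_docs:
        for word in corpus[doc_id].lower().split():
            if word not in query_terms:
                counts[word] += 1
    return list(query_terms) + [w for w, _ in counts.most_common(k)]

corpus = {
    "d1": "database index search query relevance",
    "d2": "relevance feedback expands the search query",
    "d3": "gardening tips for spring",
}
original = ["search", "query"]
expanded = expand_query(corpus, original)
print(expanded)                   # original terms plus co-occurring words
print(search(corpus, expanded))   # new result list for the expanded query
```

In a real system the word-selection step would consult the precomputed per-word metadata proposed below, rather than recounting term frequencies at query time.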

We propose the following schema. Each document should be tokenized into words, and each word should be stemmed, using...
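The abbreviated text cuts off before the schema details. As a rough sketch of the preprocessing step it begins to describe, the following Python code tokenizes each document, stems each token, and precomputes per-stem metadata. The naive suffix-stripping stemmer is a stand-in for whatever stemmer the full disclosure specifies, and the choice of (document id, position) as the stored metadata fields is an assumption; the abbreviated text does not say which fields the schema uses.

```python
import re
from collections import defaultdict

def stem(word):
    """Naive suffix stripping; a real system would use e.g. a Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(corpus):
    """Precompute per-stem metadata: the documents and positions it occurs at."""
    index = defaultdict(list)
    for doc_id, text in corpus.items():
        for pos, token in enumerate(re.findall(r"[a-z]+", text.lower())):
            index[stem(token)].append((doc_id, pos))
    return index

index = build_index({"d1": "searching indexed documents", "d2": "the search index"})
print(index["search"])   # 'searching' and 'search' fall under one stem
print(index["index"])    # 'indexed' and 'index' stem together
```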