Information Retrieval

IP.com Disclosure Number: IPCOM000090705D
Original Publication Date: 1969-Jun-01
Included in the Prior Art Database: 2005-Mar-05
Document File: 2 page(s) / 13K

Publishing Venue

IBM

Related People

Perriens, MP: AUTHOR [+2]

Abstract

Information retrieval involves the searching of large data files in an effort to locate pertinent documents. In this method of information retrieval, documents are stored in a data file in text form and generally comprise abstracts of other longer documents.

This is the abbreviated version, containing approximately 53% of the total text.

All abstracts or documents within the data base are subjected to a semantic analysis. From the totality of words within the data base, root words, or stems, are determined that are deemed representative of particular informative words found within the data base. For example, the root word "autom" can be derived from the words automatic, automation, automaton, automobile, and others. The root word is thus a tag or key for identifying certain words contained within a document.
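The tagging idea can be illustrated with a minimal sketch. The text does not say how a word is matched to its root word, so the sketch assumes a word is tagged by the longest known stem it begins with. The stem list and the name `root_of` are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence

# Hypothetical stem list. In the method described, the stems come from
# analysing every word in the data base; here they are simply given.
STEMS = ["autom", "retriev", "inform"]

def root_of(word: str, stems: Sequence[str] = STEMS) -> Optional[str]:
    """Return the longest stem that the word starts with, or None.

    Prefix matching is an assumption of this sketch, not a rule
    stated in the disclosure.
    """
    word = word.lower()
    matches = [stem for stem in stems if word.startswith(stem)]
    return max(matches, key=len) if matches else None

words = ["automatic", "automation", "automaton", "automobile", "document"]
tags = {word: root_of(word) for word in words}
```

Under these assumptions, the four words from the example above all receive the tag "autom", while "document" matches no stem in the list and is left untagged.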

Upon completion of the semantic analysis, all the root words within the data base have been determined. Each document or abstract in the data base is then given a number, and each document is scanned to determine which of the root words it contains. A concordance is then generated that lists, for each root word, all documents containing that root word.
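The numbering and scanning steps can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed program: documents are numbered from 1 in the order given, words are split on non-letters, and a word is taken to contain a root word when it starts with it. The name `build_concordance` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_concordance(documents, stems):
    """Map each root word to the sorted document numbers that contain it."""
    concordance = defaultdict(set)
    # Assumption: documents are numbered 1, 2, 3, ... in input order.
    for number, text in enumerate(documents, start=1):
        for word in re.findall(r"[a-z]+", text.lower()):
            for stem in stems:
                # Assumption: a word carries a root word if it starts with it.
                if word.startswith(stem):
                    concordance[stem].add(number)
    return {stem: sorted(numbers) for stem, numbers in concordance.items()}

# Hypothetical sample data for illustration only.
docs = [
    "Automatic retrieval of documents",
    "Automation in the automobile industry",
    "Storage of text abstracts",
]
concordance = build_concordance(docs, ["autom", "retriev", "stor"])
```

Storing document numbers as a set while scanning means a document is listed once per root word, even when several of its words share that root.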

Based upon the concordance, a probability is determined for each root word by dividing the number of documents containing that root word by the total number of documents in the data base. This is known as the root word probability.
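As a small worked example of this ratio, the sketch below computes the root word probability from a concordance. The function name `root_word_probabilities` and the sample concordance are hypothetical; the data base is assumed to hold four documents.

```python
def root_word_probabilities(concordance, total_documents):
    """Return, for each root word, documents containing it / total documents."""
    if total_documents <= 0:
        raise ValueError("the data base must contain at least one document")
    return {
        stem: len(doc_numbers) / total_documents
        for stem, doc_numbers in concordance.items()
    }

# Hypothetical concordance over a data base of four numbered documents.
sample_concordance = {"autom": [1, 2, 4], "retriev": [2], "stor": [3, 4]}
probabilities = root_word_probabilities(sample_concordance, total_documents=4)
```

With these figures, "autom" appears in three of four documents and so has a root word probability of 0.75, "stor" has 0.5, and "retriev" has 0.25.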

In order to use the information retrieval system, an abstract describing the subject matter sought in other documents is prepared. This abstract is written in text form.

When the input abstract, also known as a query, is entered in text form, the program analyzes the query to determine whi...