
Automatic Information Retrieval

IP.com Disclosure Number: IPCOM000131457D
Original Publication Date: 1980-Sep-01
Included in the Prior Art Database: 2005-Nov-11
Document File: 16 page(s) / 55K

Publishing Venue

Software Patent Institute

Related People

Gerard Salton: AUTHOR

Abstract

Advances such as specialized parallel hardware and new algorithms for text searching will improve the effectiveness of information retrieval systems.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 7% of the total text.


THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.

This record contains textual material that is copyright © 1980 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Contact the IEEE Computer Society http://www.computer.org/ (714-821-8380) for copies of the complete work that was the source of this textual material and for all use beyond that as a record from the SPI Database.

Automatic Information Retrieval

Gerard Salton

Cornell University


Information files of all kinds are now in common use -- personnel records, parts inventories, customer account information, business correspondence, document holdings in libraries, patient records in hospitals, and so on. Information retrieval systems are designed to help analyze and describe the items stored in a file, to organize them and search among them, and finally to retrieve them in response to a user's query.

Designing and using a retrieval system involves four major activities: information analysis, information organization and search, query formulation, and information retrieval and dissemination (references 1-3).

Information analysis.

This task, also known as indexing, consists of the assignment of one or more identifiers, or index terms, to each information item. These terms are designed to identify and represent the stored item. Two types of identifiers are now in widespread use. The first consists of objective attributes -- in a personnel file, for example, a person's name, age, and job classification could be used to identify a personnel record; similarly, in a library file a book's author, publisher, and date of publication could be used to identify the record of a book. In addition to using the values of certain objective attributes for information identification, it is also possible to utilize subjective attributes, or content terms, to describe each stored item. Thus, a person whose record is included in a personnel file could be characterized as intelligent, hard-working, and compassionate; a library item could be identified similarly by using words or phrases describing the content of the document.
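The distinction between objective attributes and content terms can be made concrete with a small sketch. It assumes a hypothetical record layout (the `Record` class, the `match` function, and the sample personnel data are illustrative inventions, not taken from the article): each item carries a dictionary of objective attribute values plus a set of assigned content terms, and a simple request must match both.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A stored item described by two kinds of identifiers (hypothetical layout)."""
    key: str
    # Objective attributes: factual values such as name, age, job classification.
    attributes: dict[str, object] = field(default_factory=dict)
    # Subjective attributes (content terms) assigned by an indexer.
    content_terms: set[str] = field(default_factory=set)


def match(record: Record, attrs: dict[str, object], terms: set[str]) -> bool:
    """True if every requested attribute value and every requested content term is present."""
    attrs_ok = all(record.attributes.get(k) == v for k, v in attrs.items())
    return attrs_ok and terms <= record.content_terms


# Illustrative personnel file: objective attributes plus subjective descriptors.
personnel = [
    Record("p1", {"name": "Smith", "age": 41, "job": "engineer"},
           {"intelligent", "hard-working"}),
    Record("p2", {"name": "Jones", "age": 29, "job": "engineer"},
           {"compassionate"}),
]

hits = [r.key for r in personnel if match(r, {"job": "engineer"}, {"hard-working"})]
print(hits)  # ['p1']
```

Both records share the objective attribute `job == "engineer"`; only the content term separates them, which is the extra discrimination subjective identifiers are meant to provide.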

In existing retrieval environments, the indexing task is carried out manually by subject experts trained to assign the relevant identifiers to the information items.

However, some experimental indexing systems perform automatic content analysis to select appropriate identifiers. These systems may be used when the stored information items consist of natural language texts -- for example, books in a library collection or medical summaries. In these cases, an explicit indexing operation may be avoided entirely by assuming that the individual words in the natural language text represent the information content.
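Treating the words of a text as its identifiers can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any system the article describes: the stop list `STOP_WORDS` is a hypothetical choice, the inverted file (term to document identifiers) is one common way of organizing such terms, and the conjunctive (AND) search is likewise an assumption made for the example.

```python
import re
from collections import defaultdict

# Hypothetical stop list: common function words assumed to carry little content.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def index_terms(text: str) -> set[str]:
    """Treat each distinct non-stop word of the text as an index term."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def build_inverted_file(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each index term to the identifiers of the documents that contain it."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            inverted[term].add(doc_id)
    return dict(inverted)


def search(inverted: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every index term of the query (Boolean AND)."""
    terms = index_terms(query)
    if not terms:
        return set()
    postings = [inverted.get(t, set()) for t in terms]
    return set.intersection(*postings)


# Illustrative texts; no manual indexing step is performed.
docs = {
    "d1": "Chest pain and shortness of breath in an elderly patient.",
    "d2": "Patient records in hospitals are stored on computers.",
    "d3": "Parallel hardware for text searching.",
}
inv = build_inverted_file(docs)
print(sorted(search(inv, "patient records")))  # ['d2']
```

The word "patient" occurs in both d1 and d2, but only d2 also contains "records", so the conjunctive query retrieves d2 alone. Real systems of the period typically refined this with word-stem reduction and term weighting, which the sketch omits.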

Information organization and search.

Once the items have been indexed, the information file in which t...