Text Retrieval Computers
Original Publication Date: 1979-Mar-01
Included in the Prior Art Database: 2005-Nov-10
Software Patent Institute
Lee A. Hollaar: AUTHOR
University of Illinois at Urbana-Champaign
THIS DOCUMENT IS AN APPROXIMATE REPRESENTATION OF THE ORIGINAL.
This record contains textual material that is copyright © 1979 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Contact the IEEE Computer Society http://www.computer.org/ (714-821-8380) for copies of the complete work that was the source of this textual material and for all use beyond that as a record from the SPI Database.
Text Retrieval Computers
Lee A. Hollaar
University of Illinois at Urbana-Champaign
(Image Omitted: The hardware required for efficient text retrieval differs from that required for retrieval of formatted data. Here is an examination of such hardware, particularly term comparators.)
Much of the preliminary work in scientific and other research involves locating previous works related to the desired goal or topic. This literature searching is aided by manually prepared index materials: keyword-in-context indexes; collections of abstracts of papers on a given subject, such as Computing Reviews; bibliographies; and survey or tutorial articles containing lists of references, such as this paper. Often these initial search sources lead to additional possible search locations, with the process continuing until all the necessary information has been located or (as is often the case) the documents of interest prove to be unavailable.
Unfortunately, even the best-prepared manual indexing systems do not fully solve the text retrieval problem. First, if the indexing is extensive, a great deal of time can be spent just searching the index material for possible citations. Extensive indexing can also require a great deal of space; much of the space of a law library is devoted to volumes such as treatises and corpora, used primarily for finding pertinent case law or statutes (or for eliminating the need to find them by providing an appropriate abstract). Second, time is required to prepare the index material manually, making it difficult to locate the most current, and hence potentially the most valuable, material. Even if these problems could be solved, searching can be based only on criteria determined by the indexers, which may not include the new concept of interest.
In the past, development of large-scale computerized text retrieval systems was inhibited by two factors: the cost of storing the tremendous amount of data needed for a complete system (about 30 billion characters to store most case law in a legal retrieval system), and the cost of entering the data into storage. Advances in memory technology, especially in the area of high-density moving-arm disk systems, have minimized the first difficulty. Ten years ago, the storage of 30 billion characters would have required about 4000 2311-type disk drives, at 7.25 million characters per drive, a number prohibitive in terms of space and cabling alone, if not in cost. Currently, commercially available drives can store 300 million characters on a sin...