THE EFFECTIVENESS OF THE THESAURUS METHOD IN AUTOMATIC INFORMATION RETRIEVAL
Original Publication Date: 1975-Nov-30
Included in the Prior Art Database: 2007-Mar-29
Software Patent Institute
Yu, C.T.: AUTHOR [+3]
C.T. Yu† and G. Salton*
† Department of Computing Science, University of Alberta, Edmonton, Alberta.
* Department of Computer Science, Cornell University, Ithaca, NY 14853. This study was supported in part by the Canadian Research Council
and in part by the National Science Foundation under grant GJ 43505.
Department of Computer Science Cornell University
Ithaca, New York
The Effectiveness of the Thesaurus Method in Automatic Information Retrieval
C.T. Yu† and G. Salton*
Term grouping and thesaurus methods have frequently been incorporated
collection with uneven frequency distributions; that is, in certain documents their occurrence frequencies are much larger than would
be expected from a random assignment of terms to documents;
nonspecialty words, on the other hand, exhibit random occurrence
patterns in the documents of a collection.
b) The most effective content identifiers exhibit little redundancy
with other terms also used for content identification; in particular, terms with high document frequency,
those assigned to a large proportion of the documents of a collection,
tend to be indiscriminate in their retrieval capability and lead to losses in retrieval precision.
c) Effective content identifiers are expected to break up large clusters of documents that are not otherwise distinguishable for retrieval purposes; that is, they should reduce the existing uncertainty for the given document set. Thus, terms that occur with excessively
low document frequency in the documents of a collection are not optimal and lead to unacceptable losses in recall.
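The document-frequency argument in items b) and c) can be illustrated with a minimal Python sketch. The function names, the toy collection, and the cutoffs `low` and `high` are all assumptions made for illustration; the text gives no numeric thresholds.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # a document counts at most once per term
    return df


def classify_terms(docs, low=0.25, high=0.7):
    """Label each term by the share of documents in which it occurs.

    `low` and `high` are illustrative cutoffs, not values from the text.
    """
    n = len(docs)
    labels = {}
    for term, count in document_frequencies(docs).items():
        share = count / n
        if share > high:
            labels[term] = "high"    # broad term: expected to cost precision
        elif share < low:
            labels[term] = "low"     # rare term: expected to cost recall
        else:
            labels[term] = "medium"
    return labels


# Hypothetical five-document collection, one list of index terms per document.
docs = [
    ["retrieval", "system", "thesaurus"],
    ["retrieval", "system", "index"],
    ["retrieval", "query", "index"],
    ["retrieval", "system", "query"],
    ["retrieval", "index", "synonym"],
]

labels = classify_terms(docs)
for term in sorted(labels):
    print(term, labels[term])
```

In this toy collection "retrieval" occurs in every document and is labelled high frequency, while "thesaurus" and "synonym" each occur once and are labelled low frequency; the remaining terms fall in the middle range.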
* The effectiveness of a retrieval system is often evaluated by two
into automatic content analysis programs as devices for the recognition of
synonymous expressions and of linguistic entities that may be semantically
similar but syntactically distinct. While it has frequently been asserted
that the recognition of synonyms is essential in language analysis, actual
proofs of the usefulness of a thesaurus in automatic information retrieval
are difficult to generate.
In the present study, formal proofs are given of the effectiveness
under well-defined conditions of the thesaurus method in information retrieval.
It is shown, in particular, that when certain semantically related terms are added to the information queries originally submitted by the user population, a superior retrieval system is obtained in the sense that for every level of the recall the retrieval precision is at least as good for the altered queries as for the original ones.
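The comparison described above, precision at each recall level for original versus thesaurus-expanded queries, can be sketched in Python as follows. This is a toy illustration only, not the formal argument of the study: the ranking by term overlap, the function names, the thesaurus class, and the four-document collection are all assumptions introduced here.

```python
def expand_query(query, thesaurus):
    """Add to the query every term listed as related to one of its terms."""
    expanded = set(query)
    for term in query:
        expanded.update(thesaurus.get(term, ()))
    return expanded


def rank(docs, query):
    """Order document ids by the number of query terms they contain.

    Term overlap is an assumed matching function; ties break on document id.
    """
    q = set(query)
    scores = {doc_id: len(q & terms) for doc_id, terms in docs.items()}
    return sorted(docs, key=lambda doc_id: (-scores[doc_id], doc_id))


def precision_at_recall(ranking, relevant):
    """Map each recall level reached in the ranking to the precision there."""
    points = {}
    hits = 0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points[hits / len(relevant)] = hits / position
    return points


# Hypothetical collection: d1 and d3 are the relevant documents.
docs = {
    "d1": {"thesaurus", "retrieval"},
    "d2": {"hardware", "retrieval"},
    "d3": {"synonym", "dictionary", "retrieval"},
    "d4": {"hardware", "compiler"},
}
relevant = {"d1", "d3"}
thesaurus = {"thesaurus": {"synonym", "dictionary"}}  # assumed term class
query = {"thesaurus", "retrieval"}

original = precision_at_recall(rank(docs, query), relevant)
expanded = precision_at_recall(
    rank(docs, expand_query(query, thesaurus)), relevant
)

for level in sorted(original):
    print(level, original[level], expanded[level])
```

With this data the expanded query moves the relevant document d3 ahead of the nonrelevant d2, so precision at full recall rises from 2/3 to 1 while precision at recall 0.5 is unchanged. A single hand-built example of this kind shows only what the comparison measures; it does not establish the conditions under which the improvement holds in general.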
A good deal is known about the representation of document content and the
complementary measures known as precision and