
THE EFFECTIVENESS OF THE THESAURUS METHOD IN AUTOMATIC INFORMATION RETRIEVAL

IP.com Disclosure Number: IPCOM000148227D
Original Publication Date: 1975-Nov-30
Included in the Prior Art Database: 2007-Mar-29

Publishing Venue

Software Patent Institute

Related People

Yu, C.T.: AUTHOR [+3]

Abstract

THE EFFECTIVENESS OF THE THESAURUS METHOD IN AUTOMATIC INFORMATION RETRIEVAL

This is the abbreviated version, containing approximately 28% of the total text.


THE EFFECTIVENESS OF THE THESAURUS METHOD IN AUTOMATIC INFORMATION RETRIEVAL

C.T. Yu† and G. Salton*

November 1975

† Department of Computing Science, University of Alberta, Edmonton, Alberta.

* Department of Computer Science, Cornell University, Ithaca, NY 14853. This study was supported in part by the Canadian Research Council
and in part by the National Science Foundation under grant GJ 43505.

Department of Computer Science Cornell University
Ithaca, New York


The Effectiveness of the Thesaurus Method in Automatic Information Retrieval

C.T. Yu† and G. Salton*


Term grouping and thesaurus methods have frequently been incorporated into automatic content analysis programs as devices for the recognition of synonymous expressions and of linguistic entities that may be semantically similar but syntactically distinct. While it has frequently been asserted that the recognition of synonyms is essential in language analysis, actual proofs of the usefulness of a thesaurus in automatic information retrieval ...

In the present study, formal proofs are given of the effectiveness under well-defined conditions of the thesaurus method in information retrieval. It is shown, in particular, that when certain semantically related terms are added to the information queries originally submitted by the user population, a superior retrieval system is obtained in the sense that for every level of the recall the retrieval precision is at least as good for the altered queries as for the original ones.


I. Introduction

A good deal is known about the representation of document content and the ...

a) ... collection with uneven frequency distributions; that is, in certain documents their occurrence frequencies are much larger than would be expected from a random assignment of terms to documents; nonspecialty words, on the other hand, exhibit random occurrence patterns in the documents of a collection.

b) The most effective content identifiers exhibit little redundancy with other terms also used for content identification; in particular, terms with high document frequency, those assigned to a large proportion of the documents of a collection, tend to be indiscriminate in their retrieval capability and lead to losses in retrieval precision.

c) Effective content identifiers are expected to break up large clusters of documents that are not otherwise distinguishable for retrieval purposes; that is, they should reduce the existing uncertainty for the given document set. Thus, terms that occur with excessively low document frequency in the documents of a collection are not optimal and lead to unacceptable losses in recall.

The effectiveness of a retrieval system is often evaluated by two complementary measures known as precision and recall, respectively...
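
As an illustration of the quantities discussed above, the following minimal Python sketch computes document frequency, precision, and recall on a toy collection, and compares an original query with a thesaurus-expanded one. Everything in it is an assumption made for illustration: the four documents, the one-entry thesaurus, the relevance judgments, and the simple "share at least one term" matching rule are hypothetical and are not taken from the paper. It shows a single example only and does not reproduce the formal proofs or their conditions.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy collection: document id -> set of index terms.
DOCS: Dict[str, Set[str]] = {
    "d1": {"car", "engine"},
    "d2": {"automobile", "engine"},
    "d3": {"car", "insurance"},
    "d4": {"bicycle", "repair"},
}

# Hypothetical thesaurus: term -> semantically related terms.
THESAURUS: Dict[str, Set[str]] = {"car": {"automobile"}}


def document_frequency(term: str) -> int:
    """Number of documents in the collection to which the term is assigned."""
    return sum(1 for terms in DOCS.values() if term in terms)


def expand(query: Set[str]) -> Set[str]:
    """Add the thesaurus entries of every query term to the query."""
    expanded = set(query)
    for term in query:
        expanded |= THESAURUS.get(term, set())
    return expanded


def retrieve(query: Set[str]) -> Set[str]:
    """Assumed matching rule: retrieve documents sharing any term with the query."""
    return {doc_id for doc_id, terms in DOCS.items() if terms & query}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"d1", "d2", "d3"}  # hypothetical relevance judgments
    original = {"car"}
    for label, query in (("original", original), ("expanded", expand(original))):
        p, r = precision_recall(retrieve(query), relevant)
        print(label, sorted(query), "precision:", p, "recall:", r)
```

In this toy case the original query retrieves d1 and d3 (precision 1.0, recall 2/3), while the expanded query also retrieves d2 (precision 1.0, recall 1.0), so precision is not reduced while recall improves. A different thesaurus or collection could just as easily lower precision, which is why the conditions under which expansion helps matter.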