
Induction of Rules for Document Classification

IP.com Disclosure Number: IPCOM000105176D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 78K

Publishing Venue

IBM

Related People

Apte, CV: AUTHOR [+3]

Abstract

Disclosed herewith is a system for inducing document classification rules for text retrieval, consisting of a preprocessing means, an induction means, and an evaluation means.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.


      Disclosed herewith is a system for inducing document
classification rules for text retrieval.  It consists of:

o   a preprocessing means for determining the values of the
    attributes used to describe each text and for determining the
    category of each text from a defined set of categories.

o   an induction means for finding rule sets that distinguish the
    categories from one another.

o   an evaluation means for choosing the best rule set, i.e., the
    one that minimizes classification error.
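      The abbreviated text does not say which induction algorithm is
used, so the sketch below substitutes a deliberately simple technique
and is not the authors' method.  Each candidate rule has the form "if
word w is present, assign category c"; rules are ranked by their
precision on training texts (induction), and the evaluation step keeps
the prefix of the ranked list, applied as an ordered decision list,
with the lowest error on a held-out sample.  The held-out sample, the
decision-list form, the majority-category fallback, and all function
names are assumptions.

```python
from collections import Counter

# Hypothetical sketch: a document is (frozenset of attribute words, category).
# Attributes are binary: a word is either in the set or it is not.


def candidate_rules(train):
    """Rank single-word rules 'if w present then category c' by training precision."""
    word_counts = Counter()
    pair_counts = Counter()
    for words, cat in train:
        for w in words:
            word_counts[w] += 1
            pair_counts[(w, cat)] += 1
    scored = [(n / word_counts[w], n, w, cat) for (w, cat), n in pair_counts.items()]
    # Highest precision first, then highest support; ties broken alphabetically.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2], s[3]))
    return [(w, cat) for _, _, w, cat in scored]


def classify(rules, default, words):
    """Apply rules in order (a decision list); fall back to the default category."""
    for w, cat in rules:
        if w in words:
            return cat
    return default


def error_rate(rules, default, docs):
    """Fraction of documents whose predicted category is wrong."""
    wrong = sum(classify(rules, default, words) != cat for words, cat in docs)
    return wrong / len(docs)


def induce_best_rule_set(train, holdout):
    """Induction ranks rules; evaluation keeps the prefix with least held-out error."""
    # Assumed fallback: the most frequent training category.
    default = Counter(cat for _, cat in train).most_common(1)[0][0]
    ranked = candidate_rules(train)
    best_rules, best_err = [], error_rate([], default, holdout)
    for k in range(1, len(ranked) + 1):
        err = error_rate(ranked[:k], default, holdout)
        if err < best_err:  # strict: on ties, keep the smaller rule set
            best_rules, best_err = ranked[:k], err
    return best_rules, default, best_err


if __name__ == "__main__":
    train = [
        (frozenset({"stock", "market"}), "finance"),
        (frozenset({"stock", "price"}), "finance"),
        (frozenset({"bank", "market"}), "finance"),
        (frozenset({"game", "score"}), "sports"),
        (frozenset({"team", "game"}), "sports"),
    ]
    holdout = [
        (frozenset({"market", "price"}), "finance"),
        (frozenset({"team", "score"}), "sports"),
    ]
    rules, default, err = induce_best_rule_set(train, holdout)
    print(len(rules), default, err)  # → 6 finance 0.0
```

      Preferring the shorter rule list on ties is one way to favor
smaller rule sets; the disclosure itself states only that the chosen
set minimizes classification error.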

      In many text storage and retrieval systems, texts are classified
with one or more codes chosen from a complete classification system.
Examples include NTIS documents from the U.S. government, news
services such as UPI and Reuters, and publications such as ACM
Computing Reviews.  Assigning classification codes manually is
expensive.  Recent work has shown that, in certain environments,
rule-based systems can assign codes quickly and accurately.  However,
the rule sets must be constructed by hand for each application, which
is itself a time-consuming and expensive process.  This invention
provides a means for automating rule construction.

      The invention has been implemented as a suite of computer
programs; a flow chart is shown in Fig. 2.  The programs produce a
set of attribute values for each text, where the attributes are
single words or word phrases and the values are either binary (the
attribute either appears in the text or does not) or are number...
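      The attribute step described above can be illustrated with a
short sketch, which is an assumption-laden example and not the
programs of Fig. 2.  "Word phrases" are approximated here by runs of
adjacent words (up to two words by default), and the tokenizer and
function names are hypothetical.  Because the excerpt breaks off
before describing the non-binary values, occurrence counts are used
only as one plausible choice.

```python
import re
from collections import Counter


def tokens(text):
    """Hypothetical tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def attribute_counts(text, max_phrase_len=2):
    """Count single words and adjacent-word phrases up to max_phrase_len words."""
    toks = tokens(text)
    counts = Counter()
    for n in range(1, max_phrase_len + 1):
        for i in range(len(toks) - n + 1):
            counts[" ".join(toks[i:i + n])] += 1
    return counts


def attribute_vector(text, vocabulary, binary=True, max_phrase_len=2):
    """Value of each attribute: 1/0 presence, or (assumed) an occurrence count."""
    counts = attribute_counts(text, max_phrase_len)
    if binary:
        return [1 if a in counts else 0 for a in vocabulary]
    # Counter returns 0 for attributes that never occur in the text.
    return [counts[a] for a in vocabulary]


if __name__ == "__main__":
    vocab = ["bank", "interest rate", "loan"]
    text = "The bank raised its interest rate; the interest rate rise hit the bank."
    print(attribute_vector(text, vocab))                # → [1, 1, 0]
    print(attribute_vector(text, vocab, binary=False))  # → [2, 2, 0]
```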