Automated Documented Classification into Hierarchical Categories or Folders

IP.com Disclosure Number: IPCOM000123355D
Original Publication Date: 1998-Oct-01
Included in the Prior Art Database: 2005-Apr-04
Document File: 1 page(s) / 46K

Publishing Venue

IBM

Related People

Singhal, SK: AUTHOR

Abstract

Disclosed is a method for classifying documents in an environment containing a set of hierarchical categories or folders. This method tries to place messages into the most specific categories before attempting to use the more general categories. The scheme is suited for use in Electronic Document Processing applications or in e-mail systems. The technique differs from most existing automated schemes in that it can effectively work with hierarchical categories instead of simply a collection of disjoint categories. In particular, this technique places messages in the most specific (deepest) category available.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 68% of the total text.

   Alternative approaches include a "drill down" analysis that
descends through the category hierarchy one level at a time, and a
"flattening" analysis that exhaustively considers every category and
sub-category combination available.  The former approach is slow when
processing a deep category hierarchy; the latter approach requires
considerable memory and computational resources for any complex
hierarchy.
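
   The cost difference between the two alternatives can be seen in a
small sketch.  Nothing below comes from the disclosure itself: the
Category class, the keyword-overlap score() function, and the reading
of "flattening" as enumerating every root-to-category path are all
assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A folder in the hierarchy (hypothetical structure, not from the disclosure)."""
    name: str
    keywords: set[str] = field(default_factory=set)
    children: list["Category"] = field(default_factory=list)


def score(category, words):
    """Assumed stand-in for a real classifier: count of shared keywords."""
    return len(category.keywords & words)


def drill_down(root, words):
    """Descend from the root, following the best-matching child at each level.

    One round of scoring is needed per level, so the work grows with depth.
    """
    path = [root.name]
    node = root
    while node.children:
        best = max(node.children, key=lambda c: score(c, words))
        if score(best, words) == 0:
            break  # no child matches; stay in the current category
        path.append(best.name)
        node = best
    return path


def flatten(root, prefix=()):
    """List every root-to-category path so each one can be scored directly.

    The whole list is held in memory at once, which is what becomes
    expensive for a large hierarchy.
    """
    path = prefix + (root.name,)
    paths = [(path, root)]
    for child in root.children:
        paths.extend(flatten(child, path))
    return paths
```

   For a hierarchy with a root "Mail" holding "Work" (which holds
"Projects") and "Personal", drill_down() scores one level per loop
iteration, while flatten() materializes all four paths before any
scoring takes place.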

   The solution involves an interactive "drill up" from the
most specific categories to the least specific.  The algorithm works
as follows:
  1.  Make a list of all of the leaf categories in the category
      hierarchy; each of these leaf categories is associated
      with information about its parent nodes.
  2.  Attempt to place the message in one of the categories in
      th...