Method for finding and ranking related information using a self determining ontological classification framework in cartesian space

IP.com Disclosure Number: IPCOM000132176D
Original Publication Date: 2005-Dec-05
Included in the Prior Art Database: 2005-Dec-05
Document File: 4 page(s) / 112K

Publishing Venue

IBM

Abstract

Disclosed is an efficient method for discovering the set of documents that are semantically related to a first document. This method uses a structured representation of an ontology to restrict the number of documents to be inspected for semantic similarity.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 40% of the total text.

The classification of a set of documents using subject terms from a controlled vocabulary is well-known.

    We can define a classification space on the basis of the subjects that are defined by the controlled vocabulary and the document set that is about those subjects.

    For example, in the diagram below, there are five subjects in the controlled vocabulary and three documents that have been classified as being about one or more of those subjects. Each line between a document and a subject represents a classification of that document. In principle, the closeness of a document to a subject indicates the "strength" of the classification.

    By itself, the classification space does little to aid the discovery of documents that are semantically related to any given document. For example, the documents on the right and left in the diagram above might appear to be equally relevant to the document in the middle because they each share two classifications.
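
The tie described above can be illustrated with a minimal sketch. The original diagram is not available, so the subject names (S1 to S5), the document names, and the strength values are all hypothetical, and the sketch assumes that "share two classifications" means each outer document has two subjects in common with the middle one.

```python
# Hypothetical classification space: the original diagram is not available,
# so the subject names, document names, and strengths below are invented.
# Each document maps to the subjects it is classified under; the number is
# an assumed classification "strength" between 0 and 1.
classification_space = {
    "left":   {"S1": 0.9, "S2": 0.6, "S3": 0.4},
    "middle": {"S2": 0.8, "S3": 0.7, "S4": 0.5},
    "right":  {"S3": 0.5, "S4": 0.6, "S5": 0.9},
}


def shared_subjects(doc_a, doc_b, space):
    """Return the set of subjects under which both documents are classified."""
    return set(space[doc_a]) & set(space[doc_b])


left_overlap = shared_subjects("left", "middle", classification_space)
right_overlap = shared_subjects("right", "middle", classification_space)

# Both outer documents share exactly two subjects with the middle document,
# so a count of shared classifications cannot separate them.
print(sorted(left_overlap), sorted(right_overlap))
```

With this invented data, both overlaps contain two subjects, which is why a count of shared classifications alone gives no basis for ranking one document above the other.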

    Methods for finding related documents on the basis of common keywords are known and are deployed, for example, by the Google search engine. The use of a controlled vocabulary enables greater precision in such keyword-based searches, but individual keywords are not necessarily a valid proxy for meaning.

    In this next diagram, the classification space is extended to include the ontological relationships between the subjects. We refer to this extended space as an Ontology Framework.

[This page contains 1 picture or other non-text object]

    The dotted lines represent the ontological links between subjects. The smaller the number of ontological links between any pair of subjects, the more closely related those two subjects are. Accordingly, the document on the left would be more relevant to the central document than the document on the right.
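
The idea that fewer ontological links means closer relatedness can be sketched with a breadth-first search over the subject graph. Everything in the sketch is an assumption made for illustration: the links, the document classifications, and in particular the scoring rule (the sum, over a candidate's subjects, of the fewest links to any subject of the reference document). The disclosure states only that a smaller number of links indicates a closer relationship.

```python
from collections import deque

# Hypothetical ontology links and classifications (the original diagram is
# not available). Links are undirected edges between subjects.
ontology_links = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S1", "S5")]
documents = {
    "left":   {"S1", "S2", "S3"},
    "middle": {"S2", "S3", "S4"},
    "right":  {"S3", "S4", "S5"},
}


def build_graph(links):
    """Build an undirected adjacency map from a list of subject pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def link_distances(graph, sources):
    """Multi-source BFS: fewest links from any source subject to each reachable subject."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def rank_by_ontological_distance(reference, documents, graph):
    """Rank the other documents by an assumed score: total link distance (lower is closer)."""
    dist = link_distances(graph, documents[reference])
    scores = {
        name: sum(dist.get(s, float("inf")) for s in subjects)
        for name, subjects in documents.items()
        if name != reference
    }
    return sorted(scores.items(), key=lambda item: item[1])


graph = build_graph(ontology_links)
ranking = rank_by_ontological_distance("middle", documents, graph)
print(ranking)  # [('left', 1), ('right', 2)]
```

In this invented example the left document's unshared subject is one link away from the middle document's subjects and the right document's is two links away, so the left document ranks first, matching the ordering described in the text.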

    Discovery methods that take account of ontological relationships are known. Such methods exploit the relational links between subject and object, but do so by exhaustively following each link to achieve a relevance ranking.

    Our method places the documents in a structural relationship with the ontology, enabling the ontological relationships to determine the relative relevance of the documents that are 'about' the subjects in the ontology. The structure also enables the selection of semantically-related documents to be made from a potentially small subset of the overall information space.
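
The abbreviated text does not describe the structure itself, so the following is only one possible reading of how a small subset might be selected: collect the subjects within a few ontological links of the reference document's subjects, then look up only the documents classified under those subjects. The data, the function names, and the `max_links` cut-off are all hypothetical, and the sketch does not attempt to model the Cartesian placement mentioned in the title.

```python
from collections import deque

# Hypothetical data: a chain of six subjects and five documents.
ontology_links = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S5"), ("S5", "S6")]
doc_subjects = {
    "doc_ref": {"S1"},
    "doc_a": {"S2"},
    "doc_b": {"S3"},
    "doc_c": {"S6"},
    "doc_d": {"S5"},
}

# Undirected adjacency map over subjects.
graph = {}
for a, b in ontology_links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Inverted index: subject -> documents classified under it.
subject_index = {}
for doc, subjects in doc_subjects.items():
    for subject in subjects:
        subject_index.setdefault(subject, set()).add(doc)


def subjects_within(graph, start_subjects, max_links):
    """Depth-limited BFS: subjects reachable in at most max_links links."""
    reached = {s: 0 for s in start_subjects}
    queue = deque(start_subjects)
    while queue:
        current = queue.popleft()
        if reached[current] == max_links:
            continue  # do not expand beyond the assumed cut-off
        for neighbour in graph.get(current, ()):
            if neighbour not in reached:
                reached[neighbour] = reached[current] + 1
                queue.append(neighbour)
    return set(reached)


def candidate_documents(reference, max_links):
    """Documents classified under subjects near the reference document's subjects."""
    nearby = subjects_within(graph, doc_subjects[reference], max_links)
    candidates = set()
    for subject in nearby:
        candidates |= subject_index.get(subject, set())
    candidates.discard(reference)
    return candidates


candidates = candidate_documents("doc_ref", max_links=2)
print(sorted(candidates))  # ['doc_a', 'doc_b']
```

Only the documents attached to nearby subjects are returned for closer inspection; `doc_c` and `doc_d` are never examined, which is the kind of saving the text attributes to inspecting a small subset of the information space.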

Advantages

    Our method provides for the dynamic discovery of documents that are not merely related by common keywords, but that are linked semantically.

    Our method enables the discovery of related documents to be both more efficient and more effective.

    The efficiency gains arise because only a small subset of the overall information space needs to be inspected. Known methods compare all documents in the space, thus requiring signific...