Method for Automatic Extraction of Relevant Sentences From Texts

IP.com Disclosure Number: IPCOM000102295D
Original Publication Date: 1990-Nov-01
Included in the Prior Art Database: 2005-Mar-17
Document File: 2 page(s) / 64K

Publishing Venue

IBM

Related People

Antonacci, F: AUTHOR [+3]

Abstract

This article describes a method for automatically extracting the most significant and explanatory sentences from a text in any language. All the words of the text are analyzed and ranked in order of importance. Only useful words are considered; "stop words" such as articles, pronouns and prepositions are excluded. The importance of the selected words is evaluated according to their frequency and position within the text. Words with high frequency and words located in the title, introduction or conclusion are considered more important.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 58% of the total text.

      With reference to the figure, the method is performed in two
steps: indexing and extracting. During indexing, all synonyms of a
word are represented by a single base form (lemma). The criteria for
evaluating word importance are:
-    Morphological analysis: only nouns, adjectives and a few verbal
forms are selected. Inflected forms are traced back to their lemma.
-    Statistical relevance: the frequency of each word in the text is
calculated.
-    Synonym check: lemmata with similar meaning are treated as a
single lemma.
-    Stop words: a list of the most common, non-significant words is
used to exclude them. The user can change this list.
-    Topological relevance: a lemma that appears in the title,
introduction or conclusion of the text is assigned additional
importance.
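
      The indexing criteria above can be illustrated with a minimal
Python sketch. It is not the disclosed implementation: the stop-word
list, the synonym table, the bonus weight and the function names are
all hypothetical, and the morphological analysis is replaced here by
a plain lookup table rather than a real part-of-speech filter or
lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, user-editable stop-word list (the source gives no list).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "it", "to", "for", "from"}

# Hypothetical table standing in for morphological analysis and the
# synonym check: each inflected form or synonym maps to one lemma.
LEMMA_TABLE = {"document": "text", "texts": "text", "sentences": "sentence"}

# Assumed weight for topological relevance; the source states no value.
TOPOLOGY_BONUS = 2.0


def lemmas(text, stop_words=STOP_WORDS, lemma_table=LEMMA_TABLE):
    """Tokenize, drop stop words, and map each word to its lemma."""
    words = re.findall(r"[a-z]+", text.lower())
    return [lemma_table.get(w, w) for w in words if w not in stop_words]


def index_words(full_text, privileged_text, bonus=TOPOLOGY_BONUS):
    """Rank lemmas by frequency plus a bonus for privileged positions.

    full_text is the whole text; privileged_text is the title,
    introduction and conclusion concatenated.
    """
    counts = Counter(lemmas(full_text))            # statistical relevance
    privileged = set(lemmas(privileged_text))      # topological relevance
    scores = {
        lemma: float(n) + (bonus if lemma in privileged else 0.0)
        for lemma, n in counts.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


title = "Extraction of sentences"
body = "The method ranks sentences. A document is a text. The text is indexed."
ranked = index_words(title + " " + body, title)
```

      In this sketch the score is simply the frequency count plus a
fixed bonus. The source does not say how frequency and position are
combined, so any other weighting would be equally consistent with
the description.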

      To carry out the morpho...