
System and method for interested paragraphs auto-extraction collaboratively based on unintentional behavior

IP.com Disclosure Number: IPCOM000185381D
Original Publication Date: 2009-Jul-23
Included in the Prior Art Database: 2009-Jul-23
Document File: 8 page(s) / 95K

Publishing Venue

IBM

Abstract

This publication discloses a system and method for automatically extracting the paragraphs of interest in a document by monitoring the unintentional behavior of a group of users. The advantage of the invention is that it is non-obtrusive and does not require users to perform the conventional, laborious tagging work.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 30% of the total text.


Disclosed is a system and method for automatically extracting the paragraphs of interest in a document by monitoring the unintentional behavior of a group of users.

When a user surfs the web for information of interest, typically only a few specific paragraphs in the document being browsed attract most users' attention, because those paragraphs answer the users' question most clearly. Currently, however, users may need to read the full document paragraph by paragraph to find the one they are interested in, and the same is true for every other user looking for the same information.

Known methods for helping users reach the target information include:

(1) Using keyword search on a web page to locate the target information;

(2) Reading paragraph by paragraph to find the information of interest;

(3) With the popularity of Web 2.0, allowing users to tag a target paragraph.

However, each of these known methods has drawbacks:

(1) Keyword search on the web page: there can be many hits on the same web page, so the result of using this method can be the same as with the second method;

(2) Reading paragraph by paragraph: it can be very time-consuming to find the target information;

(3) Paragraph tagging: the tagging work is also laborious, and users are reluctant to do it.

The authors of this disclosure observed from users' normal web-surfing behavior that, when a user searches for information of interest or the answer to a question, the target information has already been found by many other users. This observation prompted the authors to use that information to direct users to the most relevant part of a document. The essential idea of this disclosure is to make use of users' unintentional behavior to extract the paragraphs of interest in a document. The advantage of the proposed method is that, by making use of users' unintentional behavior, users' interests are collected collaboratively and incorporated into the world of web documents without further modification to the current IT architecture.

In brief, the method in the present disclosure has four steps: first, monitor users' behavior on a targeted page; second, calculate scores for specific paragraphs of the web page; third, sum all users' paragraph scores to obtain a certain number of paragraphs that attract users most; and fourth, let applications use this information in a variety of ways.
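The aggregation in the third step can be illustrated with a minimal Python sketch. It is not the disclosure's implementation: how behavior is monitored and how a single user's paragraph score is calculated are not specified in this abbreviated text, so the sketch assumes each user's scores are already available as a mapping from a paragraph index to a number. The function name top_paragraphs, the parameter k, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_paragraphs(
    per_user_scores: Iterable[Dict[int, float]], k: int
) -> List[Tuple[int, float]]:
    """Sum every user's paragraph scores and return the k highest totals.

    Assumption: each dict maps a paragraph index to the score one user's
    behavior produced for that paragraph. The scoring rule itself is not
    given in the abbreviated text, so it is treated as an input here.
    """
    totals: Dict[int, float] = defaultdict(float)
    for scores in per_user_scores:
        for paragraph_id, score in scores.items():
            totals[paragraph_id] += score
    # Highest total first; ties are broken by the lower paragraph index.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores from three users on the same page.
users = [
    {0: 1.0, 2: 3.0},
    {2: 2.0, 3: 0.5},
    {0: 0.5, 2: 1.0},
]
print(top_paragraphs(users, 2))  # [(2, 6.0), (0, 1.5)]
```

An application in the fourth step could then, for example, highlight or jump to the returned paragraph indices; that usage is likewise only an illustration.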

Before the disclosed method is elaborated, the definition of "paragraph" in this disclosure should be clarified: if a natural paragraph in a web page is shorter than a certain length, say 100 words, we will treat it as one paragraph. If the length of a natural paragraph is more than 100 words, then we will spli...