Publication Date: 2014-Mar-26
A technique that enables us to interactively view the structural segments scored by their intent in a document

The web today carries an overwhelming volume of information in the form of web pages and documents. Content is laid out on a wide variety of template designs, using HTML markup so that browsers can present the page to the end user. Almost every web page has its own design for presenting its various sections. Besides the main article or content of a page, the same design often includes other kinds of content, such as advertisements, references, or plain headers, all of which are presented to the user.

A few examples in which other kinds of content appear alongside the main article in the same web page are as follows:

While the main article of the web page is marked with a blue box in the above picture, the same page also shows other sections that are present in the same web page design.

Each section of the web page that is a meaningful chunk in its own right can be treated as a segment of the HTML page. For example, the headline and the body of the news article, marked with a thick blue line in the above picture, are two segments on the page. Besides these, there is a video segment and a segment that contains links to other information sources such as Twitter, LinkedIn and Facebook. There is also a segment that contains references to other web pages on the same website, and so on.

Now suppose that each segment has an intent score associated with it. The score of an identified segment reflects its significance, or in other words, its relevance to the current HTML page. This score can be normalized to the range 0-100 for any segment in the page. In today's systems, when a web page is presented to the end user, all of its content is treated with equal importance when it is rendered in the browser. Browsers today have no indication of the relevance of any particular segment of an HTML page that contains various sections. Hence, there is a need for an intelligent visualization that enables the end user to focus on the segments that are more relevant to the page and carry a higher level of significance.
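The normalization to the range 0-100 can be illustrated with a short sketch. The text does not say how raw scores are produced or rescaled, so the `Segment` class, the `raw_score` field, and the min-max rescaling below are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A page segment with an unnormalized relevance score (hypothetical type)."""
    label: str
    raw_score: float  # assumed to come from some upstream scorer


def normalize_intent_scores(segments):
    """Rescale raw scores to 0-100 using min-max scaling (an assumed choice)."""
    if not segments:
        return {}
    raws = [s.raw_score for s in segments]
    lo, hi = min(raws), max(raws)
    if hi == lo:
        # All segments are equally relevant; give each the top score.
        return {s.label: 100.0 for s in segments}
    return {s.label: 100.0 * (s.raw_score - lo) / (hi - lo) for s in segments}


# Illustrative raw scores, not taken from any real page.
page = [
    Segment("headline", 8.0),
    Segment("video", 4.0),
    Segment("social links", 2.0),
]
scores = normalize_intent_scores(page)
```

With min-max scaling the least relevant segment always receives 0 and the most relevant receives 100. Dividing by the maximum raw score instead would be an equally plausible choice that preserves the ratios between segments.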

We term this relevance/significance score an "intent score" because we believe that there is always a level of intention associated with the design of a segment inside an HTML page. For example, the intent score of the headline and the main news article in the above HTML page design is higher than that of any other segment, because the author's intention while designing the HTML page was more closely associated with this main article than with any other segment.

We propose a visualization technique for viewing the segments of an HTML page by their intent score, enabling the user to focus on selected segments of the page...
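As a rough illustration of how an intent score could drive such a view, the sketch below maps a normalized score to an inline CSS style. The function name `segment_style`, the user-chosen `threshold`, and the specific styling (a blue outline for emphasized segments, reduced opacity for the rest) are hypothetical and are not the technique's actual rendering rules.

```python
def segment_style(intent_score, threshold=50.0):
    """Return an inline CSS string that emphasizes high-intent segments.

    Both the threshold and the styling are illustrative assumptions.
    """
    if not 0.0 <= intent_score <= 100.0:
        raise ValueError("intent score must be in the range 0-100")
    if intent_score >= threshold:
        # Segments at or above the threshold are shown fully and outlined.
        return "opacity: 1.00; outline: 2px solid blue;"
    # Lower-intent segments fade out, but never below 20% so they stay readable.
    opacity = max(0.2, intent_score / 100.0)
    return f"opacity: {opacity:.2f};"


headline_css = segment_style(90.0)
sidebar_css = segment_style(30.0)
```

Exposing the threshold as a parameter is one way to make the view interactive: raising it narrows the emphasis to the few segments with the highest intent scores.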