Method and System of Block Printing for Markup Page
Disclosure Number: IPCOM000198968D
Publication Date: 2010-Aug-19
Document File: 2 page(s) / 122K

Printing is one of the most frequent operations in daily computer use. Whenever people want to read a document carefully, or cannot or prefer not to read it in front of the computer, they print it out. Printers are now easy and fast to use. In most offices the printers are connected to an intranet, so users can submit print requests from their own computers and collect the printed material a few minutes later. Almost all popular document processing and text editing software provides a printing function, and the user can simply press the Ctrl+P shortcut to print. Printing has become a fundamental feature of such software.

Problem Solved: The printing feature in popular software such as Word, Adobe Reader, Lotus Notes and UltraEdit is designed in the “WYSIWYG” (“What You See Is What You Get”) manner. Thus, for popular document types such as DOC, PDF and TXT, the user always gets a well-formatted printed page that looks exactly the same as the page edited on the computer. However, this does not hold for one special kind of page, the markup page. The most common markup pages are HTML-based or XHTML-based web pages, in which predefined markups define the location and display of page elements. Unlike traditional formatted documents, a web page is often composed of diverse content. Besides the main content, it also contains other types of content, such as advertising and logos. Readers usually pay attention only to the main content. But when they try to print a web page, two situations often occur: 1) the content they want is not printed because it falls outside the area the printer can cover; 2) the layout of the printed page differs considerably from the original page because some markups in the web page are not supported by the software or by the printer.


Main Idea

What leads to this situation? A page defined by a markup language is designed for browsing rather than printing. When a web page is printed, the browsing layout is rarely consistent with the printing layout because of paper size and printer capability. In this disclosure, we convert the layout of a web page before it is printed to make it suitable for printing.

In this disclosure, we propose a method and system of block printing for markup pages. It mainly consists of two components:
1) Main Content Automatic Extraction. A web page usually contains diverse content, with different parts focusing on different topics. In most cases, the content on a specific topic is displayed in its own block. In addition, a web page always has a main block that displays the main content for readers, while other blocks, such as advertising and logos, are normally displayed at the sides of the page. Users often pay attention only to the main block. For this reason, we apply web page segmentation and block identification techniques to automatically extract the main block from the web page. Optionally, if a web page has multiple main blocks, we could extract all of them for the users to select from.
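The extraction step described above can be illustrated with a minimal sketch. The disclosure does not specify a segmentation or block identification algorithm, so the heuristic here is an assumption: container tags are treated as candidate blocks, and each block is scored by the amount of text it holds, with link text penalized (advertising and logo blocks tend to be mostly links). The names BLOCK_TAGS, BlockScorer and extract_main_blocks are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these container tags delimit candidate blocks.
BLOCK_TAGS = {"div", "td", "article", "section"}


class BlockScorer(HTMLParser):
    """Collects the text and the link-text length of each candidate block."""

    def __init__(self):
        super().__init__()
        self.stack = []    # open blocks: [tag, text_parts, link_chars]
        self.blocks = []   # closed blocks: (text, link_chars)
        self.in_link = 0   # nesting depth of <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append([tag, [], 0])
        elif tag == "a":
            self.in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link -= 1
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1][0] == tag:
            _, parts, link_chars = self.stack.pop()
            self.blocks.append((" ".join(parts), link_chars))

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text and self.stack:
            # Credit the text to the innermost open block only.
            self.stack[-1][1].append(text)
            if self.in_link:
                self.stack[-1][2] += len(text)


def extract_main_blocks(html, top_n=1):
    """Return the text of the top_n blocks, ranked by a text-density score."""
    scorer = BlockScorer()
    scorer.feed(html)
    scorer.close()
    # Score: total text length minus a double penalty for link text.
    ranked = sorted(scorer.blocks,
                    key=lambda b: len(b[0]) - 2 * b[1],
                    reverse=True)
    return [text for text, _ in ranked[:top_n]]
```

Calling extract_main_blocks(page_html, top_n=3) would return several candidate blocks, which corresponds to the optional case in which the user selects among multiple main blocks. A production system would likely also use visual cues such as block position and size, which this sketch ignores.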

After the main content is extracted from the web page for the users to select to print, the subsequent question is how to print it. Recall that the main block is laid out in the web page only for browsing, so its original layout might be quite unsuitable for printing. We need to re-lay out the content of the main block according to the size of the paper. Note that a main block can contain complex situations; for example, pictures and advertising may be embedded in it. Thus, we need to filter out the irrelevant content and then lay out the relevant content adaptively. We could adopt two techniques to achieve this goal:
2.1 Content Topic Shift Judgment. For the large main block, there...
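
The requirement stated earlier, re-laying out the main block according to the size of the paper, can be illustrated with a rough sketch. It assumes the extracted main content is already available as plain-text paragraphs and that a fixed-width font is used. The function name layout_for_paper and all default dimensions (A4 paper, 20 mm margins, 2.1 mm character width, 5 mm line height) are hypothetical values chosen for illustration, not taken from the disclosure.

```python
import textwrap


def layout_for_paper(paragraphs,
                     paper_width_mm=210.0, paper_height_mm=297.0,
                     margin_mm=20.0, char_width_mm=2.1, line_height_mm=5.0):
    """Wrap paragraphs to the printable width and split them into pages.

    Returns a list of pages, each page being a list of text lines.
    """
    # Printable area after subtracting the margins on both sides.
    cols = int((paper_width_mm - 2 * margin_mm) // char_width_mm)
    rows = int((paper_height_mm - 2 * margin_mm) // line_height_mm)
    if cols < 1 or rows < 1:
        raise ValueError("paper is too small for the given margins")

    lines = []
    for para in paragraphs:
        lines.extend(textwrap.wrap(para, width=cols))
        lines.append("")  # blank line between paragraphs
    if lines:
        lines.pop()       # drop the trailing blank line

    # Cut the wrapped lines into pages of at most `rows` lines each.
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]
```

Changing paper_width_mm and paper_height_mm re-flows the same content for a different paper size, which is the adaptive behavior the text calls for. Embedded pictures and the filtering of irrelevant content are outside the scope of this sketch.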