System for paginating markup in the absence of direct feedback from an output device context

IP.com Disclosure Number: IPCOM000013934D
Original Publication Date: 2001-Oct-03
Included in the Prior Art Database: 2003-Jun-19
Document File: 5 page(s) / 54K

Publishing Venue

IBM

Abstract

This document describes an algorithm for paginating markup in the absence of direct feedback from an output context. The algorithm uses a simplified numerical model of the behaviour of downstream output transformations to make upstream pagination decisions. The benefit of performing upstream pagination of the input is that the downstream transform stages can be simplified: they need only lay out page-sized chunks of output, without having to consider the complicated issues of spanning content across multiple pages.

Context

With the advent of markup languages such as XML and HTML, there have been strong incentives to generate various kinds of application output as XML documents and then, separately, to transform that XML into a presentation format such as HTML. This 2-phase approach to output generation has two advantages: (1) the device-independent XML can be used as input to other processes unrelated to presentation (for example, analysis processing); and (2) by substituting different presentation transforms, the device-independent XML can be retargeted to different kinds of device context.

Reporting outputs are a type of application output that appears well suited to this approach, particularly since the intermediate XML representation is likely to have a number of different uses aside from presentation.

One difficulty, however, arises when the final output must be paginated, particularly when HTML and CSS are used as the presentation languages. The problem is that HTML+CSS provide only very limited support for output pagination: a mechanism for signalling to a printing context where a page break should be forced. It remains the responsibility of the HTML generator to ensure that the content generated between two page break directives will fit within the confines of a typical page.
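To make the page break mechanism concrete, here is a minimal sketch in Python. It assumes that the CSS declaration `page-break-after: always` is the directive used to force a break; the function name `join_pages` and the paragraph-based input are hypothetical, chosen only for illustration. The sketch does nothing to check that each page actually fits, which is exactly the responsibility left to the generator.

```python
from html import escape

# Assumed page break directive: an empty block that forces a break when printed.
PAGE_BREAK = '<div style="page-break-after: always"></div>'


def join_pages(pages):
    """Render pre-sized pages of paragraph text, forcing a break between pages.

    `pages` is a list of pages; each page is a list of paragraph strings.
    The caller is responsible for ensuring each page fits on paper.
    """
    rendered = []
    for paragraphs in pages:
        rendered.append("\n".join(f"<p>{escape(p)}</p>" for p in paragraphs))
    return f"\n{PAGE_BREAK}\n".join(rendered)


html = join_pages([["First page, line 1", "First page, line 2"], ["Second page"]])
print(html)
```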
This means the HTML generator must keep account of the depth of output generated so far and, when necessary, generate a page footer, a page break directive and a subsequent page header, then continue processing the input in the same fashion until the end of input has been reached. It is this need to account for the depth of generated output that makes the problem a reasonably tricky one to solve. In fact, in the general case, it can be solved with 100% accuracy only if the HTML generator has a 100% accurate model of the output processor (usually a web browser) that performs the write into the final output context. Such complete accuracy is an unreasonable requirement to place on an HTML generator. However, provided one is willing to make certain assumptions about the configuration of the output processor, to restrict the nature of the generated HTML, and so to trade model accuracy for simplicity, it is possible to predict the behaviour of the output processor with sufficient accuracy to generate HTML that will paginate correctly when eventually written to the output context.

The remaining problem is how to incorporate the pagination model into the HTML generator. If the HTML generator is implemented as a JSP page, it is possible to use nested loops to iterate across the input document, using the pagination model to account for the depth of the generated HTML as it is written into the output stream. This is relatively simple to do with JSP, since JSP pages have access to the full power of the Java language to implement the accumulators needed to account for page depth. Such accounting is not as easy to implement with XSLT transforms, since XSLT by itself does not provide the expressions with side effects that are needed to implement accumulators that keep track of the currently generated page depth.
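The depth accounting described above can be sketched with a simple accumulator. The following Python sketch is illustrative only: the `PageModel` class, its field names, and all numeric values are assumptions, not figures from the disclosure. It models depth in abstract "line" units and assumes each row wraps at a fixed number of characters, which is the kind of simplification that trades model accuracy for simplicity.

```python
from dataclasses import dataclass


@dataclass
class PageModel:
    """Hypothetical numerical model of the output processor (all values assumed)."""
    page_depth: float = 60.0    # usable depth of one page, in line units
    header_depth: float = 3.0   # depth consumed by the page header
    footer_depth: float = 2.0   # depth consumed by the page footer
    row_depth: float = 1.0      # depth of one rendered line of a row
    chars_per_line: int = 80    # assumed wrap width

    def depth_of(self, text: str) -> float:
        # Ceiling division: number of wrapped lines, at least one per row.
        lines = max(1, -(-len(text) // self.chars_per_line))
        return lines * self.row_depth


def paginate(rows, model):
    """Group rows into pages, starting a new page when the accumulator overflows."""
    capacity = model.page_depth - model.header_depth - model.footer_depth
    pages, current, used = [], [], 0.0
    for row in rows:
        depth = model.depth_of(row)
        # Break before a row that would not fit, unless the page is still empty.
        if current and used + depth > capacity:
            pages.append(current)
            current, used = [], 0.0
        current.append(row)
        used += depth
    if current:
        pages.append(current)
    return pages


model = PageModel(page_depth=10, header_depth=2, footer_depth=1)
pages = paginate([f"row {i}" for i in range(15)], model)
print([len(p) for p in pages])  # [7, 7, 1]
```

The mutable `used` accumulator is the kind of side-effecting state that is straightforward in a general-purpose language and awkward in plain XSLT.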
Another way to approach this problem is to split the HTML generator into 2 smaller transforms: an XML Paginator and a Page Oriented HTML Transform. The XML Paginator applies assumptions about the behaviour of the Page Oriented HTML Transform and the output processor to the XML input in order to break the original XML into page-sized chunks of XML, leaving the Page Oriented HTML Transform with the simpler task of formatting page-sized chunks of XML into HTML. This process can be represented by the following schematic:
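As a rough illustration of this two-stage split (a sketch under stated assumptions, not the schematic from the original disclosure), the Python code below separates the two transforms. The element names `report`, `row`, `pagedReport` and `page`, and the fixed `rows_per_page` parameter, are all hypothetical; a fixed row count stands in for the numerical model that the XML Paginator would apply.

```python
import xml.etree.ElementTree as ET


def paginate_xml(report, rows_per_page):
    """XML Paginator: wrap consecutive <row> children in <page> elements."""
    paged = ET.Element("pagedReport")
    rows = report.findall("row")
    for start in range(0, len(rows), rows_per_page):
        number = start // rows_per_page + 1
        page = ET.SubElement(paged, "page", number=str(number))
        page.extend(rows[start:start + rows_per_page])
    return paged


def page_to_html(page):
    """Page Oriented HTML Transform: format one page-sized chunk of XML.

    No depth accounting happens here; the chunk is assumed to fit.
    """
    div = ET.Element("div", {"class": "page", "style": "page-break-after: always"})
    ET.SubElement(div, "h1").text = f"Page {page.get('number')}"
    table = ET.SubElement(div, "table")
    for row in page.findall("row"):
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "td").text = row.text
    return ET.tostring(div, encoding="unicode")


source = "<report>" + "".join(f"<row>item {i}</row>" for i in range(5)) + "</report>"
report = ET.fromstring(source)
paged = paginate_xml(report, rows_per_page=2)
html_pages = [page_to_html(p) for p in paged.findall("page")]
print(len(html_pages))  # 3
```

Because the second stage sees only one page at a time, it could equally be written as a side-effect-free transform, which is the point of moving the pagination decision upstream.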

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 22% of the total text.
