Automatic labeling of layout elements in fixed format document by looking into original flow format document

IP.com Disclosure Number: IPCOM000234733D
Original Publication Date: 2014-Jan-31
Included in the Prior Art Database: 2014-Jan-31
Document File: 4 page(s) / 140K

Publishing Venue

Microsoft

Related People

Dragan Slaveski: INVENTOR [+2]

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 33% of the total text.

Document Author (alias)

Dragan Slaveski (dragas)

Defensive Publication Title 

Automatic labeling of layout elements in fixed format document by looking into original flow format document

Name(s) of All Contributors

Marija Antic (maanti)

Summary of the Defensive Publication/Abstract

This defensive publication describes how the labeling of layout elements in a fixed format document can be automated in cases where the fixed format document is produced from an existing flow format document.

Fixed format documents contain no information about layout elements. This is a problem when testing the accuracy of conversion from a fixed format document to a flow format document. For testing purposes, layout elements need to be labeled manually, and accuracy test runs then compare the labeled data to the results of the conversion. However, the labeling process can be automated if the original flow format document (from which the fixed format document was produced) exists. This is accomplished by producing a fixed format document that carries the information about the original flow layout elements, encoded inside its native elements. In the described method, the text color is used to encode the information about the original layout in the flow document. Based on the content color, it is determined which native fixed format elements belong to which layout elements from the flow document, and the automatic labeling is performed. The method requires the original flow format document to be available; when it does not exist, the method cannot be used.
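
The color coding idea can be illustrated with a minimal sketch. The publication does not say how a color is mapped to a layout element, so the scheme below (a 24-bit element ID packed into an RGB triple) is an assumption, and the names `id_to_rgb`, `rgb_to_id`, `TextRun` and `label_runs` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


def id_to_rgb(element_id: int) -> tuple[int, int, int]:
    """Pack a layout element ID (0 .. 2**24 - 1) into an (R, G, B) triple.

    Assumed encoding, not taken from the publication.
    """
    if not 0 <= element_id < 1 << 24:
        raise ValueError("element_id must fit in 24 bits")
    return (element_id >> 16) & 0xFF, (element_id >> 8) & 0xFF, element_id & 0xFF


def rgb_to_id(rgb: tuple[int, int, int]) -> int:
    """Recover the layout element ID from a text color."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


@dataclass(frozen=True)
class TextRun:
    """Hypothetical stand-in for a native fixed format text element."""
    text: str
    rgb: tuple[int, int, int]


def label_runs(runs, id_to_label):
    """Group text runs by the flow layout element that their color encodes.

    id_to_label maps element IDs to the element type recorded while the
    flow document was colored (for example "paragraph").
    """
    labeled = defaultdict(list)
    for run in runs:
        element_id = rgb_to_id(run.rgb)
        labeled[(element_id, id_to_label[element_id])].append(run.text)
    return dict(labeled)
```

In this sketch, each layout element of the flow document would be given the color `id_to_rgb(n)` before the fixed format document is produced, and `label_runs` would then be applied to the text runs read back from the fixed format document.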

Description

The conversion of a fixed format document to a flow format document can be split into two stages: (1) layout element detection, and (2) element property reconstruction. To fully test the accuracy of the conversion, the results of both stages must be tested. Layout element detection is the process in which the elements of the fixed format document are aggregated into the layout elements of the flow format document. First, the native fixed format document elements are aggregated into the basic layout elements of the target flow format document. These basic layout elements are then aggregated into more complex layout elements. This step is repeated according to the hierarchy of layout elements in the flow document format, so that lower level complex layout elements are aggregated into higher level complex layout elements. For example, a PDF document is composed of absolutely positioned text runs, images and vector graphics. On the other hand, a DOCX document is composed of lay...
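
The level-by-level aggregation described above can be sketched as follows. This is only an illustration of the bottom-up grouping, assuming that the parent of every element at each level is already known (for instance from labeled data). The functions `aggregate` and `aggregate_hierarchy` and the level names in the usage note (runs, lines, paragraphs) are hypothetical and not taken from the publication.

```python
from collections import defaultdict


def aggregate(children, parent_of):
    """Group the elements of one level under their parent at the next level."""
    groups = defaultdict(list)
    for child in children:
        groups[parent_of[child]].append(child)
    return dict(groups)


def aggregate_hierarchy(native_elements, parent_maps):
    """Apply aggregate() once per hierarchy level, lowest level first.

    parent_maps is a list of dicts, one per level, each mapping an element
    to its parent on the next higher level. Returns one grouping per level.
    """
    levels = []
    current = list(native_elements)
    for parent_of in parent_maps:
        grouped = aggregate(current, parent_of)
        levels.append(grouped)
        # The parents found at this level become the children of the next one.
        current = list(grouped)
    return levels
```

For example, with text runs `r1`, `r2`, `r3`, a first map that assigns `r1` and `r2` to `line1` and `r3` to `line2`, and a second map that assigns both lines to `p1`, the function returns one grouping of runs into lines and one grouping of lines into a paragraph.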