Scheme to retain image quality in document with highlight images

IP.com Disclosure Number: IPCOM000244521D
Publication Date: 2015-Dec-17

A method preserves the color-highlighted regions of a scanned document without affecting the quality of the underlying text. After the color-highlighted region has been extracted using previously disclosed technology, a separate plane within the Mixed Raster Content (MRC) file structure carries the highlight information. This allows the color highlight to be compressed differently from the underlying text, without affecting that text. Because the average color of the highlight region is stored instead of the raw scanned data, the highlight plane compresses extremely well. Benefits of the invention include consistently good image quality for highlighted text in a scanned document, without impact on the file size. The highlight color can also be removed easily, leaving the underlying text. This invention proposes only one mechanism to carry the color highlight content, namely the MRC file format. Other file format mechanisms, such as PDF fill objects, could provide the same end result.

In the MRC format, the highlighted portion of a document is currently treated as part of the image layer; there is also a chance that text in the highlighted region is misclassified as image content because of segmentation errors. Because the image layer undergoes lossy compression in the MRC format, the misclassified text and highlight regions show compression artifacts or otherwise appear degraded to the reader of the final document.

This behavior defeats the purpose of highlighting, which is to draw the reader's attention, and it degrades the visual appeal of the document.

This disclosure proposes a method that, in addition to the text and image layers of the MRC format, introduces a new layer called the highlight layer, to which the highlighted portions of the document are moved. Unlike the image layer, which undergoes heavy compression, the highlight layer is processed so that the highlight colors remain uniform.
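The three-layer arrangement can be illustrated with a minimal sketch. It is not the disclosed implementation or an MRC codec: the `Page` class, the `compose` function, the stacking order (text over highlight over image), and the use of `None` for "no highlight here" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]  # (R, G, B), 0-255


@dataclass
class Page:
    """Hypothetical in-memory page with three layers (illustrative only)."""
    text_mask: List[List[bool]]              # True where a text pixel lies
    image: List[List[Color]]                 # image/background layer
    highlight: List[List[Optional[Color]]]   # None where nothing is highlighted


def compose(page: Page, show_highlight: bool = True,
            text_color: Color = (0, 0, 0)) -> List[List[Color]]:
    """Flatten the layers into one raster.

    Assumed stacking order: text on top, then highlight, then image.
    Passing show_highlight=False drops the highlight layer and leaves
    the underlying text and image untouched.
    """
    out = []
    for y, row in enumerate(page.image):
        out_row = []
        for x, background in enumerate(row):
            if page.text_mask[y][x]:
                out_row.append(text_color)
            elif show_highlight and page.highlight[y][x] is not None:
                out_row.append(page.highlight[y][x])
            else:
                out_row.append(background)
        out.append(out_row)
    return out
```

Because the highlight lives in its own layer, removing it is a matter of skipping that layer at composition time; the text mask is never modified.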

In the widely used MRC format, a scanned document is segmented into a text layer and an image layer; the text layer undergoes lossless JBIG2 compression and the image layer undergoes lossy JPEG compression. Highlighted portions are currently segmented into the image layer, so their uniformity and some light colors are lost to the lossy JPEG compression. Keywords, important values, and similar text are generally highlighted to draw the reader's attention, but the behavior above defeats the purpose of the highlight.

This invention proposes a method in which highlights are extracted, placed in a separate layer, and compressed losslessly. The highlighted portions of the document can be located, and the associated text segmented, by known methods such as US 8494280 B2 and US 5048109. The extracted highlight regions are moved to a separate layer and the color is made uniform across each color patch, such that by picking an average color value in a highlight patch and assigni...
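The per-patch averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `average_color` and `flatten_patch` are hypothetical, the patch mask is assumed to come from an existing highlight-detection step with text pixels already excluded, and a plain arithmetic mean per channel is assumed as the "average color".

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]  # (R, G, B), 0-255


def average_color(pixels: Sequence[Color]) -> Color:
    """Per-channel arithmetic mean, rounded to the nearest integer."""
    n = len(pixels)
    return tuple(round(sum(p[c] for p in pixels) / n) for c in range(3))


def flatten_patch(image: List[List[Color]],
                  patch_mask: List[List[bool]]) -> List[List[Optional[Color]]]:
    """Build a highlight layer for one patch.

    Every pixel inside the patch (mask True) receives the patch's
    average color; pixels outside the patch are None. The mask is an
    assumed input from a separate highlight-detection step.
    """
    pixels = [image[y][x]
              for y, row in enumerate(patch_mask)
              for x, inside in enumerate(row) if inside]
    if not pixels:
        # Empty patch: nothing to highlight.
        return [[None for _ in row] for row in patch_mask]
    avg = average_color(pixels)
    return [[avg if inside else None for inside in row] for row in patch_mask]
```

Since each patch then holds a single color value, the resulting layer is highly redundant, which is consistent with the abstract's point that the highlight plane compresses very well even under lossless coding.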