Altering Scanned Documents for Redaction, Masking and other Edit Operations
Disclosure Number: IPCOM000234883D
Publication Date: 2014-Feb-12
Document File: 6 page(s) / 340K

This disclosure provides an innovative method for creating a seamless altered copy of scanned textual documents, and more generally, textual documents in various raster graphics formats.


Existing redaction solutions, such as InfoSphere Guardium Data Redaction and other redaction/masking tools, can create an altered copy of a scanned document by drawing text or shapes over the original text, but the changes are plainly visible. These solutions cannot alter a document in a way that preserves its original visual properties.

    A masking system implemented with our invention will be able to alter textual documents in raster formats, replacing text while preserving the original visual properties of the document, including the visual properties of the replaced text. This new capability will fulfill requirements which the inventors have learned while working on the Redaction product, and so open new business opportunities where a redacted/masked copy must look like the original. This is particularly relevant for test data management products such as IBM Optim Data Privacy Providers.

    In the existing art, some advanced OCR systems can convert raster documents into textual formats and thus produce an editable copy in which text can be replaced.

This approach has some significant limitations:

 It introduces OCR errors.

 It is limited to a set of supported fonts.

 It doesn't preserve visual attributes of the original documents. (The process used by these systems removes background images, removes noise, de-skews, etc., thus altering the visual appearance of the document.)

The disclosed approach offers a practical solution free from the above-mentioned limitations.

    With existing editing tools such as Adobe's* Editor, text replacement in scanned documents fails to maintain the look and feel of the original. In most cases, the newly generated replacement text visually stands out from the original text. This makes it obvious to the reader that the copy underwent a replacement operation, which affects the way one interacts with and treats the information.

    In order to preserve the look and feel, the new text needs not only to match the font face, color, and size of the original text as well as the background, but also to blend in with visual nuances, e.g. the partially defective appearance typical of scanned output.
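
    The requirement above can be illustrated with a minimal sketch. It is not the disclosed method, only one simple way to make replacement pixels inherit the ink level, paper level, and scan noise of the region they replace. It assumes a grayscale raster given as a list of rows (0 = black, 255 = white); the function names and the two-cluster intensity model are hypothetical choices made for this illustration.

```python
from statistics import mean


def estimate_ink_and_paper(region):
    """Estimate mean ink and paper intensity with a 1-D two-means split.

    Assumption for this sketch: the region holds exactly two populations,
    dark text pixels and light background pixels.
    """
    pixels = [p for row in region for p in row]
    ink, paper = float(min(pixels)), float(max(pixels))
    if ink == paper:  # uniform region, nothing to separate
        return ink, paper
    for _ in range(20):
        threshold = (ink + paper) / 2
        dark = [p for p in pixels if p <= threshold]
        light = [p for p in pixels if p > threshold]
        new_ink, new_paper = mean(dark), mean(light)
        if new_ink == ink and new_paper == paper:
            break
        ink, paper = new_ink, new_paper
    return ink, paper


def paint_replacement(region, glyph_mask, ink, paper):
    """Repaint the region so that glyph_mask (1 = ink) defines the new text.

    Each output pixel keeps the deviation of the original pixel from its
    own cluster mean, so the local scan noise carries over to the new text.
    """
    threshold = (ink + paper) / 2
    out = []
    for row, mask_row in zip(region, glyph_mask):
        new_row = []
        for p, m in zip(row, mask_row):
            residual = p - (ink if p <= threshold else paper)
            target = ink if m else paper
            new_row.append(max(0, min(255, round(target + residual))))
        out.append(new_row)
    return out


# Toy scanned patch: a dark stroke on slightly noisy paper.
patch = [
    [250, 248, 40, 252],
    [251, 38, 42, 249],
    [247, 250, 44, 250],
]
# Hypothetical replacement shape: a vertical stroke in the first column.
mask = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
ink_level, paper_level = estimate_ink_and_paper(patch)
repainted = paint_replacement(patch, mask, ink_level, paper_level)
```

    A real system would additionally have to match font face and size, handle colored or textured backgrounds, and reproduce blur at stroke edges; the sketch covers only the intensity and noise aspects.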

    A system implemented according to our disclosure will be able to perform automated or semi-automated masking of scanned documents. The unique feature of this system is the ability to mask documents in raster formats such that the masking text inherits the visual attributes of the masked text.

    There is prior art in some related areas, but none that addresses the core functionality in our disclosure.

In the area of Optical Character Recognition, many patents show how to extract text from an image, for example "Method of recognizing text information from a vector/raster image," two patent applications: US 20100254606 A1 and US 20070133029 A1. Our disclosure makes use of OCR technology but adds key functionality for seamlessly replacing text with other text, pre...