
Validating and transforming structured documents with repeatable segments in a distributed environment

IP.com Disclosure Number: IPCOM000234867D
Publication Date: 2014-Feb-11
Document File: 5 page(s) / 134K

Publishing Venue

The IP.com Prior Art Database

Abstract

A system and method for validating and transforming structured documents with repeatable segments in a distributed environment is disclosed. A large document is split into multiple smaller sub-documents, each of which is validated and transformed. The sub-documents are then merged back into a single output document. While each sub-document is validated, the core metrics needed for the final certification of the document as compliant are captured if required.

This is the abbreviated version, containing approximately 52% of the total text.


Disclosed is a system and method for validating and transforming structured documents with repeatable segments in a distributed environment.

The data validation and transformation technologies available in today's market process standard and non-standard data documents with repeatable transactions sequentially, byte by byte. When validating and parsing large documents, this is a performance bottleneck. The disclosed solution processes a single document by dividing it into sub-documents. The sub-documents are processed in parallel and the results are merged into a single document, which improves performance on large documents. The same logic can be applied to any structured document that has repeatable blocks of information.

As an example, an electronic data interchange (EDI) document contains interchange headers followed by groups or message sets. These groups/message sets can be repeated. When a process starts validating an EDI document with a large number of groups/message sets, a master thread/process splits the EDI document (the master document) into sub-documents and delegates the processing of each sub-document to a separate child thread/process. Each child thread/process is given the starting offset and the length of the data it is to process.
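The delegation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the segment terminator `~` and the `GS*` / `IEA*` tags are assumptions loosely borrowed from X12 naming, the function names are hypothetical, and the "transformation" is a placeholder that upper-cases the chunk.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed toy format: segments end with "~", each repeatable group opens
# with "GS*", and the interchange trailer opens with "IEA*".
SEGMENT_END = b"~"
GROUP_START = b"GS*"
TRAILER_START = b"IEA*"


def find_group_ranges(doc: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for every repeatable group in the document."""
    marker = SEGMENT_END + GROUP_START
    starts = []
    pos = doc.find(marker)
    while pos != -1:
        starts.append(pos + 1)          # the group begins just after "~"
        pos = doc.find(marker, pos + 1)
    trailer = doc.rfind(SEGMENT_END + TRAILER_START)
    end = trailer + 1 if trailer != -1 else len(doc)
    bounds = starts + [end]
    return [(s, e - s) for s, e in zip(bounds, bounds[1:])]


def process_chunk(doc: bytes, offset: int, length: int) -> bytes:
    """Child task: validate and transform one sub-document (placeholder logic)."""
    chunk = doc[offset:offset + length]
    if not chunk.endswith(SEGMENT_END):
        raise ValueError(f"chunk at offset {offset} is not segment-aligned")
    return chunk.upper()                # stand-in for a real transformation


def process_document(doc: bytes, workers: int = 4) -> bytes:
    """Master task: split, delegate each (offset, length) to a child, merge."""
    ranges = find_group_ranges(doc)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the merged output
        # keeps the original order of the groups.
        results = pool.map(lambda r: process_chunk(doc, *r), ranges)
        return b"".join(results)
```

Each child receives only an offset and a length into the shared master document, so no sub-document has to be copied before work is handed out. In CPython, threads would mainly help when the per-chunk work releases the interpreter lock; a process pool is the usual substitute for CPU-bound validation.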

Before the split occurs, the Master thread validates the Interchange header, which holds the Trading partner information, and determines whether the document has groups or message sets. The Master thread then skips directly to an offset near the end of the document (say, last byte - 30) to validate the Interchange trailer section and obtain the message set/group count. During validation it is important to know whether the document has groups or message sets, as this information helps during the validation of each sub-document as described below.
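The header check and the jump to the trailer might look like the sketch below. It assumes a toy layout in which the header is the first `~`-terminated segment and starts with `ISA*`, a group starts with `GS*`, and the trailer starts with `IEA*` and carries the count in its first field. These tags, the function names, and the 256-byte header window are illustrative assumptions; only the 30-byte tail window comes from the text.

```python
import io
import os


def inspect_header(stream, header_window: int = 256) -> bool:
    """Validate the interchange header; return True if groups follow it."""
    stream.seek(0)
    head = stream.read(header_window)
    segments = head.split(b"~")
    if len(segments) < 2 or not segments[0].startswith(b"ISA*"):
        raise ValueError("missing interchange header")
    # Assumption: a "GS*" segment right after the header means the document
    # is organised in groups; anything else is treated as bare message sets.
    return segments[1].startswith(b"GS*")


def read_trailer_count(stream, tail_bytes: int = 30) -> int:
    """Seek straight to the tail of the document and read the trailer count."""
    size = stream.seek(0, os.SEEK_END)
    stream.seek(max(0, size - tail_bytes))   # never seek before byte 0
    tail = stream.read()
    start = tail.rfind(b"IEA*")
    if start == -1:
        raise ValueError("interchange trailer not found in the tail window")
    fields = tail[start:].rstrip(b"~\r\n").split(b"*")
    return int(fields[1])


# Usage with an in-memory document; a file opened in binary mode works the same.
sample = io.BytesIO(
    b"ISA*sender*receiver~GS*a~ST*1~SE*1~GE*1~GS*b~ST*2~SE*2~GE*1~IEA*2*0001~"
)
has_groups = inspect_header(sample)
group_count = read_trailer_count(sample)
```

Because only the first and last few bytes are read, the master learns the document's shape and the expected count without scanning the body.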

The next step is to split the document into multiple sub-documents. The split can continue until a manageable threshold size is reached. Once the document has been split into pieces of manageable size, each piece is validated and transformed.
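One way to read "split until a manageable threshold size is reached" is a recursive halving over the repeatable blocks, sketched below. The halving strategy and the function name are assumptions made for illustration; the text does not say how the split points are chosen. Blocks are given as contiguous (offset, length) pairs, so a split never cuts through a block.

```python
def split_until_manageable(blocks, threshold):
    """Recursively halve a run of contiguous (offset, length) blocks.

    Recursion stops when a run is at most `threshold` bytes long or
    consists of a single block, which cannot be divided further.
    Returns one (offset, length) pair per resulting sub-document.
    """
    if not blocks:
        return []
    total = sum(length for _, length in blocks)
    if total <= threshold or len(blocks) == 1:
        return [(blocks[0][0], total)]
    mid = len(blocks) // 2
    return (split_until_manageable(blocks[:mid], threshold)
            + split_until_manageable(blocks[mid:], threshold))


# Four 40-byte blocks with a 100-byte threshold yield two 80-byte sub-documents.
parts = split_until_manageable([(0, 40), (40, 40), (80, 40), (120, 40)], 100)
```

A single block larger than the threshold is returned as-is, since splitting it would break a repeatable unit.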

At the Validation and Transformation of sub-documents step, the following are determined:

a. Offset at which to start processing

b. Length of the sub-document

c. Whether the document contains groups/message sets

d. Chunk Index

f. Unique Identifier of the document
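The items listed so far could be carried in a small record handed to each child thread/process, as in the sketch below. The class and field names are hypothetical, and using the chunk index to restore document order at merge time is an assumption about its purpose rather than something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubDocumentTask:
    offset: int         # where processing starts in the master document
    length: int         # length of the sub-document in bytes
    has_groups: bool    # True for groups, False for bare message sets
    chunk_index: int    # position of this chunk within the master document
    document_id: str    # unique identifier of the master document

    def slice(self, doc: bytes) -> bytes:
        """Return the bytes of the master document covered by this task."""
        return doc[self.offset:self.offset + self.length]


def merge_in_order(results):
    """Merge (task, output) pairs into one document ordered by chunk index."""
    ordered = sorted(results, key=lambda pair: pair[0].chunk_index)
    return b"".join(output for _, output in ordered)


# Results may arrive in any order; the chunk index restores document order.
first = SubDocumentTask(0, 5, True, 0, "doc-1")
second = SubDocumentTask(5, 5, True, 1, "doc-1")
merged = merge_in_order([(second, b"WORLD"), (first, b"HELLO")])
```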

...