Dynamic repair of malformed XML

IP.com Disclosure Number: IPCOM000192702D
Original Publication Date: 2010-Jan-29
Included in the Prior Art Database: 2010-Jan-29

Publishing Venue

IBM

Abstract

We propose a system that automatically repairs malformed XML documents. An intelligent heuristic infers a schema from the well-formed data in the document, and this schema is then used to generate one or more fixes for the malformed section(s) of the XML file. The repaired file can then be processed further and some value gained from it. Where several different fixes would each produce a valid document, several candidate documents could be returned. The inferred schema is arrived at by analysis of the tree structure of the tags in the well-formed parts of the document. Such an algorithm could log the frequency and type of sub-tags (and attributes) and use these counts to judge the probability of a tag existing in the bad block.
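
The frequency-logging step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosed implementation: it assumes the well-formed parts of the document have already been isolated as separate fragments, and the names infer_schema and child_probability are hypothetical. It counts how often each sub-tag and attribute appears under a given parent tag, then estimates the probability that a sub-tag belongs inside a damaged instance of that parent.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET


def infer_schema(well_formed_fragments):
    """Log, per parent tag, how often each sub-tag and attribute occurs.

    Hypothetical helper: assumes every fragment is well-formed XML.
    """
    parent_counts = Counter()            # occurrences of each tag
    child_counts = defaultdict(Counter)  # parent tag -> sub-tag -> count
    attr_counts = defaultdict(Counter)   # tag -> attribute name -> count
    for fragment in well_formed_fragments:
        root = ET.fromstring(fragment)
        for element in root.iter():
            parent_counts[element.tag] += 1
            # Count each distinct sub-tag once per parent instance.
            for child_tag in {child.tag for child in element}:
                child_counts[element.tag][child_tag] += 1
            for attr_name in element.attrib:
                attr_counts[element.tag][attr_name] += 1
    return parent_counts, child_counts, attr_counts


def child_probability(schema, parent_tag, child_tag):
    """Estimate how likely child_tag is to appear inside parent_tag."""
    parent_counts, child_counts, _ = schema
    total = parent_counts[parent_tag]
    if total == 0:
        return 0.0  # parent never seen in the well-formed data
    return child_counts[parent_tag][child_tag] / total


# Example: two well-formed <book> records taken from the same document.
fragments = [
    '<book id="1"><title>A</title><author>B</author></book>',
    '<book id="2"><title>C</title></book>',
]
schema = infer_schema(fragments)
print(child_probability(schema, "book", "title"))   # 1.0
print(child_probability(schema, "book", "author"))  # 0.5
```

A repair step could then rank candidate fixes for a bad block by these probabilities, for instance preferring to close an unterminated tag as a sub-tag that is seen in every well-formed sibling. How the candidates are generated and ranked is left open here.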