
Self-Optimizing SAX Parser for XML

IP.com Disclosure Number: IPCOM000033506D
Original Publication Date: 2004-Dec-13
Included in the Prior Art Database: 2004-Dec-13
Document File: 2 page(s) / 30K

Publishing Venue

IBM

Abstract

This article describes a method for efficiently handling the element types in an XML file parsed with a SAX parser*. By adding an abstraction layer that encapsulates the comparison logic for each element type, the method dynamically reorders the element-name comparisons according to how frequently each element occurs. As a result, the SAX handler performs fewer string comparisons. *A SAX parser provides read-only, sequential access to XML data and is much more memory efficient than a DOM parser, which builds the XML data as a tree in memory so that it can be navigated.

This is the abbreviated version, containing approximately 52% of the total text.


Background

When parsing XML with a SAX parser, an application developer typically extends DefaultHandler and overrides the startElement(...) and endElement(...) methods. In these methods the developer defines the logic to run for each element being parsed. Both methods take several arguments, one of which is the element name. The developer uses an if-else chain to compare this name against the known element names, determines which element it is, and then performs the corresponding logic. Thus, both methods contain something like this:

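    if (elementName.equals(DEFINED_ELEMENT_1)) {
        // Do something
    } else if (elementName.equals(DEFINED_ELEMENT_2)) {
        // Do something else
    } else if (...) {
        // ... one branch per remaining element name ...
    } else {
        // Unexpected element
        throw new SAXException("Unexpected element: " + elementName);
    }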
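For reference, a complete handler built this way might look like the sketch below. It uses the JDK's standard javax.xml.parsers.SAXParserFactory; the element names order, item and price, the is(...) helper and the comparisons counter are hypothetical, added only to make the cost of the chain visible.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Baseline handler with a fixed if-else chain (element names are hypothetical).
class IfElseHandler extends DefaultHandler {
    static final String ORDER = "order";
    static final String ITEM = "item";
    static final String PRICE = "price";

    int items = 0;        // how many <item> elements were seen
    long comparisons = 0; // string comparisons made by the chain (for illustration)

    // Counts each comparison so the cost of the chain can be observed.
    private boolean is(String name, String expected) {
        comparisons++;
        return name.equals(expected);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        // The default (non-namespace-aware) factory reports the element name in qName.
        if (is(qName, ORDER)) {
            // Do something for <order>
        } else if (is(qName, ITEM)) {
            items++;
        } else if (is(qName, PRICE)) {
            // Do something for <price>
        } else {
            throw new SAXException("Unexpected element: " + qName);
        }
    }

    // Parses an XML string with this handler and returns it for inspection.
    static IfElseHandler parse(String xml) throws Exception {
        IfElseHandler handler = new IfElseHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }
}
```

For the document <order><item><price/></item><item><price/></item></order>, this chain makes 11 comparisons: 1 for order, 2 for each item and 3 for each price, because every element must first fail all the tests placed ahead of it.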

Problem

In an XML document with many element types, this comparison structure can become very large and lead to many string comparisons, which makes XML processing expensive: an element tested near the end of the chain is identified only after it has failed every comparison before it. One way to optimize this procedure is to test first for the elements that occur most often. However, it is often difficult for the developer to know in advance which elements will occur most often, and the distribution of elements may change over time. As a result, a hand-written if-else chain in a DefaultHandler offers no good way to optimize the comparison of element names.
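The effect of the ordering can be made concrete with a simple cost model, assumed here for illustration: matching the element at 0-based position i of the chain costs i + 1 string comparisons. With hypothetical counts of 1 order, 500 item and 500 price elements, the chain order, item, price costs 1*1 + 500*2 + 500*3 = 2501 comparisons, while testing the most frequent elements first costs 500*1 + 500*2 + 1*3 = 1503. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative cost model: the element at 0-based position i of an
// if-else chain is matched after i + 1 string comparisons.
class ChainCost {
    // Total comparisons for a given chain order and per-element occurrence counts.
    static long totalComparisons(List<String> chainOrder, Map<String, Long> counts) {
        long total = 0;
        for (int i = 0; i < chainOrder.size(); i++) {
            total += counts.getOrDefault(chainOrder.get(i), 0L) * (i + 1);
        }
        return total;
    }

    // Chain order that tests the most frequent element first.
    static List<String> byDescendingFrequency(Map<String, Long> counts) {
        List<String> order = new ArrayList<>(counts.keySet());
        order.sort((a, b) -> Long.compare(counts.get(b), counts.get(a)));
        return order;
    }
}
```

Under this model, descending-frequency order always gives the minimum total (a direct consequence of the rearrangement inequality), which is why knowing the real frequencies matters.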

Solution

This invention provides a layer of abstraction that allows the developer to encapsulate the logic belonging to each element and register it with the parser, which then dynamically changes the order of comparison. The parser tracks how frequently each element is encountered and uses that information to optimize the order in which elements are compared, so that the most frequent elements are matched after the fewest comparisons. As a result, the effort spent on comparisons can be greatly reduced. It works by providing another layer of abstraction to ...
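One possible realization is sketched below; it is not the disclosure's own implementation, since the text above breaks off before those details. The ElementLogic interface, the CountingLogic and SelfOptimizingHandler classes, and the promote-on-hit reordering rule are all assumptions: each registered entry keeps a hit count, and after every match the entry moves forward past any entry seen less often, so the list stays sorted by descending frequency.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical abstraction: the logic for one element type, registered with the handler.
interface ElementLogic {
    String elementName();
    default void start(Attributes attributes) {}
    default void end() {}
}

// Hypothetical example logic that counts how often its element starts.
class CountingLogic implements ElementLogic {
    private final String name;
    int started = 0;
    CountingLogic(String name) { this.name = name; }
    @Override public String elementName() { return name; }
    @Override public void start(Attributes attributes) { started++; }
}

// Sketch of a self-optimizing handler: registered logic is kept sorted by how
// often each element has been seen, so frequent elements are compared first.
class SelfOptimizingHandler extends DefaultHandler {
    private static final class Entry {
        final ElementLogic logic;
        long hits = 0;
        Entry(ElementLogic logic) { this.logic = logic; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final Deque<ElementLogic> open = new ArrayDeque<>(); // logic of currently open elements
    long comparisons = 0; // string comparisons performed (for illustration)

    void register(ElementLogic logic) {
        entries.add(new Entry(logic));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        for (int i = 0; i < entries.size(); i++) {
            Entry entry = entries.get(i);
            comparisons++;
            if (entry.logic.elementName().equals(qName)) {
                entry.hits++;
                promote(i);
                open.push(entry.logic);
                entry.logic.start(attributes);
                return;
            }
        }
        throw new SAXException("Unexpected element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The matching logic was found in startElement, so no comparison is needed here.
        open.pop().end();
    }

    // Swap entry i forward while it has more hits than its predecessor,
    // which keeps the list sorted by descending hit count.
    private void promote(int i) {
        while (i > 0 && entries.get(i).hits > entries.get(i - 1).hits) {
            Collections.swap(entries, i, i - 1);
            i--;
        }
    }

    List<String> currentOrder() {
        List<String> names = new ArrayList<>();
        for (Entry entry : entries) names.add(entry.logic.elementName());
        return names;
    }

    void parse(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), this);
    }
}
```

If order, item and price are registered in that sequence and the handler parses <order><item><price/></item><item><price/></item><item><price/></item></order>, the list ends up ordered item, price, order after 14 comparisons, whereas a fixed chain testing order, item, price would need 16; the gap widens as the document grows. Because the list stays sorted, each match usually costs at most a few swaps; periodically re-sorting by count would be an alternative policy.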