Lazy parser
Disclosure Number: IPCOM000013002D
Original Publication Date: 2000-Jun-01
Included in the Prior Art Database: 2003-Jun-12

Publishing Venue



In a flexible message processing system, or through the XML DOM interface, information is passed from a message to an application on an element-by-element basis. To retrieve each element, some part of the message must be parsed. In standard technology, the entire message is parsed into an internal tree structure on the first request, and the requested element is passed to the application via a programming interface. Subsequent requests can be satisfied directly from the parse tree without further parsing. However, the initial request takes much longer than necessary, since the entire message must be parsed.
We propose a 'lazy' parser that parses only as much of the message as is necessary to satisfy each request. It holds a tree for the information parsed so far, and uses this tree to satisfy requests where possible.
Consider an XML or other string-delimited structure holding elements E1, E2, ..., E10. When a request is made for E3, E3 can only be found by parsing E1 and E2. A simple string search for '<E3>' is not adequate, as it does not cater for the possibility of a 'lower level' E3 element embedded in an earlier element. The parser then has a parse tree holding E1, E2 and E3. The parse is suspended, and an indication of the location in the string where it was suspended is held. The parser is now able to return E3.
A subsequent request for E1 can be satisfied immediately from the tree. A subsequent request for E5 requires the parser to restart the parse from the suspended point. The tree is then extended to include E4 and E5, and a new suspension point is recorded. Where the element retrieved is not at the top level of the tree (e.g. E4.X3), the parse is suspended in 'mid structure'. The tree and suspension point still hold the information the parser needs to resume. Where later elements (E6, ..., E10) are never requested, they never need parsing.
This may often be the case in a message broker, where elements early in the message are used for content-based routing. The details of the remaining message elements are not of interest to the broker.
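The suspend-and-resume behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names LazyParser, Node, get and the dotted path syntax (e.g. "E4.X3") are hypothetical, and the message format is reduced to plain <name>...</name> tags with no attributes, comments or self-closing elements. The suspension point is modelled as a string offset plus the stack of elements still open at that offset.

```python
import re

# Assumption: a reduced tag syntax, <name> and </name> only.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)>")


class Node:
    """One element of the partial parse tree."""

    def __init__(self, name):
        self.name = name
        self.text = ""
        self.children = []
        self.complete = False  # True once the closing tag has been parsed

    def child(self, name):
        return next((c for c in self.children if c.name == name), None)


class LazyParser:
    """Hypothetical lazy parser: parses only as far as each request needs."""

    def __init__(self, message):
        self.message = message
        self.pos = 0               # suspension point (offset into the string)
        self.root = Node("#root")  # tree of everything parsed so far
        self.stack = [self.root]   # elements still open at the suspension point

    def _step(self):
        """Consume one tag; return False when the message is exhausted."""
        m = TAG.search(self.message, self.pos)
        if m is None:
            self.pos = len(self.message)
            return False
        current = self.stack[-1]
        current.text += self.message[self.pos:m.start()]
        closing, name = m.group(1), m.group(2)
        if closing:
            if current.name != name:
                raise ValueError(f"unexpected </{name}> at offset {m.start()}")
            current.complete = True
            self.stack.pop()
        else:
            node = Node(name)
            current.children.append(node)
            self.stack.append(node)
        self.pos = m.end()
        return True

    def _find(self, names):
        """Look the path up in the tree built so far, without parsing."""
        node = self.root
        for name in names:
            node = node.child(name)
            if node is None:
                return None
        return node

    def get(self, path):
        """Return the element at a dotted path, parsing further only if needed."""
        names = path.split(".")
        while True:
            node = self._find(names)
            if node is not None and node.complete:
                return node
            if not self._step():
                return None  # message exhausted without finding the element


message = (
    "<E1>a</E1><E2><E3>inner</E3></E2><E3>c</E3>"
    "<E4><X1>p</X1><X3>q</X3><X4>r</X4></E4><E5>e</E5><E6>f</E6>"
)
parser = LazyParser(message)
print(parser.get("E3").text, parser.pos)     # parses E1, E2 and E3, then suspends
print(parser.get("E1").text, parser.pos)     # served from the tree, no parsing
print(parser.get("E4.X3").text, [n.name for n in parser.stack])  # mid structure
print(parser.get("E5").text, message[parser.pos:])               # E6 left unparsed
```

In the sample message the first literal '<E3>' belongs to an element nested inside E2, which is why the sketch walks the structure rather than searching the string. A request for an element that does not exist forces this sketch to parse to the end of the message before it can answer.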