
Multi-Layer XPath Navigation

IP.com Disclosure Number: IPCOM000201645D
Publication Date: 2010-Nov-17
Document File: 6 page(s) / 81K

Publishing Venue

The IP.com Prior Art Database

Abstract

This disclosure extends the XPath syntax and parser to support content that could not previously be navigated. The disclosure requires additions to the XML parser but does this in such a way that no changes are required to the XPath grammar rules. Each parser handles data of a particular content type (HTML, XML, etc.), and the content type is specified by an annotation in the XML document. The disclosure allows navigation through embedded content, either through an XPath expression or through additional parser-related metadata, by offloading the parsing responsibility to a parser that understands each content type. The disclosure also allows multiple layers of parsers to be used when necessary.
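The layered navigation described in the abstract can be sketched with the standard Java XML APIs (JAXP). This is a minimal sketch under stated assumptions, not the disclosure's implementation: the `contentType` attribute used as the annotation, the `MultiLayerNavigator` class, its parser registry and the convention of one unmodified XPath expression per layer are all hypothetical, and only an XML layer parser is registered. Each layer's parser turns the selected element's text into a new tree, so the XPath grammar itself is untouched.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MultiLayerNavigator {
    // Hypothetical registry: annotation value -> parser that turns embedded text
    // into a DOM tree that plain, unmodified XPath can navigate.
    private final Map<String, Function<String, Document>> parsers;

    MultiLayerNavigator(Map<String, Function<String, Document>> parsers) {
        this.parsers = parsers;
    }

    // Layer parser for well-formed XML text (the only content type registered here).
    static Document parseXml(String text) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
        } catch (Exception e) {
            throw new IllegalArgumentException("embedded content is not well-formed XML", e);
        }
    }

    // Evaluates one XPath expression per layer. Every expression except the last must
    // select an element whose contentType attribute names a registered parser; that
    // parser re-parses the element's text to produce the tree for the next layer.
    String navigate(Document outer, List<String> layerPaths) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Document current = outer;
        for (int i = 0; i < layerPaths.size() - 1; i++) {
            Element holder = (Element) xpath.evaluate(layerPaths.get(i), current, XPathConstants.NODE);
            if (holder == null) {
                throw new IllegalArgumentException("layer " + i + " selected nothing");
            }
            String type = holder.getAttribute("contentType");
            Function<String, Document> parser = parsers.get(type);
            if (parser == null) {
                throw new IllegalArgumentException("no parser registered for '" + type + "'");
            }
            current = parser.apply(holder.getTextContent());
        }
        // The last layer is an ordinary XPath evaluation returning a string value.
        return xpath.evaluate(layerPaths.get(layerPaths.size() - 1), current);
    }

    // Two embedded layers: <payload> holds escaped XML whose <data> holds escaped XML again.
    static String runExample() throws Exception {
        String outerXml = "<root><payload contentType=\"application/xml\">"
                + "&lt;inner&gt;&lt;data contentType=\"application/xml\"&gt;"
                + "&amp;lt;leaf&amp;gt;42&amp;lt;/leaf&amp;gt;"
                + "&lt;/data&gt;&lt;/inner&gt;"
                + "</payload></root>";
        Map<String, Function<String, Document>> registry =
                Map.of("application/xml", MultiLayerNavigator::parseXml);
        return new MultiLayerNavigator(registry)
                .navigate(parseXml(outerXml), List.of("/root/payload", "/inner/data", "/leaf"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runExample()); // 42
    }
}
```

Handling the HTML case in the same loop would require registering an HTML-capable layer parser under another content-type key, which this sketch does not include.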

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 40% of the total text.

The field of the solution herein is XPath expression evaluation.

    XPath, the XML Path Language, is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. XPath was defined by the World Wide Web Consortium (W3C).
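As a small illustration of both uses, the sketch below uses the JDK's `javax.xml.xpath` API to select a node-set and to compute a number, a string and a Boolean from the same document. The sample document, its element and attribute names, and the `XPathValues` class are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathValues {
    // Hypothetical sample document; element and attribute names are illustrative only.
    static final String SAMPLE = "<parameters>"
            + "<parameter name=\"param1\" required=\"true\"><minOccurs>1</minOccurs></parameter>"
            + "<parameter name=\"param2\" required=\"false\"><minOccurs>0</minOccurs></parameter>"
            + "</parameters>";

    // Parses SAMPLE and evaluates one expression, asking JAXP for the given result type.
    static Object eval(String expression, QName returnType) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
    }

    public static void main(String[] args) throws Exception {
        // Selecting nodes: the node-set of required parameters.
        NodeList required = (NodeList) eval("//parameter[@required='true']", XPathConstants.NODESET);
        // Computing values: a number, a string and a Boolean derived from the content.
        Double total = (Double) eval("sum(//parameter/minOccurs)", XPathConstants.NUMBER);
        String first = (String) eval("//parameter[1]/@name", XPathConstants.STRING);
        Boolean anyOptional = (Boolean) eval("//parameter/minOccurs = 0", XPathConstants.BOOLEAN);
        // prints: 1 1.0 param1 true
        System.out.println(required.getLength() + " " + total + " " + first + " " + anyOptional);
    }
}
```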

    The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as an XPath. Originally motivated by a desire to provide a common syntax and behaviour model between XPointer and XSLT, subsets of the XPath query language are used in other W3C specifications such as XML Schema.
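Navigation around the tree uses axes (such as `preceding-sibling` and `ancestor`) and predicates that select nodes by position or by a computed condition. A minimal sketch against a hypothetical sample document, again using `javax.xml.xpath`; the `XPathAxes` class and the document's element names are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathAxes {
    // Hypothetical two-level tree: groups of parameters under one root.
    static final String SAMPLE = "<config>"
            + "<group id=\"g1\"><parameter name=\"param1\"/><parameter name=\"param2\"/></group>"
            + "<group id=\"g2\"><parameter name=\"param3\"/></group>"
            + "</config>";

    // Returns the XPath string value of the expression evaluated against SAMPLE.
    static String select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Sideways: the sibling that precedes param2.
        System.out.println(select("//parameter[@name='param2']/preceding-sibling::parameter/@name")); // param1
        // Upwards: the group that contains param3.
        System.out.println(select("//parameter[@name='param3']/ancestor::group/@id")); // g2
        // Positional predicates: first parameter of the last group.
        System.out.println(select("/config/group[last()]/parameter[1]/@name")); // param3
        // Predicate with a function: how many groups hold more than one parameter.
        System.out.println(select("count(//group[count(parameter) > 1])")); // 1
    }
}
```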

    It is increasingly common to use XML documents to transport many different types of data. In the example shown below, the XML fragment contains regular XML elements and also some HTML content embedded in the helpText element. Characters in the HTML that are significant to XML have to be escaped as entity references so that the XML parser does not interpret them as markup (for example, the & character is encoded as &amp; and the < character as &lt;).

param1
ref1
true
name
Default parameter value
false
true
true
1
1
xsd:string
Make sure embedded elements are preserved! &lt;p&gt;The first parameter!&lt;/p&gt;
More &lt;span&gt;stuff&lt;/span&gt; afterwards!

    Unlike XHTML, HTML is not necessarily well-formed and so cannot be parsed by an XML parser. It therefore has to be embedded as escaped text, which the XML parser treats effectively as a BLOB (Binary Large OBject). Because the structure of the data is not visible to the XML parser, XPath expressions cannot navigate into it: they can reach the helpText element but go no further. This can be a major limitation for XML-based applications.
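The limitation can be reproduced with a standard XML parser. In this sketch the fragment is a hypothetical reconstruction modeled on the helpText example (the `parameter` and `name` element names and the `EmbeddedHtmlLimitation` class are assumptions): the escaped HTML becomes character data, so no `p` element exists for XPath to select.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EmbeddedHtmlLimitation {
    // Escaped HTML inside helpText: the parser decodes &lt; to '<' but creates no elements.
    static final String SAMPLE = "<parameter><name>param1</name><helpText>"
            + "Make sure embedded elements are preserved! &lt;p&gt;The first parameter!&lt;/p&gt;"
            + " More &lt;span&gt;stuff&lt;/span&gt; afterwards!"
            + "</helpText></parameter>";

    // Returns the XPath string value of the expression evaluated against SAMPLE.
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // No element children were created under helpText...
        System.out.println(eval("count(/parameter/helpText/*)")); // 0
        // ...so a path into the embedded HTML selects nothing and yields an empty string.
        System.out.println("[" + eval("/parameter/helpText/p") + "]"); // []
        // The markup survives only as characters inside one string value.
        System.out.println(eval("contains(/parameter/helpText, '<p>The first parameter!</p>')")); // true
    }
}
```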

    Another example payload in an XML document contains Java source code (see below). As in the previous example, the code element contains text in which characters significant to XML are escaped as entities (&apos; and &quot;). The XML parser treats the Java source code as unstructured text content.



param1
ref1
true
name
Default parameter value
false
true
true
1
1
xsd:string
import org.
import org.

public class EvaluatorTests implements ExpressionProvider {

    @Test
    public void testEvaluator() throws StopException {
        try {
            PathManager pathManager = PathManager.getManager();
            PathEvaluator evaluator = pathManager.createPathEvaluator();
            evaluator.setPatterExpressionProvider(this);
            evaluator.setPathProvider(this);
            Assert.assertEquals("e;1 + 2 + 3 "));
            evaluator.evaluate("pp:getValue('foo;apo;) "));
        } catch (Exception exception) {
            throw new StopException(exception.getMessage());
        }
    }
}
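One way to give such a payload structure, in the spirit of offloading parsing to a content-specific parser, is a layer parser that maps the source text to a small DOM tree which ordinary XPath can then query. The sketch below is a toy under stated assumptions: `JavaSourceLayer`, the `source`/`import`/`class` element names and the regular expressions are hypothetical, and it recognizes only import statements and class declarations rather than parsing Java properly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class JavaSourceLayer {
    private static final Pattern IMPORT = Pattern.compile("^\\s*import\\s+([\\w.*]+)\\s*;");
    private static final Pattern CLASS = Pattern.compile("\\bclass\\s+(\\w+)");

    // Toy layer parser (NOT a real Java parser): builds <source> with one <import>
    // element per import line and one <class name="..."/> per class declaration.
    static Document toDom(String javaSource) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("source");
        doc.appendChild(root);
        for (String line : javaSource.split("\n")) {
            Matcher m = IMPORT.matcher(line);
            if (m.find()) {
                Element imp = doc.createElement("import");
                imp.setTextContent(m.group(1));
                root.appendChild(imp);
                continue;
            }
            m = CLASS.matcher(line);
            if (m.find()) {
                Element cls = doc.createElement("class");
                cls.setAttribute("name", m.group(1));
                root.appendChild(cls);
            }
        }
        return doc;
    }

    // Plain XPath evaluated against the tree produced by the layer parser.
    static String query(String javaSource, String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, toDom(javaSource));
    }

    public static void main(String[] args) throws Exception {
        String src = "import java.util.List;\nimport java.util.Map;\n\npublic class Example {\n}\n";
        System.out.println(query(src, "count(/source/import)")); // 2
        System.out.println(query(src, "/source/class/@name"));   // Example
    }
}
```

A production layer parser would use a real Java parser; the point here is only that once some parser exposes the text as a tree, unchanged XPath expressions can navigate it.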

    Another limit of the XML parser concerns character encodings and binary data. An XML parser supports many different character encodings; however, the whole document must use the same encoding (typically UTF-8), which is specified by the XML declaration at the top of the document. In some c...
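A common workaround for the single-encoding restriction is to carry data in another character encoding (or raw binary data) as Base64 text and decode it in a second layer. The sketch below assumes a hypothetical `payload` element with `charset` and `encoding` attributes and a hypothetical `EncodedPayloadLayer` class; it is an illustration, not a mechanism defined by the disclosure or by the XML specification.

```java
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodedPayloadLayer {
    // Builds a UTF-8 XML document whose payload holds text encoded in another
    // charset, carried as Base64 so that the outer document stays in one encoding.
    static String buildDocument(String text, Charset payloadCharset) {
        String base64 = Base64.getEncoder().encodeToString(text.getBytes(payloadCharset));
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<message><payload encoding=\"base64\" charset=\"" + payloadCharset.name() + "\">"
                + base64 + "</payload></message>";
    }

    // Second layer: XPath locates the payload and its hypothetical annotations,
    // then the decoded bytes are interpreted with the charset named in the annotation.
    static String readPayload(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        if (!"base64".equals(xpath.evaluate("/message/payload/@encoding", doc))) {
            throw new IllegalArgumentException("unsupported payload encoding");
        }
        String base64 = xpath.evaluate("/message/payload", doc).trim();
        Charset charset = Charset.forName(xpath.evaluate("/message/payload/@charset", doc));
        return new String(Base64.getDecoder().decode(base64), charset);
    }

    public static void main(String[] args) throws Exception {
        String xml = buildDocument("Gr\u00fc\u00dfe", StandardCharsets.ISO_8859_1);
        System.out.println(xml);
        System.out.println(readPayload(xml).equals("Gr\u00fc\u00dfe")); // true
    }
}
```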