
CX XML - Condensed Xpath XML Format

IP.com Disclosure Number: IPCOM000019701D
Original Publication Date: 2003-Sep-25
Included in the Prior Art Database: 2003-Sep-25
Document File: 5 page(s) / 29K

Publishing Venue

IBM

Abstract

A new encoding that renders raw XML documents into a condensed, concise textual expression using canonical XPath syntax. This encoding is useful for any application that only needs to retain and handle the essential naming and content values of raw XML documents.

This is the abbreviated version, containing approximately 37% of the total text.



   During work to define a new standard format for logging SOAP Web services traffic for Web analytics purposes, I discovered a real need for a concise, streamlined expression of SOAP XML messages. After some research, I was not able to find any existing encoding that satisfied all of the special requirements. I therefore developed a new technique that encodes an XML document as a condensed, canonical XPath-style expression. The new format removes XML schema metadata and preserves only the essential names and values needed for analytics processing. The CX XML format is the current recommendation for use with the proposed Web services logging standard.

Objectives

     Conserve disk space resources in high volume, low footprint scenarios by reducing raw XML textual size

     Produce a significant (geometric) reduction in the size of XML messages

     To reduce processing overhead, a streamlined, light-weight algorithm implementation should serialize the raw XML only once, by the creator, and not require deserialization back to the original raw XML

     Preserve only the bare essential content of raw XML messages that is important and relevant to Web analytics processing

     Remove as much tagging and XML schema information (i.e. datatype metadata, namespaces, encoding style, qualified names, etc.) as possible. For Web analytics, XML schema validation is not important or relevant.

     Retain XML naming (element tree structure, attribute names), structure (parent, child, siblings), and attribute and content values

     Allow human readable textual output for human log viewing (this requirement may not be absolute, however)

     Easy integration into Web services producer and analytics consumer products

Binary CB XML

Note: see Binary CB XML document for further info.

Features

     Typed clauses binary encodings

     Objects (qualified names, namespace maps, content strings, ...) are encoded inline on first occurrence and by reference after the first occurrence. However, strings can be duplicated to avoid a requirement for sequential processing. (E.g., start over for SOAP body.)

     Both the encoder and decoder / writer and reader components must build identical hash reference tables to emit or reconstruct the original XML text. This introduces the same performance issue, but twice.

     Binary CB XML encoded output is mixed: the first reference (ref-1) contains the raw XML, while later references (ref>1) contain just the numerical hash index value

     The processing overhead is compensated as the XML input doc size increases
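The inline-then-by-reference behaviour described in the features above can be illustrated with a minimal Python sketch. It is not the Binary CB XML format: the actual typed clauses and byte layout are not given in this text, so plain tuples stand in for them, and the names encode, decode, "def" and "ref" are hypothetical. The sketch only shows why the writer and the reader must each build an identical reference table, and why a reader normally has to process the stream sequentially.

```python
def encode(strings):
    """Writer side: emit a string inline the first time, by index afterwards."""
    table = {}   # string -> index, built by the writer as it goes
    tokens = []
    for s in strings:
        if s in table:
            tokens.append(("ref", table[s]))   # later occurrence: index only
        else:
            table[s] = len(table)              # first occurrence: register it
            tokens.append(("def", s))          # and emit the raw text inline
    return tokens


def decode(tokens):
    """Reader side: rebuild the same table in the same order to resolve refs."""
    table = []   # index -> string, must mirror the writer's table exactly
    strings = []
    for kind, payload in tokens:
        if kind == "def":
            table.append(payload)
            strings.append(payload)
        else:
            strings.append(table[payload])     # fails if read out of order
    return strings


# Hypothetical sequence of qualified names as they might occur in a message.
names = ["soap:Envelope", "soap:Body", "item", "item", "soap:Body"]
tokens = encode(names)
print(tokens)
print(decode(tokens) == names)
```

Because each "ref" token is meaningful only after the matching "def" has been read, a consumer cannot start in the middle of the stream unless strings are emitted inline again at that point, which is the trade-off the second feature mentions.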

Note: CB XML proved very difficult to integrate with the WAS 50x Web services engine; integration was only attempted during testing. For use with Web services logging, CB XML integration would need to occur for service engines as well as all logging consumer products.

CX XML Syntax

(/elem1[@attr='value',@attr='value']="value";/*/elem2[@attr='value',@attr='value']="value";...)
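As an illustration of the general shape of this syntax, the following Python sketch renders a small SOAP message as a parenthesised, semicolon-separated list of path clauses. It is a sketch under stated assumptions, not the CX XML algorithm itself: it always writes full absolute paths and does not use the /*/ form, it drops attributes in the XML Schema instance namespace as one example of datatype metadata, it emits a clause only for elements that carry attributes or text (or have no children), and it does not escape quote characters inside values. The names cx_encode, local_name, XSI_NS and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: xsi:* attributes (e.g. xsi:type) count as droppable datatype metadata.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def local_name(name):
    """Strip the '{namespace}' prefix ElementTree adds to qualified names."""
    return name.rsplit("}", 1)[-1]


def cx_encode(xml_text):
    """Return a condensed, XPath-like one-line rendering of an XML document."""
    clauses = []

    def walk(elem, parent_path):
        path = parent_path + "/" + local_name(elem.tag)
        attrs = [
            "@%s='%s'" % (local_name(key), value)
            for key, value in elem.attrib.items()
            if not key.startswith("{" + XSI_NS + "}")
        ]
        text = (elem.text or "").strip()
        # Pure container elements are implied by their children's paths.
        if attrs or text or len(elem) == 0:
            clause = path
            if attrs:
                clause += "[" + ",".join(attrs) + "]"
            if text:
                clause += '="%s"' % text
            clauses.append(clause)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return "(" + ";".join(clauses) + ")"


SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soap:Body>
    <getQuote xmlns="urn:example" id="q1">
      <symbol xsi:type="xsd:string">IBM</symbol>
    </getQuote>
  </soap:Body>
</soap:Envelope>"""

print(cx_encode(SAMPLE))
# (/Envelope/Body/getQuote[@id='q1'];/Envelope/Body/getQuote/symbol="IBM")
```

The namespace declarations, prefixes and the xsi:type attribute disappear from the output, while the element tree structure, the id attribute and the content value remain readable in a single log line.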

Notation: (following XPath element naming)

/elem/... is an abs...