Browse Prior Art Database

Leading index coding for recursive data structures offering performance advantages over XML and ASN.1

IP.com Disclosure Number: IPCOM000028259D
Original Publication Date: 2004-May-06
Included in the Prior Art Database: 2004-May-06
Document File: 2 page(s) / 66K

Publishing Venue

IBM

Abstract

Disclosed is a simple coding for data structures that allows fast access to items contained within it by number. The items accessed are similarly coded, allowing fast access to numbered items therein, and so on. Such nested data structures are known as "recursive data structures". Well-known codings XML and ASN.1 BER have drawbacks. A program accessing XML data must examine every byte of the structure preceding the desired item. One accessing BER, a complex coding, may need to examine many or all preceding data items in the structure preceding the desired item.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Page 1 of 2

Leading index coding for recursive data structures offering performance advantages over XML and ASN.1

XML is notoriously slow to process, and can require large amounts of memory for processing algorithms. ASN.1 encodings such as BER are generally perceived as very complicated.

    The disclosed coding consists of two parts. The first part is an index which indicates the position and type of the corresponding item. The second part is the direct concatenation of the data items - without any control information such as type codes or lengths.

    A program to access an item using this invention can access the index (perhaps by reading from a file, or by receiving it over a network link). It can then immediately locate the desired item. If that item is itself a structure, and the program requires an item within its contents, it can repeat this process by accessing its index, and so on. Ultimately, it locates the data containing the desired item, at which point it may access them directly.

    The program needs to only access a few index entries in order to locate the data. In contrast XML requires accessing every single byte in the data stream up to and including the desired item, because it uses embedded trigger characters ("<" and ">") to mark out the structure. ASN.1 BER is closer to the disclosed coding, but instead of a separate index, each item is immediately preceded by its length. This allows a program to skip irrelevant items, but it must access the length field of every item preceding the desired one. These length fields are scattered through the data structure, so many I/O operations may be required.

    The disclosed coding stores the index and the data as two strings of bytes concatenated together. The index entries are binary nu...