A data compression and formatting method for numeric data exhibiting sequential locality, like Java source code line number mapping

IP.com Disclosure Number: IPCOM000029722D
Original Publication Date: 2004-Jul-09
Included in the Prior Art Database: 2004-Jul-09
Document File: 2 page(s) / 24K

Publishing Venue

IBM

Abstract

A system is described for simply and efficiently representing a sequence of numbers in an XML file when the number sequence displays some locality. This can be used to express a mapping from instructions in a program to the corresponding line numbers in a program source file.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Disclosed is a system for representing a mapping table that maps block numbers to line numbers in a source file. First, a program is divided into methods; then, each method is divided into blocks. Each block is numbered sequentially, and each block might or might not correspond to a line number in a source file.

The design goals for this system included these:

* It should use a plain-text encoding

* It should be reasonably simple to encode and decode

* It should have reasonable compactness without disrupting its simplicity

The first requirement exists primarily because the resulting strings are transmitted in an XML data stream.

Significantly, the sequence of source line annotations displays sequential locality: from one block to the next, the difference in source lines is likely to be a small, positive number. This factor is used to achieve some data compression.
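The locality property can be illustrated with a short Python sketch that computes the difference between each block's line number and the previous one. The helper name `line_deltas` is hypothetical, and the input is the run of line numbers 51, 52, 54, 54, 55, 75 from the worked example in this disclosure; it is only meant to show that most differences are small and non-negative.

```python
def line_deltas(line_numbers):
    """Return the difference between each line number and the one before it."""
    return [b - a for a, b in zip(line_numbers, line_numbers[1:])]


# Line numbers for six consecutive blocks (illustrative input).
lines = [51, 52, 54, 54, 55, 75]
print(line_deltas(lines))  # [1, 2, 0, 1, 20]
```

Four of the five differences fit in a single decimal digit; only the jump of 20 does not.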

Here is an example of a String value using this system:

#51+1201#75+11,41

Here is how to interpret the string:
* A number sign ("#") means that the digits that follow form a complete line number, in this case fifty-one. This is the line number corresponding to the first block in the first method in the class.

* A plus sign ("+") means that what follows is a run of single digits. Each digit is the number of lines to add to the previous line number to get the line number for the next block. Here the previous line number is 51 and the first digit is "1", so the line number for the second block is 52. The remaining digits are "2", "0", and "1": add two to 52 to get 54, add zero to stay at 54, then add one to get 55. The 2nd through 5th blocks therefore correspond to lines 52, 54, 54, and 55.

* The next item in the string is another number sign, so the digits that follow form a new complete line number. This form is used when the difference from the previous line number is negative or greater than nine. In this case, "#75" means the 6th block corresponds to line 75.

* Again, the plus sign introduces a string of single digits for adding. In this case there are two of them, "1" and "1," so the 7th and 8th blocks correspond...
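The "#" and "+" rules described in the list can be sketched in Python as a small decoder and a matching encoder. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names `decode_line_map` and `encode_line_map` are hypothetical, every block is assumed to have a non-negative line number, and any character other than "#", "+", or a digit (including the comma in the example string, whose explanation is not part of this abbreviated text) is rejected rather than guessed at.

```python
DIGITS = "0123456789"


def decode_line_map(encoded: str) -> list[int]:
    """Decode a string of '#<number>' and '+<digits>' items into line numbers."""
    lines: list[int] = []
    i = 0
    while i < len(encoded):
        marker_pos = i
        marker = encoded[i]
        i += 1
        # Collect the run of digits that follows the marker.
        start = i
        while i < len(encoded) and encoded[i] in DIGITS:
            i += 1
        digits = encoded[start:i]
        if marker == "#":
            # '#': the digits form one complete line number.
            if not digits:
                raise ValueError(f"'#' without a number at index {marker_pos}")
            lines.append(int(digits))
        elif marker == "+":
            # '+': each digit is an increment over the previous line number.
            if not lines:
                raise ValueError("'+' run has no previous line number")
            for d in digits:
                lines.append(lines[-1] + int(d))
        else:
            # Assumption: markers not explained in the text are unsupported.
            raise ValueError(f"unsupported marker {marker!r} at index {marker_pos}")
    return lines


def encode_line_map(lines: list[int]) -> str:
    """Encode line numbers, using '+' digits for differences of 0 through 9."""
    parts: list[str] = []
    prev = None
    in_digit_run = False
    for line in lines:
        delta = None if prev is None else line - prev
        if delta is not None and 0 <= delta <= 9:
            if not in_digit_run:
                parts.append("+")
                in_digit_run = True
            parts.append(str(delta))
        else:
            # First block, negative difference, or difference above nine.
            parts.append(f"#{line}")
            in_digit_run = False
        prev = line
    return "".join(parts)


decoded = decode_line_map("#51+1201#75+11")
print(decoded)                   # [51, 52, 54, 54, 55, 75, 76, 77]
print(encode_line_map(decoded))  # #51+1201#75+11
```

Only the portion of the example string before the comma is decoded here. Raising an error on unknown markers is a design choice of this sketch, so that input it does not understand fails loudly instead of producing a wrong mapping.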