
Method of identifying differences between two computer language source code files

IP.com Disclosure Number: IPCOM000035619D
Original Publication Date: 2005-Jan-27
Included in the Prior Art Database: 2005-Jan-27
Document File: 3 page(s) / 123K

Publishing Venue

IBM

Abstract

Existing source code "Diff" tools (tools which highlight the differences between two source files) generally work on a row-by-row and character-by-character basis. This mechanism is adequate for identifying all changes between one file and another (for example, a previous version of the same file) when the delta between the two files is small and the changes are distinct and confined to specific areas. However, tools which re-format, or "beautify", code can alter the characters which form the code without changing its underlying logic. The problem, therefore, is that re-formatting code may cause a conventional Diff tool to display a large delta between the two files, one which includes a significant number of false positives with regard to actual functional changes (i.e. changes introduced by code re-formatting rather than by programmer modifications). This article describes a means of reducing the "noise" shown when presenting the differences between two source code files.

This is the abbreviated version, containing approximately 47% of the total text.


The challenge is to create a Diff tool which understands the language it is processing. Such a tool could offer several options for presenting the differences between two source code files.

    The mechanism would be to first "normalize" the source code, i.e. convert the source files to be compared into a "lowest common denominator" state (see below). The two normalized files would then be compared and the differences identified.
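
The normalize-then-compare step can be sketched as follows. This is a minimal illustration only, and it assumes that the "lowest common denominator" state is simply the token sequence with whitespace and layout discarded; the disclosure does not specify the normalized form in the text available here. The class name `NormalizedCompare`, its methods, and the simplified regular-expression tokenizer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: "normalization" is assumed to mean reducing the source
// to its token sequence, so that whitespace and line breaks no longer matter.
final class NormalizedCompare {
    // Simplified Java-like tokenizer: string literals, words/numbers, a few
    // two-character operators, then any other single non-space character.
    // Comments and character literals are deliberately not handled.
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\w+|[=!<>]=|&&|\\|\\||\\S");

    /** Converts source text into its normalized form: a list of tokens. */
    static List<String> normalize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** True when the two sources differ only in layout (same token sequence). */
    static boolean equivalent(String file1, String file2) {
        return normalize(file1).equals(normalize(file2));
    }
}
```

Under these assumptions, `NormalizedCompare.equivalent("if (a==b)\n    run();", "if ( a == b ) run();")` returns `true`, whereas two sources whose string literals differ (for example `"a b"` versus `"ab"`) are reported as different because a literal is kept as a single token.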

    The differences identified would then be presented to the user as highlights shown on the code originally supplied to the Diff tool. The user would therefore be unaware of the intermediate normalized format being employed "under the covers".
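
One way to show highlights on the original code, rather than on the normalized form, is to have each normalized token remember where it came from. The sketch below assumes a token-based normalization and records a character span for every token; `HighlightMapper`, `Token` and `firstDifference` are hypothetical names, and only the first differing token is located in order to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tokens carry their offsets in the original text, so a
// difference found in the normalized form can be highlighted in the original.
final class HighlightMapper {
    /** A normalized token plus its character span in the original text. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Simplified tokenizer: string literals, words/numbers, single symbols.
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\w+|\\S");

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    /**
     * Returns {start, end} offsets, within the modified text as supplied, of
     * the first token that differs from the original, or null when the two
     * token sequences are identical.
     */
    static int[] firstDifference(String original, String modified) {
        List<Token> a = tokenize(original);
        List<Token> b = tokenize(modified);
        int i = 0;
        while (i < a.size() && i < b.size() && a.get(i).text.equals(b.get(i).text)) {
            i++;
        }
        if (i == a.size() && i == b.size()) {
            return null; // no difference after normalization
        }
        if (i < b.size()) {
            return new int[] {b.get(i).start, b.get(i).end};
        }
        // The modified text ran out of tokens first: point at its end.
        return new int[] {modified.length(), modified.length()};
    }
}
```

For example, comparing `"int  x = 1;"` with `"int x =\n 2;"` yields a span covering only the `2` in the second text, even though the spacing and line breaks of the two inputs also differ.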

    The types of differences which would be presented would include the "standard" options which do not require deep probing of the code:

Identification of inserted, deleted & substituted characters within a record:

Inserted
    File 1: System.out.println("The quick brown fox!");
    File 2: System.out.println("The quick brown fox jumped over the lazy dog!");

Deleted
    File 1: System.out.println("The quick brown fox jumped over the lazy dog!");
    File 2: System.out.println("The quick brown fox!");

Modified
    File 1: System.out.println("The quick brown fox jumped over the lazy dog!");
    File 2: System.out.println("The quick brown fox JUMPED over the lazy dog!");
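
The three character-level cases above can be distinguished with a simple common-prefix and common-suffix comparison of the two records. This is only a sketch of one possible approach, not the mechanism described in the disclosure: it finds a single changed region per record, and the class name `RecordCharDiff` and its output format are hypothetical.

```java
// Hypothetical sketch: classify the change between two versions of one record
// by trimming the characters they share at the start and at the end.
final class RecordCharDiff {
    static String classify(String file1, String file2) {
        int max = Math.min(file1.length(), file2.length());

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < max && file1.charAt(prefix) == file2.charAt(prefix)) {
            prefix++;
        }

        // Length of the common suffix, not overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && file1.charAt(file1.length() - 1 - suffix)
                   == file2.charAt(file2.length() - 1 - suffix)) {
            suffix++;
        }

        // Whatever remains in the middle is the changed region.
        String removed = file1.substring(prefix, file1.length() - suffix);
        String added = file2.substring(prefix, file2.length() - suffix);

        if (removed.isEmpty() && added.isEmpty()) {
            return "Unchanged";
        }
        if (removed.isEmpty()) {
            return "Inserted: \"" + added + "\"";
        }
        if (added.isEmpty()) {
            return "Deleted: \"" + removed + "\"";
        }
        return "Modified: \"" + removed + "\" -> \"" + added + "\"";
    }
}
```

Applied to the examples above, the first pair is classified as `Inserted: " jumped over the lazy dog"`, the second as `Deleted: " jumped over the lazy dog"`, and the third as `Modified: "jumped" -> "JUMPED"`. Records containing several separate changes would need a full edit-script algorithm instead.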

Identification of inserted, deleted & substituted records:

Inserted
    File 1:
        System.out.print("The quick brown fox");
        (label: "Inserted line")
        System.out.println(" jumped over the lazy dog!");
    File 2:
        System.out.print("The quick brown fox");
        int i = 0;
        System.out.println(" jumped over the lazy dog!");

Deleted
    File 1:
        System.out.print("The quick brown fox");
        int i = 0;
        System.out.println(" jumped over the lazy dog!");
    File 2:
        System.out.print("The quick brown fox");
        (label: "Inserted line")
        System.out.println(" jumped over the lazy dog!");

Modified (effectively equivalent to modified characters, but indicating that almost certainly the whole record has changed)
    File 1:
        System.out.print("The quick brown fox");
        int i = 0;
        System.out.println(" jumped over the lazy dog!");
    File 2:
        System.out.print("The quick brown fox");
        String s = "Inserted";
        System.out.println(" jumped over the lazy dog!");
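
Record-level insertions and deletions of this kind are conventionally found with a longest-common-subsequence (LCS) comparison of the two lists of records. The sketch below shows that standard technique; it is not taken from the disclosure, and the class name `RecordDiff` and the `+`/`-` prefixes in its output are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a textbook LCS-based diff over whole records (lines).
final class RecordDiff {
    /**
     * Returns one entry per record: "  x" for unchanged, "- x" for a record
     * only in file 1 (deleted), "+ x" for a record only in file 2 (inserted).
     */
    static List<String> diff(List<String> file1, List<String> file2) {
        int n = file1.size();
        int m = file2.size();

        // lcs[i][j] = length of the LCS of file1[i..] and file2[j..].
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = file1.get(i).equals(file2.get(j))
                    ? lcs[i + 1][j + 1] + 1
                    : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting the edit script.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (file1.get(i).equals(file2.get(j))) {
                out.add("  " + file1.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + file1.get(i));
                i++;
            } else {
                out.add("+ " + file2.get(j));
                j++;
            }
        }
        while (i < n) {
            out.add("- " + file1.get(i++));
        }
        while (j < m) {
            out.add("+ " + file2.get(j++));
        }
        return out;
    }
}
```

With the "Modified" example above, the result is an unchanged first record, then `- int i = 0;` immediately followed by `+ String s = "Inserted";`, then an unchanged last record. A presentation layer could treat such an adjacent deleted/inserted pair as a single modified record.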

And "enhanced" options, i.e. options which do require the code to be normalized:


"if" code processing

Original:

    ...
    if (a==b)
        executeProcedure();
    ...

Modified:

    ...
    if (a==b) {
        executeProcedure();
    }
    ...

Original + Highlights:

    ...
    if (a==b)
        executeProcedure();
    ...

    No highlights, as the code is equivalent despite the presence of "{}" in the modified code.

Modified + Highlights:

    ...
    if (a==b) {
        executeProcedure();
    }
    ...

    A highlight is present to indicate the difference between the modified and original code. The highlight is yellow & underscored to show that it does not affect processing.
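
The "if" case above can be sketched by normalizing every brace-less `if` body into a braced one before comparing. The code below is an assumption-laden illustration, not the disclosed implementation: the names `IfBraceNormalizer`, `normalize` and `classify` are hypothetical, the tokenizer is simplified, and a brace-less body is assumed to end at the first `;` (so nested brace-less statements, `else` branches and loops are not handled).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: treat "if (c) stmt;" and "if (c) { stmt; }" as equal.
final class IfBraceNormalizer {
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\w+|[=!<>]=|&&|\\|\\||\\S");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Wraps the single-statement body of each brace-less "if" in braces. */
    static List<String> normalize(String source) {
        List<String> in = tokenize(source);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            String token = in.get(i++);
            out.add(token);
            if (!token.equals("if")) {
                continue;
            }
            // Copy the parenthesized condition, tracking nesting depth.
            int depth = 0;
            while (i < in.size()) {
                String c = in.get(i++);
                out.add(c);
                if (c.equals("(")) {
                    depth++;
                } else if (c.equals(")") && --depth == 0) {
                    break;
                }
            }
            // Brace-less body: assumed to run up to the first ";".
            if (i < in.size() && !in.get(i).equals("{")) {
                out.add("{");
                while (i < in.size()) {
                    String s = in.get(i++);
                    out.add(s);
                    if (s.equals(";")) {
                        break;
                    }
                }
                out.add("}");
            }
        }
        return out;
    }

    /**
     * "same tokens": layout-only change; "cosmetic": tokens differ but the
     * normalized forms match (non-functional highlight); otherwise "functional".
     */
    static String classify(String original, String modified) {
        if (tokenize(original).equals(tokenize(modified))) {
            return "same tokens";
        }
        if (normalize(original).equals(normalize(modified))) {
            return "cosmetic";
        }
        return "functional";
    }
}
```

Under these assumptions, comparing the original and modified fragments above yields `"cosmetic"`, which corresponds to the yellow, underscored highlight that marks a change not affecting processing, while changing the condition to `a!=b` yields `"functional"`.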

[Highlighted rendering of the original and modified fragments; the characters were doubled in extraction and the text is truncated:]

...
if (a==b)

...
if (a==b) {
exe...