Method for checking differences of text files

IP.com Disclosure Number: IPCOM000129122D
Original Publication Date: 2005-Sep-28
Included in the Prior Art Database: 2005-Sep-28
Document File: 2 page(s) / 27K

Publishing Venue

IBM

Abstract

Disclosed is a software design for checking the differences between two text files. The differences are generated not per line but per letter. An overly strict check of differences, though logically correct, may in some cases contradict our intuitive understanding. Hence this new design repeats the check simulation several times in order to search for a result that is as natural to our intuitive view as possible.

At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 47% of the total text.

Basic Conception

 The basis of checking the difference between two text data is to compare the letters of the old and new data one by one. If the compared letters are the same, the next letters are compared. If successive comparisons show no difference through to the end, the two text data are judged to be the same; if not, there are some differences.
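
 The letter-by-letter comparison described above can be sketched in Python as follows. This is only an illustrative sketch, not the disclosed implementation; the function name first_difference and the sample strings are assumptions introduced here.

```python
from typing import Optional


def first_difference(old: str, new: str) -> Optional[int]:
    """Return the index of the first differing letter, or None if the texts are the same."""
    # Compare the letters of the old and new data one by one.
    for index, (old_letter, new_letter) in enumerate(zip(old, new)):
        if old_letter != new_letter:
            return index
    # All shared positions match; if one text is longer, the difference
    # starts where the shorter text ends.
    if len(old) != len(new):
        return min(len(old), len(new))
    return None


# Hypothetical sample data, not taken from the figures of the disclosure.
same = first_difference("ABCDE", "ABCDE")
mismatch = first_difference("ABCDE", "AXDE")
```

 A result of None means the two texts are judged to be the same; any other result marks the position from which the search for the next match has to start.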

 When a difference is detected, the length of the differing part must be determined; in other words, it is necessary to search for the next match between the old and new text data.

 It is also necessary to ascertain whether the difference is caused by the deletion of some letters or by their insertion. The data pattern of fig.1 shows both deletion and insertion of letters.

Fig.1

 In order to ascertain the differing parts, a scanner in the new design scans both the old and new text strings with a double loop. For example, in fig.1, the scanner stopped at the letter 'X' in the new data cannot find the next match if it proceeds from the letter 'B' in the old data. Thus, it must proceed to the next letter 'D' in the new data and scan the old data again from 'B'; it then finds 'D' in the old data, which is the next match. The scanner thus knows that 'BC' in the old data was deleted and 'X' in the new data was inserted. In this way the check of differences can be made precise.
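
 The double-loop scan can be sketched as follows. Since fig.1 is not reproduced in this text, the strings "ABCDE" and "AXDE" are hypothetical data chosen only to agree with the description ('BC' deleted, 'X' inserted, next match at 'D'); the function name find_next_match is likewise an assumption of this sketch.

```python
from typing import Optional, Tuple


def find_next_match(old: str, new: str, i: int, j: int) -> Optional[Tuple[int, int]]:
    """Given a mismatch at old[i] / new[j], return (p, q) with old[p] == new[q], or None."""
    # Outer loop: advance letter by letter through the new data.
    for q in range(j, len(new)):
        # Inner loop: rescan the old data from the mismatch position each time.
        for p in range(i, len(old)):
            if old[p] == new[q]:
                return p, q
    return None


# Hypothetical data consistent with the description of fig.1.
old, new = "ABCDE", "AXDE"
i = j = 1                      # first mismatch: 'B' in the old data, 'X' in the new data
p, q = find_next_match(old, new, i, j)
deleted = old[i:p]             # letters of the old data skipped before the match
inserted = new[j:q]            # letters of the new data skipped before the match
```

 With this data the scan for 'X' fails in the old data, the scan for 'D' succeeds, and the skipped letters on each side give the deleted and inserted parts.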

Prevention of Accidental Match

 For an effective check, the scanner must prevent accidental matches during the search for the next match. In the data pattern of fig.2, if the scanner uses a double loop that starts from the mismatched letters 'B' in the old data and 'X' in the new data, and scans the old data with the inner loop, it finds 'X' in the old data as the next match, so that it judges 'BCDE' in the old data as deleted and 'DEXF' in the new data as inserted. Although this judgment is logically correct, it contradicts our intuitive understanding, because we see only 'BC' in the old data as deleted and the first 'X' in the new data as inserted.

Fig.2

The difficulty can be solved by setting a new criterion for judging the next match: a match is judged to be found only when more than two letters coincide. According to this new criterion, even though the penultimate 'X' in the old data matches the first 'X' in the new data, the next match in this case is judged to be 'DE', because the letter following the first 'X' does not match.
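
 The effect of such a criterion can be sketched as follows. The names run_length, find_next_match_with_run and min_run are assumptions of this sketch, and min_run=3 reflects one reading of "more than two letters"; the strings are hypothetical data chosen to resemble the description of fig.2, which is not reproduced here.

```python
from typing import Optional, Tuple


def run_length(old: str, new: str, p: int, q: int) -> int:
    """Count how many consecutive letters coincide starting at old[p] and new[q]."""
    n = 0
    while p + n < len(old) and q + n < len(new) and old[p + n] == new[q + n]:
        n += 1
    return n


def find_next_match_with_run(old: str, new: str, i: int, j: int,
                             min_run: int = 3) -> Optional[Tuple[int, int]]:
    """Double-loop search that accepts a match only if at least min_run letters coincide."""
    for q in range(j, len(new)):          # outer loop over the new data
        for p in range(i, len(old)):      # inner loop over the old data
            if run_length(old, new, p, q) >= min_run:
                return p, q
    return None


# Hypothetical data resembling the description of fig.2.
old, new = "ABCDEXF", "AXDEXF"
naive = find_next_match_with_run(old, new, 1, 1, min_run=1)   # single-letter criterion
strict = find_next_match_with_run(old, new, 1, 1)             # run of three or more
deleted, inserted = old[1:strict[0]], new[1:strict[1]]
```

 With the single-letter criterion the lone 'X' is accepted as the next match, whereas the stricter criterion skips it and settles on the run beginning at 'DE'. A matching tail shorter than min_run at the very end of the data would be missed by this sketch and would need separate handling.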

 In the case of the data pattern of fig.3, the scanner, using the inner loop of the double loop to scan the old data, finds the first mismatch at 'This is a d' in the old data and 'This is a n' in the new data. If the criterion length of...