How to Perform String Diffs in Java

How to perform string diffs in Java?

This library seems to do the trick: google-diff-match-patch. It can create a patch string from the differences between two texts and lets you reapply that patch later.

Edit: Another option might be java-diff-utils: https://code.google.com/p/java-diff-utils/

Java simple String diff util

Apache Commons Lang has a class called StringUtils that offers both difference and indexOfDifference, which should cover your needs.

http://commons.apache.org/lang/

Check it out
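If adding a dependency is not an option, the idea behind those two methods can be approximated in plain Java. The sketch below is not the Commons Lang source; the class and method names (StringDiffSketch, firstDifferenceIndex, differingTail) are made up for illustration, and it assumes non-null arguments, so check the StringUtils Javadoc for the library's exact null handling.

```java
final class StringDiffSketch {

    // Index of the first position where a and b differ, or -1 if they are equal.
    // Assumption: both arguments are non-null.
    static int firstDifferenceIndex(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // No mismatch in the shared prefix: either equal, or one is a prefix of the other.
        return a.length() == b.length() ? -1 : limit;
    }

    // The part of b that starts at the first difference, or "" if the strings are equal.
    static String differingTail(String a, String b) {
        int index = firstDifferenceIndex(a, b);
        return index == -1 ? "" : b.substring(index);
    }

    public static void main(String[] args) {
        System.out.println(firstDifferenceIndex("i am a machine", "i am a robot")); // 7
        System.out.println(differingTail("i am a machine", "i am a robot"));        // robot
    }
}
```

Note that this only reports where the strings start to diverge. It is not a full diff, so anything after the first mismatch is returned as-is.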

Java: String compare library that returns diff count as an int?

I think what you want is the Levenshtein distance - this tells you how many changes (insertions, deletions or replacements) are required to transform one string into another.

For example, the difference between abcde and abcdef is 1, because you insert f after the last position in abcde to get abcdef.

The difference between abcde and abcdf is also 1, since you replace e in the first string with f to get the second.

The difference between abcde and abde is 1 because you delete c in the first string to get the second.

A very good implementation can be found in Apache Commons Text: LevenshteinDistance.

Here are some sample implementations in Java.
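One way to compute it without a library is the classic dynamic-programming approach, keeping only two rows of the table. This is a minimal sketch, not the Commons Text implementation; the class name LevenshteinSketch is made up for illustration, and it assumes non-null input and compares char by char (so supplementary Unicode characters count as two positions).

```java
final class LevenshteinSketch {

    // Minimum number of single-character insertions, deletions or
    // replacements needed to turn source into target.
    static int distance(String source, String target) {
        // previous[j]: distance between the first i-1 chars of source
        // and the first j chars of target (the row above the current one).
        int[] previous = new int[target.length() + 1];
        int[] current = new int[target.length() + 1];

        for (int j = 0; j <= target.length(); j++) {
            previous[j] = j; // empty source prefix: insert j characters
        }

        for (int i = 1; i <= source.length(); i++) {
            current[0] = i; // empty target prefix: delete i characters
            for (int j = 1; j <= target.length(); j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1,  // insertion
                                 previous[j] + 1),    // deletion
                        previous[j - 1] + cost);      // replacement or match
            }
            // The row just filled becomes the "previous" row for the next i.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[target.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("abcde", "abcdef")); // 1 (insert f)
        System.out.println(distance("abcde", "abcdf"));  // 1 (replace e with f)
        System.out.println(distance("abcde", "abde"));   // 1 (delete c)
    }
}
```

With Apache Commons Text on the classpath, the equivalent call should be LevenshteinDistance.getDefaultInstance().apply(a, b); the hand-written version is mainly useful for seeing how the number comes about.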

How do I compare strings in Java?

== tests for reference equality (whether they are the same object).

.equals() tests for value equality (whether they contain the same data).

Objects.equals() checks for null before calling .equals() so you don't have to (available as of JDK7; Guava offers the equivalent Objects.equal()).

Consequently, if you want to test whether two strings have the same value you will probably want to use Objects.equals().

// These two have the same value
new String("test").equals("test") // --> true

// ... but they are not the same object
new String("test") == "test" // --> false

// ... neither are these
new String("test") == new String("test") // --> false

// ... but these are because literals are interned by
// the compiler and thus refer to the same object
"test" == "test" // --> true

// ... string literals are concatenated by the compiler
// and the results are interned.
"test" == "te" + "st" // --> true

// ... but you should really just call Objects.equals()
Objects.equals("test", new String("test")) // --> true
Objects.equals(null, "test") // --> false
Objects.equals(null, null) // --> true

You almost always want to use Objects.equals(). In the rare situation where you know you're dealing with interned strings, you can use ==.

From JLS 3.10.5. String Literals:

Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

Similar examples can also be found in JLS 3.10.5-1.

Other Methods To Consider

String.equalsIgnoreCase() tests value equality while ignoring case. Beware, however, that this method can have unexpected results in various locale-related cases, see this question.

String.contentEquals() compares the content of the String with the content of any CharSequence (available since Java 1.5). Saves you from having to turn your StringBuffer, etc into a String before doing the equality comparison, but leaves the null checking to you.
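A short sketch of both methods follows. The class name OtherComparisonsSketch and the helper sameContent are made up for illustration; the helper adds the null check that contentEquals() leaves to the caller. The German sharp s is just one example of a case-mapping surprise and is written as a Unicode escape to avoid source-encoding problems.

```java
import java.util.Locale;

final class OtherComparisonsSketch {

    // Null-safe wrapper: contentEquals() itself throws a
    // NullPointerException when its argument is null.
    static boolean sameContent(String text, CharSequence other) {
        return text != null && other != null && text.contentEquals(other);
    }

    public static void main(String[] args) {
        // Case-insensitive value equality.
        System.out.println("Hello".equalsIgnoreCase("HELLO")); // true

        // equalsIgnoreCase compares character by character, so strings of
        // different length never match, even though upper-casing the
        // sharp s produces "SS".
        String strasse = "stra\u00dfe";
        System.out.println(strasse.equalsIgnoreCase("STRASSE"));               // false
        System.out.println(strasse.toUpperCase(Locale.ROOT).equals("STRASSE")); // true

        // Compare against a StringBuilder without calling toString() first.
        StringBuilder builder = new StringBuilder("test");
        System.out.println("test".contentEquals(builder)); // true
        System.out.println(sameContent("test", null));     // false
    }
}
```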

Java library for free-text diff

This one might be good: Diff Match Patch.

Java comparing two text files and writing diffs

How to track differences depends on the requirements, e.g. whether inserted or deleted lines should be detected as well. Since you are a beginner the requirements probably are quite simple, i.e. if we have files like ABC and ACD (each character representing a line) the differences probably should be line 2: B<>C and line 3: C<>D.

To gather those differences you could do the following (I'll omit some stuff to keep code simple, e.g. proper getters and setters):

Create a class to hold the differences:

class Difference {
    private int lineNumber;
    private String file1Line;
    private String file2Line;

    //constructor, getters, setters
}

Then keep a List<Difference> and fill it like this:

String line1;
String line2;
int lineNumber = 0;

List<Difference> differences = new LinkedList<>();

do {
    line1 = reader1.readLine();
    line2 = reader2.readLine();
    lineNumber++; //first line will have number 1

    //we've hit the end of file1
    if( line1 == null ) {
        //if we've not hit the end of file2 yet, we have a difference
        if( line2 != null ) {
            differences.add(new Difference(lineNumber, line1, line2));
        }
    }
    //if we didn't hit the end of file1 yet we just compare, equals() will return false if:
    // - file2 contains a different line
    // - we've hit the end of file2 in which case line2 is null
    else if( !line1.equals(line2) ) {
        differences.add(new Difference(lineNumber, line1, line2));
    }

//once we've hit the end of either file we'll stop this loop
} while( line1 != null && line2 != null );

//read the remaining lines of both files
//since we hit the end of at least one file in the previous loop, at most one of the following loops should add anything
//the last line read above has already been recorded, so each loop reads the next line first and stops at the end of the file
//if we already hit the end of file1, readLine() returns null again and we won't enter this loop
while( (line1 = reader1.readLine()) != null ) {
    lineNumber++;
    differences.add(new Difference(lineNumber, line1, null));
}

//if we already hit the end of file2, readLine() returns null again and we won't enter this loop
while( (line2 = reader2.readLine()) != null ) {
    lineNumber++;
    differences.add(new Difference(lineNumber, null, line2));
}

Now you should have a list of differences that you'd just need to write to a separate file. If that list is empty you didn't find any differences which means the files are equal (at least the lines are, the files might still have different line separators etc. but I guess that's no concern for you atm).

Note that you'd have to handle exceptions, closing the files and readers (see Matt's answer) etc. as well.

Please also note that you could do everything with less code (e.g. only one loop) but since you're a beginner it's often best to write some more code to make the process easier to understand.
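For comparison, here is roughly what a single-loop version could look like. It is only a sketch under a few assumptions: the names SingleLoopDiffSketch, diffLines and diffText are made up, differences are collected as formatted strings instead of Difference objects, and the demo reads from in-memory strings rather than real files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class SingleLoopDiffSketch {

    // Reads both inputs in lockstep and records every line number
    // at which the two lines are not equal.
    static List<String> diffLines(BufferedReader reader1, BufferedReader reader2) throws IOException {
        List<String> differences = new ArrayList<>();
        int lineNumber = 0;
        while (true) {
            String line1 = reader1.readLine();
            String line2 = reader2.readLine();
            if (line1 == null && line2 == null) {
                break; // both inputs are exhausted
            }
            lineNumber++;
            // Objects.equals handles the case where only one side is null.
            if (!Objects.equals(line1, line2)) {
                differences.add("line " + lineNumber + ": " + line1 + "<>" + line2);
            }
        }
        return differences;
    }

    // Convenience wrapper for in-memory text; try-with-resources closes the readers.
    static List<String> diffText(String text1, String text2) {
        try (BufferedReader reader1 = new BufferedReader(new StringReader(text1));
             BufferedReader reader2 = new BufferedReader(new StringReader(text2))) {
            return diffLines(reader1, reader2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints "line 2: B<>C" and "line 3: C<>D"
        for (String difference : diffText("A\nB\nC", "A\nC\nD")) {
            System.out.println(difference);
        }
    }
}
```

For real files you would pass readers obtained from something like Files.newBufferedReader(path), and the resulting list could be written out with Files.write(Paths.get("diff.txt"), differences), where diff.txt is just a placeholder name.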


