How to perform string Diffs in Java?
This library seems to do the trick: google-diff-match-patch. It can create a patch string from the differences and lets you reapply the patch.
edit: Another solution might be java-diff-utils: https://code.google.com/p/java-diff-utils/
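To make the idea concrete without pulling in either library, here is a minimal character-level diff built on a longest-common-subsequence (LCS) table. This is a swapped-in textbook technique, not what google-diff-match-patch or java-diff-utils do internally (as far as I know both are based on Myers' algorithm and return richer diff/patch objects). The class name SimpleDiff and the [-x]/[+x] output format are made up for this sketch.

```java
// Minimal LCS-based character diff, for illustration only (hypothetical class).
// Uses O(n * m) memory, so it is only suitable for short strings.
class SimpleDiff {

    // Returns a compact diff of a -> b: deleted characters are shown as [-x],
    // inserted characters as [+x], unchanged characters are copied as-is.
    static String diff(String a, String b) {
        int n = a.length(), m = b.length();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.charAt(i) == b.charAt(j)
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start, emitting one operation per step
        StringBuilder out = new StringBuilder();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.charAt(i) == b.charAt(j)) {
                out.append(a.charAt(i)); // unchanged character
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.append("[-").append(a.charAt(i)).append(']'); // only in a
                i++;
            } else {
                out.append("[+").append(b.charAt(j)).append(']'); // only in b
                j++;
            }
        }
        // Whatever is left over exists in only one of the strings
        while (i < n) out.append("[-").append(a.charAt(i++)).append(']');
        while (j < m) out.append("[+").append(b.charAt(j++)).append(']');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(diff("abcde", "abxde")); // ab[-c][+x]de
    }
}
```

The real libraries add what this sketch lacks: efficient algorithms for long inputs, cleanup of noisy diffs, and a patch format that can be serialized and applied later.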
Java simple String diff util
Apache Commons Lang has a class called StringUtils with both difference and indexOfDifference methods, which should fit your needs.
http://commons.apache.org/lang/
Check it out
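If you only need those two results and would rather not add the dependency, the core logic is small. The following is a sketch for non-null input only: the class DiffHelpers is hypothetical, and the real StringUtils methods (org.apache.commons.lang3.StringUtils in Lang 3) additionally handle null arguments.

```java
// Plain-Java sketch of what StringUtils.indexOfDifference / difference compute,
// for non-null input only. DiffHelpers is a hypothetical name, not a library class.
class DiffHelpers {

    // Index of the first character at which a and b differ, or -1 if they are equal
    static int indexOfDifference(CharSequence a, CharSequence b) {
        int i = 0;
        int limit = Math.min(a.length(), b.length());
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        // All characters matched up to the shorter length: they differ only if the lengths differ
        return (i < a.length() || i < b.length()) ? i : -1;
    }

    // The remainder of b, starting where it begins to differ from a ("" if equal)
    static String difference(String a, String b) {
        int at = indexOfDifference(a, b);
        return at == -1 ? "" : b.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(indexOfDifference("abcde", "abxyz")); // 2
        System.out.println(difference("abcde", "abxyz"));        // xyz
    }
}
```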
Java: String compare library that returns diff count as an int?
I think what you want is the Levenshtein distance - this tells you how many changes (insertions, deletions or replacements) are required to transform one string into another.
For example, the difference between abcde and abcdef is 1, because you insert f after the last position in abcde to get abcdef.
The difference between abcde and abcdf is also 1, since you replace e in the first string with f to get the second.
The difference between abcde and abde is 1 because you delete c in the first string to get the second.
A very good implementation can be found in Apache Commons Text: LevenshteinDistance.
Here are some sample implementations in Java.
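A minimal sketch of the standard dynamic-programming algorithm, assuming unit cost for every insertion, deletion and replacement. The class name Levenshtein is made up for this example; with Commons Text on the classpath, LevenshteinDistance.getDefaultInstance().apply(a, b) should return the same values.

```java
// Classic dynamic-programming edit distance (hypothetical class name).
// Only two rows of the (a.length()+1) x (b.length()+1) table are kept in memory.
class Levenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // deletion
                                 curr[j - 1] + 1), // insertion
                        prev[j - 1] + cost);       // replacement (or match)
            }
            // The row just computed becomes the previous row for the next i
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("abcde", "abcdef")); // 1 (insert f)
        System.out.println(distance("abcde", "abcdf"));  // 1 (replace e with f)
        System.out.println(distance("abcde", "abde"));   // 1 (delete c)
    }
}
```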
How do I compare strings in Java?
== tests for reference equality (whether they are the same object).
.equals() tests for value equality (whether they contain the same data).
Objects.equals() checks for null before calling .equals() so you don't have to (available as of JDK7, also available in Guava).
Consequently, if you want to test whether two strings have the same value you will probably want to use Objects.equals().
// These two have the same value
new String("test").equals("test") // --> true
// ... but they are not the same object
new String("test") == "test" // --> false
// ... neither are these
new String("test") == new String("test") // --> false
// ... but these are because literals are interned by
// the compiler and thus refer to the same object
"test" == "test" // --> true
// ... string literals are concatenated by the compiler
// and the results are interned.
"test" == "te" + "st" // --> true
// ... but you should really just call Objects.equals()
Objects.equals("test", new String("test")) // --> true
Objects.equals(null, "test") // --> false
Objects.equals(null, null) // --> true
You almost always want to use Objects.equals(). In the rare situation where you know you're dealing with interned strings, you can use ==.
From JLS 3.10.5. String Literals:
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
Similar examples can also be found in JLS 3.10.5-1.
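The interning described in the JLS quote can be observed directly. A small sketch (the class name InternDemo is just for illustration); it relies on String.intern() returning the pooled instance and on the compile-time constant rule, which is also why == only works reliably for constants.

```java
// Demonstrates string interning (hypothetical demo class).
class InternDemo {
    public static void main(String[] args) {
        String literal = "test";
        String created = new String("test"); // a new, distinct String object

        System.out.println(created == literal);          // false: different objects
        System.out.println(created.intern() == literal); // true: intern() returns the pooled instance
        System.out.println(created.equals(literal));     // true: same characters

        // A final local initialized with a constant is a constant variable,
        // so te + "st" is a constant expression and is interned like a literal...
        final String te = "te";
        System.out.println((te + "st") == literal);      // true
        // ...but concatenating a non-final variable happens at run time
        String st = "st";
        System.out.println(("te" + st) == literal);      // false
    }
}
```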
Other Methods To Consider
String.equalsIgnoreCase(): value equality that ignores case. Beware, however, that this method can have unexpected results in various locale-related cases; see this question.
String.contentEquals(): compares the content of the String with the content of any CharSequence (available since Java 1.5). This saves you from having to turn your StringBuffer, etc. into a String before doing the equality comparison, but leaves the null checking to you.
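A short sketch of both methods. The contentEqualsNullSafe helper is hypothetical; it only shows one way to do the null checking that contentEquals leaves to you.

```java
// Examples of equalsIgnoreCase and contentEquals (hypothetical demo class).
class OtherComparisons {

    // Hypothetical helper: String.contentEquals does not handle null for you
    static boolean contentEqualsNullSafe(String s, CharSequence cs) {
        if (s == null || cs == null) {
            return s == null && cs == null; // equal only if both are null
        }
        return s.contentEquals(cs);
    }

    public static void main(String[] args) {
        System.out.println("Hello".equalsIgnoreCase("hELLO")); // true

        StringBuilder sb = new StringBuilder("abc");
        System.out.println("abc".contentEquals(sb)); // true, no sb.toString() needed
        System.out.println("abc".equals(sb));        // false: a StringBuilder is not a String

        System.out.println(contentEqualsNullSafe(null, null));  // true
        System.out.println(contentEqualsNullSafe("abc", null)); // false
    }
}
```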
Java library for free-text diff
This one might be good: Diff Match Patch.
Java comparing two text files and writing diffs
How to track differences depends on the requirements, e.g. whether inserted or deleted lines should be detected as well. Since you are a beginner, the requirements are probably quite simple: if we have files like ABC and ACD (each character representing a line), the differences should probably be line 2: B<>C and line 3: C<>D.
To gather those differences you could do the following (I'll omit some things to keep the code simple, e.g. proper getters and setters):
Create a class to hold the differences:
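class Difference {
private int lineNumber;
private String file1Line;
private String file2Line;
Difference(int lineNumber, String file1Line, String file2Line) {
this.lineNumber = lineNumber;
this.file1Line = file1Line;
this.file2Line = file2Line;
}
//getters, setters
}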
Then keep a List<Difference>
and fill it like this:
String line1;
String line2;
int lineNumber = 0;
List<Difference> differences = new LinkedList<>();
do {
line1 = reader1.readLine();
line2 = reader2.readLine();
lineNumber++; //first line will have number 1
//we've hit the end of file1
if( line1 == null ) {
//if we've not hit the end of file2 yet, we have a difference
if( line2 != null ) {
differences.add(new Difference(lineNumber, line1, line2));
}
}
//if we didn't hit the end of file1 yet we just compare, this will return false if:
// - file2 contains a different line
// - we've hit the end of file2 in which case line2 is null
else if( !line1.equals(line2) ) {
differences.add(new Difference(lineNumber, line1, line2));
}
//once we've hit the end of either file we'll stop this loop
} while( line1 != null && line2 != null );
//read the remaining lines of whichever file is longer
//since we hit the end of at least one file in the previous loop, at most one of the following loops will run
//if we already hit the end of file1 in the first loop, line1 is null at this point and we won't enter this loop
//(the next line is read in the loop condition, so the loop stops before adding an entry for a missing line)
while( line1 != null && (line1 = reader1.readLine()) != null ) {
lineNumber++;
differences.add(new Difference(lineNumber, line1, null));
}
//if we already hit the end of file2 in the first loop, line2 is null at this point and we won't enter this loop
while( line2 != null && (line2 = reader2.readLine()) != null ) {
lineNumber++;
differences.add(new Difference(lineNumber, null, line2));
}
Now you should have a list of differences that you'd just need to write to a separate file. If that list is empty you didn't find any differences, which means the files are equal (at least the lines are; the files might still have different line separators etc., but I guess that's no concern for you at the moment).
Note that you'd also have to handle exceptions, close the files and readers (see Matt's answer), etc.
Please also note that you could do everything with less code (e.g. only one loop) but since you're a beginner it's often best to write some more code to make the process easier to understand.
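For comparison, here is a more compact version of the same idea in a single loop, using try-with-resources so both readers are closed even if an exception is thrown. It is a sketch under a few assumptions: the class name LineDiff is made up, Difference is nested and immutable here, the two file paths come from the command line, and the differences are printed instead of written to a file. Lines are compared by position only, so an inserted or deleted line shows up as a difference on every following line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Single-loop positional line comparison (hypothetical class name).
class LineDiff {

    // One differing line; a null side means that file has no line at this position
    static final class Difference {
        final int lineNumber;
        final String file1Line;
        final String file2Line;

        Difference(int lineNumber, String file1Line, String file2Line) {
            this.lineNumber = lineNumber;
            this.file1Line = file1Line;
            this.file2Line = file2Line;
        }

        @Override
        public String toString() {
            return "line " + lineNumber + ": " + file1Line + "<>" + file2Line;
        }
    }

    static List<Difference> compare(BufferedReader reader1, BufferedReader reader2) throws IOException {
        List<Difference> differences = new ArrayList<>();
        int lineNumber = 0;
        while (true) {
            // readLine() keeps returning null once a reader has reached its end
            String line1 = reader1.readLine();
            String line2 = reader2.readLine();
            if (line1 == null && line2 == null) {
                break; // both inputs are exhausted
            }
            lineNumber++; // first line has number 1
            // Objects.equals also covers the case where only one input has ended
            if (!Objects.equals(line1, line2)) {
                differences.add(new Difference(lineNumber, line1, line2));
            }
        }
        return differences;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the two file paths are passed as command-line arguments
        try (BufferedReader r1 = Files.newBufferedReader(Paths.get(args[0]));
             BufferedReader r2 = Files.newBufferedReader(Paths.get(args[1]))) {
            for (Difference d : compare(r1, r2)) {
                System.out.println(d);
            }
        }
    }
}
```

With the ABC/ACD example, compare reports line 2: B<>C and line 3: C<>D. Files.newBufferedReader(Path) decodes the files as UTF-8.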