What Is the Fastest Way to Compare Two Sets in Java

Java - Best way to compare two Sets by hashes (via ==)

Since the two Sets are not the same instance, you can't use == to compare the Sets.

Now, if you use Set's equals, it (and by "it" I mean the default implementation of AbstractSet) will validate the two Sets have the same size and then iterate over the elements of one Set and check if the other Set contains each of them.

If, for example, you are using HashSets, in order to find if a String is contained in some HashSet, hashCode() will have to be used to find the bucket that may contain the searched String, but later String's equals() will be called to validate the presence of a String equal to the searched String. String's equals() actually begins with

    if (this == anObject) {
return true;
}

so the fact that your two Sets may contain references to the same String instance will be helpful in improving the runtime of the Sets comparison.

Compare two sets of different types

This is what I would do in your case. You have sets. Sets are hard to compare, but on top of that, you want to compare on their id.

I see only one proper solution where you have to normalize the wanted values (extract their id) then sort those ids, then compare them in order, because if you don't sort and compare you can possibly skip pass over duplicates and/or values.

Think about the fact that Java 8 allows you to play lazy with streams. So don't rush over and think that extracting, then sorting then copying is long. Lazyness allows it to be rather fast compared to iterative solutions.

HashSet<ValueObject> valueObjects = new HashSet<>();
valueObjects.add(ValueObject.of(1));
valueObjects.add(ValueObject.of(2));
valueObjects.add(ValueObject.of(3));

HashSet<DTO> dtos = new HashSet<>();
dtos.add(DTO.of(1));
dtos.add(DTO.of(2));
dtos.add(DTO.of(34));

boolean areIdentical = Arrays.equals(
valueObjects.stream()
.mapToInt((v) -> v.id)
.sorted()
.toArray(),
dtos.stream()
.mapToInt((d) -> d.id)
.sorted()
.toArray()
);

You want to generalize the solution? No problem.

public static <T extends Comparable<?>> boolean areIdentical(Collection<ValueObject> vos, Function<ValueObject, T> voKeyExtractor, Collection<DTO> dtos, Function<DTO, T> dtoKeyExtractor) {
return Arrays.equals(
vos.stream()
.map(voKeyExtractor)
.sorted()
.toArray(),
dtos.stream()
.map(dtoKeyExtractor)
.sorted()
.toArray()
);
}

And for a T that is not comparable:

public static <T> boolean areIdentical(Collection<ValueObject> vos, Function<ValueObject, T> voKeyExtractor, Collection<DTO> dtos, Function<DTO, T> dtoKeyExtractor, Comparator<T> comparator) {
return Arrays.equals(
vos.stream()
.map(voKeyExtractor)
.sorted(comparator)
.toArray(),
dtos.stream()
.map(dtoKeyExtractor)
.sorted(comparator)
.toArray()
);
}

You mention Guava and if you don't have Java 8, you can do the following, using the same algorithm:

List<Integer> voIds = FluentIterables.from(valueObjects)
.transform(valueObjectIdGetter())
.toSortedList(intComparator());
List<Integer> dtoIds = FluentIterables.from(dtos)
.transform(dtoIdGetter())
.toSortedList(intComparator());
return voIds.equals(dtoIds);

Getting the difference between two sets

Try this

test2.removeAll(test1);

Set#removeAll

Removes from this set all of its elements that are contained in the specified collection (optional operation). If the specified collection is also a set, this operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.

How can I (more) easily compare two sets of numbers?

EDIT: I just realized you said these sets are hardcoded at 3 values. This is a super general algorithm for sets of any size.

For a 3-value set, you can do the same dumping and sorting of set elements, and then do:

if(setB.a < setA.a)
if(setB.b < setA.b)
if(setB.c < setA.c)
return true;
return false;

======================================================

A general algorithm:

This is the most efficient way I can immediately think of to do this.

pseudocode (more pythonic than java, sorry-- hopefully the comments explain):

list l1 = set1.items() //get the items out
list l2 = set2.items()

l1 = sort(l1)
l2 = sort(l2) //sort the lists

int set2idx1 = l1[0].find_closest_greater_than_value(l2) //binary search or something
if set2idx1 exists:
l2 = l2[set2idx1+1:] //in python this means l2 is reassigned to a subarray of l2 starting at set2idx1+1 going to the end of l2
else:
return false

for(int i=1; i<l1.len; i++)
int set2idxi = l1[i].find_closest_greater_than_value(l2) //binary search or something
if set2idxi exists:
l2 = l2[set2idxi+1:]
else
return false

return true

comment if anything doesn't make sense

EDIT EDIT:

explanation of the general algorithm for any interested parties:

  1. dump the set elements into arrays
  2. sort those arrays
  3. iterate through the first array, seeing if there is a value in the 2nd array that is greater than the current value. If so, get the index of that value and lop off everything before and including that index and reassign the 2nd array variable to what remains.
  4. if there ever is no such value (either because it doesn't exist or you've run out of values to test, return false). Otherwise, at the end, return true.

The idea here is that since the arrays are sorted, you know that any element that is greater than the matched element in the 2nd array will be greater than the element you're testing against in the 1st array. So you can just chop off the lower values, and since you don't want to use the same value, you can chop off the one you found, too. If you ever return false you know it's because either there were no greater values, either because the numbers in array1 were all greater than the numbers in array2, or because there weren't enough numbers in array2 greater than the numbers in array1.

How to compare two Sets of objects Not overriding equals/hashCode with Comparator and Streams

A Fix for Comparator

In order to create a Comparator in this case you need to provide generic type information explicitly, like that: <Book, String>comparing(), where Book and String are the parameter type and return type of the Function that Comparator.comparing() expects as a parameter.

Without an explicit declaration, the compiler has not enough data to determine the type of the variable book, and inside both comparing() and thenComparing() its type will be inferred as Object.

Comparator<Book> comparingBooks = 
Comparator.<Book, String>comparing(book -> book.getName())
.thenComparing(book -> book.getType());

If the case if only one of the static methods was used, the type of the book variable will be correctly inferred by the compiler as Book based on the type of the locale variable Comparator<Book> comparingBooks.

Both method lambda expressions could be replaced with method references:

Comparator<Book> comparingBooks =
Comparator.<Book, String>comparing(Book::getName)
.thenComparing(Book::getType);

for information on the syntax of generic methods, take a look at this tutorial

for more information on how to build comparators with Java 8 methods take a look at this tutorial

Stream

should validate any difference of Name and Type by returning true, if both Sets have exactly the same objects, it should return false.

Stream s1.stream().anyMatch(a -> s2.stream().anyMatch(b -> ...)) is currently checking whether there's an element in the first set s1 that is different from ANY of the elements in s2. I.e. the nested anyMatch() will return true for the first element encountered in s1 that has not matching element in s2. After the comparison of Book("book 1", BookType.ONE) (s1) with Book("book 2", BookType.TWO) (s2) anyMatch() terminates by returning true. And enclosing anyMatch() operation propagates this result.

Precisely the same will happen with the second example. The same pair of elements differ, although you've changed a type, names are not equal. Result is true.

Note:

  • that by convention classes are usually named with singular nouns, i.e. Book (not Books), Event, Person, etc.
  • as a rule of thumb, if your objects are intended to be used with collection they must implement equals/hashCode contract, your requirement is very unusual.


Related Topics



Leave a reply



Submit