What Is the Time Complexity Performance of HashSet.contains() in Java

What is the time complexity performance of HashSet.contains() in Java?

It runs in O(1) expected time, like any hash table (assuming the hash function is decent). It is backed by a HashMap, where each element of the set is stored as a key.

Two objects might have the same hash code, but the HashSet won't consider them identical unless the equals method for those objects says they are the same (i.e. returns true).
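For example (a minimal sketch; "Aa" and "BB" genuinely share the same String.hashCode() value):

import java.util.HashSet;
import java.util.Set;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" collide: both hash to 2112 under String.hashCode()
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false

        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        // Same hash code, but equals() distinguishes them, so both are stored
        System.out.println(set.size()); // 2
    }
}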

The contains method calls (indirectly) getEntry of HashMap, where key is the Object whose presence in the HashSet you want to check.

As you can see below, two objects can be stored in the HashMap/HashSet even if their keys map to the same bucket. The method iterates over all keys with that hash value and calls equals on each one to find the matching key.

// java.util.HashMap.getEntry (pre-Java 8): locate the bucket, then walk the chain
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; // O(1) bucket lookup
         e != null;
         e = e.next) {                                       // linear walk of colliding entries
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;                                        // equals() decides the match
    }
    return null;
}

Time complexity of HashSet.contains in Java

contains on a HashSet is performed in O(1) (constant time), as the Javadoc states:

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets.

So a contains call doesn't affect the asymptotic complexity of the surrounding algorithm.

Java HashSet worst case lookup time complexity

When looking up an element in a HashMap, it performs an O(1) calculation to find the right bucket, and then iterates over the items there serially until it finds the one that is equal to the requested key, or until all the items have been checked.

In the worst case scenario, all the items in the map have the same hash code and are therefore stored in the same bucket. In this case, you'll need to iterate over all of them serially, which would be an O(n) operation.
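Here is a minimal sketch of that degenerate case (the BadKey class is hypothetical), where a constant hashCode forces every element into one bucket:

import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hashCode sends every instance to the same bucket
class BadKey {
    final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // constant: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

public class WorstCaseDemo {
    public static void main(String[] args) {
        Set<BadKey> set = new HashSet<>();
        for (int i = 0; i < 10_000; i++) set.add(new BadKey(i));
        // Every lookup now scans the single overcrowded bucket
        System.out.println(set.contains(new BadKey(9_999))); // true, but slow
    }
}

(Note that since Java 8, HashMap converts heavily collided buckets into balanced trees when the keys are Comparable, which improves that worst case to O(log n).)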

A HashSet is just a HashMap where you don't care about the values, only the keys - under the hood, it's a HashMap where all the values are a dummy Object.
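That is, in fact, how OpenJDK implements it; a condensed sketch (the PRESENT field name follows the OpenJDK source):

import java.util.HashMap;

// Condensed from java.util.HashSet: every key maps to the same dummy value
public class HashSetSketch<E> {
    private final HashMap<E, Object> map = new HashMap<>();

    // Shared dummy value used for all keys
    private static final Object PRESENT = new Object();

    public boolean add(E e)           { return map.put(e, PRESENT) == null; }
    public boolean contains(Object o) { return map.containsKey(o); }
    public boolean remove(Object o)   { return map.remove(o) == PRESENT; }
    public int size()                 { return map.size(); }
}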

HashSet vs ArrayList contains performance

The set will give much better performance (O(n) versus O(n^2) for the list, over n membership tests), and that's natural, because membership testing (the contains operation) is the very purpose of a set.

contains for a HashSet is O(1), compared to O(n) for a list; therefore you should never use a list if you often need to run contains.
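A rough illustration (a sketch, not a rigorous benchmark): performing n lookups against n elements is n * O(1) for the set but n * O(n) for the list:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsComparison {
    public static void main(String[] args) {
        int n = 100_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) { list.add(i); set.add(i); }

        long hits = 0, t0 = System.nanoTime();
        for (int i = 0; i < n; i++) if (set.contains(i)) hits++;  // n * O(1) = O(n)
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) if (list.contains(i)) hits++; // n * O(n) = O(n^2)
        long t2 = System.nanoTime();

        System.out.printf("hits=%d, set: %d ms, list: %d ms%n",
                hits, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}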

What is the time complexity of HashMap.containsValue() in java?

To answer the question in the title - as mentioned by others, containsValue is O(n), because without the key it doesn't know where it is and the algorithm has to go over all the values stored in the map.
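A quick contrast (a minimal sketch): containsKey hashes straight to a bucket, while containsValue has to scan every entry:

import java.util.HashMap;
import java.util.Map;

public class ContainsValueDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);

        System.out.println(counts.containsKey("b")); // O(1): hash to the right bucket
        System.out.println(counts.containsValue(2)); // O(n): linear scan over all entries
    }
}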

To answer the question in the body of your question - how to solve it - consider whether you really need a general map that can count how many instances of each number you have seen. The only time you'd care about a number appearing more than once is when it equals requiredSum/2, right? That smells like a corner case. Just add a test for it - something like embedding if (numberList[i] == requiredSum/2) half++ in your set-construction loop, and then if (requiredSum % 2 == 0 && half >= 2) return true after it (see the variation below).

Then you can just iterate over the set and for each item check whether requiredSum-item also appears in the set.

To summarize (with early exits where possible), wrapped in a helper method:

import java.util.HashSet;
import java.util.Set;

static boolean hasPairWithSum(int[] numberList, int requiredSum) {
    Set<Integer> seen = new HashSet<Integer>();
    boolean halfSeen = false;
    for (int num : numberList) {
        // Corner case: requiredSum/2 can only pair with a second copy of itself
        if (requiredSum % 2 == 0 && num == requiredSum / 2) {
            if (halfSeen) return true; // second copy found: early exit
            halfSeen = true;
        } else {
            seen.add(num);
        }
    }
    // Every remaining pair consists of two distinct values
    for (int num : seen) {
        if (seen.contains(requiredSum - num)) return true;
    }
    return false;
}
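For example (hypothetical inputs):

System.out.println(hasPairWithSum(new int[]{1, 4, 3, 6}, 7)); // true: 1 + 6
System.out.println(hasPairWithSum(new int[]{4}, 8));          // false: only one copy of 8/2
System.out.println(hasPairWithSum(new int[]{4, 4}, 8));       // true: the halfSeen check fires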

Is calling contains() on an arraylist inefficient?

new HashSet<>(arrayList) takes O(n) time and O(n) space to build a HashSet.

hashSet.contains("something") takes O(1) time to find an element.

arrayList.contains("something") takes O(n) time to find an element.

As a result:

(new HashSet<>(arrayList)).contains("something") takes O(n) + O(1) = O(n) time + O(n) space complexity

arrayList.contains("something") takes O(n) time + 0 space complexity

It means that, in Big O terms, both expressions have O(n) time complexity, but arrayList.contains("something") takes no additional space, unlike the newly created HashSet.
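The picture changes once the same data is probed many times; a sketch (the LookupHelper and countHits names are hypothetical) where the one-time O(n) build cost is amortized over k lookups:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LookupHelper {
    // k lookups cost O(k * n) against the list, but only O(n + k) with a one-time set build
    static int countHits(List<String> arrayList, List<String> queries) {
        Set<String> lookup = new HashSet<>(arrayList); // O(n) build, paid once
        int hits = 0;
        for (String q : queries) {
            if (lookup.contains(q)) hits++;            // O(1) per query
        }
        return hits;
    }
}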

P.S.

I am not making any predictions about other cases, because I do not know the other relevant aspects of your application. I am analyzing the given piece of code only.

Does Java optimize new HashSet(someHashSet).contains() to be O(1)?

No, javac does not (and cannot) optimize this away; it is required to emit byte code that matches your source at this level. And the JVM will not be nearly intelligent enough to optimize it away either: the construction is far too likely to have side effects for the JVM to prove it safe to remove.

Don't return a copy of the HashSet if you want immutability. Wrap it in an unmodifiable wrapper instead: Collections.unmodifiableSet(myHashSet).
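A minimal sketch of the wrapper approach:

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class UnmodifiableDemo {
    public static void main(String[] args) {
        Set<String> myHashSet = new HashSet<>();
        myHashSet.add("a");

        // A view, not a copy: O(1) to create, and contains() stays O(1)
        Set<String> readOnly = Collections.unmodifiableSet(myHashSet);
        System.out.println(readOnly.contains("a")); // true

        try {
            readOnly.add("b");                      // mutation is rejected
        } catch (UnsupportedOperationException expected) {
            System.out.println("read-only view");
        }
    }
}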

What is the time and space complexity of method retainAll when used on HashSets in Java?

Let's peruse the code. The method retainAll is inherited from AbstractCollection and (at least in OpenJDK) looks like this:

public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> it = iterator();
    while (it.hasNext()) {              // one pass over this collection
        if (!c.contains(it.next())) {   // cost depends on c's contains()
            it.remove();
            modified = true;
        }
    }
    return modified;
}

There is one big thing to note here: we loop over this.iterator() and call c.contains. So the time complexity is n calls to c.contains, where n = this.size(), plus at most n calls to it.remove().

The important thing is that contains is called on the other Collection, so the complexity depends on the complexity of the other Collection's contains.

So, whilst:

Set<String> common = new HashSet<>(Arrays.asList(a));
common.retainAll(new HashSet<>(Arrays.asList(b)));

Would be O(a.length), as HashSet.contains and HashSet.remove are both O(1) (amortized).

If you were to call

common.retainAll(Arrays.asList(b));

Then, due to the O(n) contains on Arrays.ArrayList, this would become O(a.length * b.length) - i.e. by spending O(n) to copy the array into a HashSet, you actually make the call to retainAll much faster.

As far as space complexity goes, no additional space (beyond the Iterator) is required by retainAll, but your invocation is actually quite expensive space-wise, as you allocate two new HashSets, which are fully fledged HashMaps under the hood.

Two further things can be noted:

  1. There is no reason to allocate a HashSet from the elements in a - a cheaper collection that also has O(1) removal from the middle, such as a LinkedList, can be used (cheaper in memory and also in build time, since no hash table is built); see the sketch after this list.
  2. Your modifications are being lost, as you create new collection instances and only return b.size().
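A sketch of point 1 (assuming, as in the snippets above, that a and b are String arrays): only the argument collection is probed with contains, so only it needs to be a HashSet:

// The receiver just needs cheap iteration and O(1) Iterator.remove(),
// which LinkedList provides; the fast contains() lives on the argument.
Collection<String> common = new LinkedList<>(Arrays.asList(a));
common.retainAll(new HashSet<>(Arrays.asList(b)));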

How can a HashSet offer constant time add operation?

The number of buckets is dynamic, roughly 2n, where n is the number of elements in the set.

Note that HashSet gives amortized and average time performance of O(1), not worst case. This means we may occasionally suffer an O(n) operation.

So, when the bins become too packed, we simply create a new, bigger array and copy the elements into it.

This costs n operations, and it happens only when the number of elements in the set exceeds half the number of buckets (2n/2 = n), so the average cost of the resize, spread over the insertions that preceded it, is bounded by n/n = 1, a constant.
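As a concrete worked example (assuming the capacity doubles at sizes 1, 2, 4, ...): inserting 8 elements triggers copies of 1 + 2 + 4 = 7 elements in total, so those 8 insertions cost at most 8 + 7 = 15 basic operations - fewer than 2 per insertion on average. The same bound holds for any n, since 1 + 2 + 4 + ... + n/2 < n.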

Additionally, the number of collisions a HashMap encounters is also constant on average.

Assume you are adding an element x. The probability of the bucket h(x) already being occupied by one element is ~n/2n = 1/2. The probability of it being occupied by two elements is ~(n/2n)^2 = 1/4 (for large values of n), and so on.

This gives an average running time of 1 + 1/2 + 1/4 + 1/8 + .... Since this sum converges to 2, the operation takes constant time on average.


