Is a Java Hashmap Search Really O(1)

Is a Java hashmap search really O(1)?

A particular feature of a HashMap is that, unlike balanced trees, its behavior is probabilistic. In these cases it's usually most helpful to talk about complexity in terms of the probability of a worst-case event occurring. For a hash map, that event is of course a collision, and its likelihood depends on how full the map happens to be. The probability of a collision is easy to estimate:

p(collision) = n / capacity

So a hash map with even a modest number of elements is pretty likely to experience at least one collision. Big O notation allows us to do something more compelling. Observe that for any arbitrary, fixed constant k:

O(n) = O(k * n)

We can use this fact to sharpen the analysis of the hash map. Instead of a single collision, consider the probability of hitting 2 collisions:

p(2 collisions) = (n / capacity)^2

This is much lower. Since the cost of handling one extra collision is irrelevant to Big O performance, we've found a way to improve the bound without actually changing the algorithm! We can generalize this to:

p(k collisions) = (n / capacity)^k

And now we can disregard some arbitrary number of collisions and end up with a vanishingly tiny likelihood of more collisions than we are accounting for. You can get the probability down to an arbitrarily small level by choosing the right k, all without altering the actual implementation of the algorithm.

We describe this by saying that the hash map has O(1) access with high probability.
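To see those numbers concretely, here is a tiny sketch (the class and method names are made up for this example) that just plugs n, capacity, and k into the simplified formula above:

public class CollisionOdds {
    // Simplified estimate from the text: p(k collisions) ≈ (n / capacity)^k
    static double pCollisions(int n, int capacity, int k) {
        return Math.pow((double) n / capacity, k);
    }

    public static void main(String[] args) {
        int n = 100, capacity = 1000;
        for (int k = 1; k <= 4; k++) {
            System.out.printf("k=%d -> %.8f%n", k, pCollisions(n, capacity, k));
        }
        // The probability shrinks geometrically with k: pick k large enough and the
        // chance of seeing more than k collisions on a lookup becomes negligible.
    }
}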

How are HashMap et al really O(1)?

Let's take a look at a very simple example of a hash map and a hash function. To keep things simple, let's say that your hash map has 10 buckets and that it uses integers as keys. For the purposes of this example we shall use the following hash function:

public int hash(int key) {
    return key % 10;   // maps a (non-negative) integer key to a bucket index between 0 and 9
}

Now, when we want to store an entry in the map, we hash the key, get an integer between 0 and 9, and put that entry in the corresponding bucket. Then, when we need to look up a key, we just compute its hash and we know exactly which bucket it is in (or would be in) without having to look in any of the others.
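To make that concrete, here is a minimal, hypothetical map with 10 buckets and integer keys, in the spirit of the example above; it is only a sketch of the idea, not how java.util.HashMap is actually implemented:

import java.util.ArrayList;
import java.util.List;

public class TinyIntMap {
    // 10 buckets; each bucket holds the (key, value) pairs that hash to its index
    private final List<int[]>[] buckets = new List[10];

    private int hash(int key) {
        return Math.floorMod(key, 10);   // same idea as key % 10, but safe for negative keys
    }

    public void put(int key, int value) {
        int i = hash(key);
        if (buckets[i] == null) buckets[i] = new ArrayList<>();
        for (int[] entry : buckets[i]) {
            if (entry[0] == key) { entry[1] = value; return; }   // overwrite an existing key
        }
        buckets[i].add(new int[]{key, value});
    }

    public Integer get(int key) {
        int i = hash(key);                                       // jump straight to the one relevant bucket
        if (buckets[i] == null) return null;
        for (int[] entry : buckets[i]) {
            if (entry[0] == key) return entry[1];
        }
        return null;
    }
}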

Why is HashMap lookup O(1), i.e. constant time?

Under the appropriate assumptions on the hash function being used, we can say that hash table lookups take expected O(1) time (assuming you're using a standard hashing scheme like linear probing or chained hashing). This means that on average, the amount of work that a hash table does to perform a lookup is at most some constant.

Intuitively, if you have a "good" hash function, you would expect the elements to be distributed more or less evenly throughout the hash table, meaning that the number of elements in each bucket would be close to the number of elements divided by the number of buckets. If the hash table implementation keeps this ratio low (say, by adding more buckets every time the ratio of elements to buckets exceeds some constant), then the expected amount of work is a fixed amount of work to pick which bucket should be scanned, plus "not too much" work looking at the elements there, because in expectation there will only be a constant number of elements in that bucket.
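As a rough sketch of that "grow when the ratio exceeds some constant" rule (the 0.75 threshold and all names here are assumptions, not any particular library's code):

public class LoadFactorCheck {
    private int size;            // entries currently stored
    private int capacity = 16;   // number of buckets

    // Hypothetical resize trigger: grow the table whenever entries/buckets
    // exceeds a fixed constant, so the expected bucket length stays bounded.
    void maybeGrow() {
        if ((double) size / capacity > 0.75) {   // assumed threshold
            capacity *= 2;                       // doubling keeps the amortized insert cost O(1)
            // ...rehash every entry into the new, larger bucket array here...
        }
    }
}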

This doesn't mean that hash tables have guaranteed O(1) behavior. In fact, in the worst case, the hashing scheme will degenerate and all elements will end up in one bucket, making lookups take time Θ(n) in the worst case. This is why it's important to design good hash functions.

For more information, you might want to read an algorithms textbook to see the formal derivation of why hash tables support lookups so efficiently. This is usually included as part of a typical university course on algorithms and data structures, and there are many good resources online.

Fun fact: there are certain types of hash tables (cuckoo hash tables, dynamic perfect hash tables) where the worst case lookup time for an element is O(1). These hash tables work by guaranteeing that each element can only be in one of a few fixed positions, with insertions sometimes scrambling around elements to try to make everything fit.

Hope this helps!

How does hashing have an O(1) search time?

Well it's a little bit of a lie -- it can take longer than that, but it usually doesn't.

Basically, a hash table is an array containing all of the keys to search on. The position of each key in the array is determined by the hash function, which can be any function that always maps the same input to the same output. We shall assume that the hash function itself is O(1).

So when we insert something into the hash table, we use the hash function (let's call it h) to find the location to put it, and put it there. Now we insert another thing, hashing and storing. And another. Each time we insert data, it takes O(1) time to insert it (since the hash function is O(1)).

Looking up data is the same. If we want to find a value, x, we have only to compute h(x), which tells us where x is located in the hash table. So we can look up any value in O(1) as well.

Now to the lie: the above is a very nice theory, with one problem. What if we insert data and there is already something in that position of the array? There is nothing that guarantees the hash function won't produce the same output for two different inputs (unless you have a perfect hash function, but those are tricky to produce). Therefore, when we insert, we need to adopt one of two strategies:

  • Store multiple values at each spot in the array (say, each array slot has a linked list). Now when you do a lookup, it is still O(1) to arrive at the correct place in the array, but potentially a linear search down a (hopefully short) linked list. This is called "separate chaining".
  • If you find something is already there, hash again and find another location. Repeat until you find an empty spot, and put it there. The lookup procedure can follow the same rules to find the data. Now it's still O(1) to get to the first location, but there is a potentially (hopefully short) linear search to bounce around the table till you find the data you are after. This is called "open addressing".

Basically, both approaches are still mostly O(1) but with a hopefully-short linear sequence. We can assume for most purposes that it is O(1). If the hash table is getting too full, those linear searches can become longer and longer, and then it is time to "re-hash" which means to create a new hash table of a much bigger size and insert all the data back into it.
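As a rough sketch of the second strategy (open addressing with linear probing), assuming a fixed-size table that is never allowed to fill up completely, something like the following would do; none of it is taken from a real implementation:

public class LinearProbingTable {
    private final Object[] keys = new Object[16];
    private final Object[] values = new Object[16];

    public Object get(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);   // O(1) jump to the first candidate slot
        while (keys[i] != null) {                             // hopefully short probe sequence
            if (keys[i].equals(key)) return values[i];
            i = (i + 1) % keys.length;                        // collision: try the next slot
        }
        return null;                                          // hit an empty slot, so the key is absent
    }

    public void put(Object key, Object value) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;                        // probe until an empty or matching slot
        }
        keys[i] = key;
        values[i] = value;
    }
}

Real tables resize before the probe sequences get long; this sketch leaves that out for brevity.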

How to calculate the complexity of the HashMap search algorithm?

HashMap works on the hashing principle. It is a data structure that allows you to store and retrieve data in O(1) time, provided we know the key.

In hashing, hash functions are used to link key and value in a HashMap. Objects are stored by calling the put(key, value) method of HashMap and retrieved by calling the get(key) method. When we call put, the hashCode() method of the key object is invoked so that the map's hash function can find a bucket location to store the value object; the bucket location is actually an index into the internal array, known as the table. HashMap internally stores the mapping as Map.Entry objects, which contain both the key and the value.

When you want to retrieve the object, you call get() and pass the key object again. This time the key object generates the same hash code (it is mandatory that it does so in order to retrieve the object, which is why HashMap keys should be immutable, e.g. String), and we end up at the same bucket location. If there is only one object there, it is returned, and that's the value object you stored earlier. Things get a little tricky when collisions occur.

Collision: since the internal array of a HashMap has a fixed size at any given time, if you keep storing objects, at some point the hash function will return the same bucket location for two different keys. This is called a collision in HashMap. In that case, a linked list is formed at that bucket location and the new entry is stored as the next node.

If we try to retrieve an object from this linked list, we need an extra check to find the correct value; this is done by the equals() method. Since each node contains an entry, HashMap keeps comparing the entry's key object with the passed key using equals(), and when it returns true, the map returns the corresponding value.

Since searching a linked list is an O(n) operation, in the worst case hash collisions reduce the map to a linked list. This issue was addressed in Java 8 by replacing the linked list with a balanced tree once a bucket grows large, so that lookups in such a bucket take O(log n) time.
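A simplified, illustrative version of the get() walk described above (not the actual HashMap source) might look like this:

class Node<K, V> {
    final K key;
    V value;
    Node<K, V> next;                  // next entry in the same bucket's linked list
    Node(K key, V value) { this.key = key; this.value = value; }
}

class SimpleChainedGet<K, V> {
    private final Node<K, V>[] table; // the internal bucket array

    SimpleChainedGet(Node<K, V>[] table) { this.table = table; }

    V get(K key) {
        int index = Math.floorMod(key.hashCode(), table.length);  // hash code -> bucket index
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (n.key.equals(key)) {                              // equals() picks the right entry
                return n.value;
            }
        }
        return null;
    }
}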

By the way, you can easily verify how HashMap works by looking at the source of HashMap.java in your IDE if you are keenly interested in the code; otherwise, the logic is explained above.

Information on buckets: An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
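For example, with the default load factor of 0.75 and an initial capacity of 16, a rehash is triggered once the map holds more than 12 entries; both parameters can also be set explicitly:

import java.util.HashMap;
import java.util.Map;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // initial capacity 16, load factor 0.75: a rehash happens once
        // size exceeds 16 * 0.75 = 12 entries, roughly doubling the bucket count.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 0; i < 20; i++) {
            map.put("key" + i, i);   // crossing 12 entries causes one internal resize
        }
        System.out.println(map.size());
    }
}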

Why is a hash table lookup only O(1) time when searching for a key is O(n)?

I think you are confusing terminology and also complicating matters by thinking about buckets.

Let's imagine a hash table that is implemented as an array a of length n. Let's also imagine that we have n possible keys and a perfect hash function H that maps each key k to a unique index i in a.

Let's initialize our hash table by setting every value in a to nil.

We can insert a key, value pair (k1, v1) into our hash table by placing the value at the appropriate position in the array:

a[H(k1)] = v1

Now let's say that later we forgot if k1 is in the hash table and we want to check if it's there. To do this we simply look up a[H(k1)] and see if any value is there, i.e. a[H(k1)] != nil. This is clearly a constant time lookup.

But what if we want to see if v1, or even some other v2 is anywhere in our hashtable? This is not so easy because we have no function that maps a vi to a position in our array. It could be associated with any key. So the only way to see if it exists in the table is to scan the entire array, checking every value:

for i in 0..n-1:
    if a[i] == v2:
        return true
return false

To make this a bit more concrete, imagine your keys are names, and your values are cities of residence. Now compare asking "Is Bob Jones in the hash table?" to "Is there anyone from New York in the hash table?". We can hash "Bob Jones" and see if there's anything in the corresponding array position (because that's how "Bob Jones" would have been inserted), but we have no similarly quick way to look up "New York".
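The same distinction shows up directly in the HashMap API: containsKey can hash straight to the right bucket, while containsValue has no choice but to examine every entry:

import java.util.HashMap;
import java.util.Map;

public class KeyVsValueLookup {
    public static void main(String[] args) {
        Map<String, String> residents = new HashMap<>();
        residents.put("Bob Jones", "New York");
        residents.put("Ann Smith", "Boston");

        // Expected O(1): hash "Bob Jones" and check that one bucket.
        System.out.println(residents.containsKey("Bob Jones"));    // true

        // O(n): there is no hash for values, so every entry must be scanned.
        System.out.println(residents.containsValue("New York"));   // true
    }
}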

I am assuming this is what you are asking, and you have confused the terminology a bit. Please comment if this is not what you wanted.

How is it possible for Java HashMap to perform constant time lookup O(1) for get operations?

Here are my two cents: think of a HashMap the way you think of an array. In fact, it is backed by an array. If I give you index 11, you don't have to iterate through the array to find the object at index 11; you simply go there directly. That's how a HashMap works: the trick is, essentially, to derive the index from the value itself.

So that is where a hash code comes in. Let's look at the trivial case where your hash function is a unit multiplier (i.e. y = x). If you have the values 0 to 99 and you want to store them in an array, they will be stored at indices 0 to 99 respectively, so a put and a get are clearly O(1), since getting and putting things in an array is O(1). Now let's imagine a less trivial hash function, say y = x + 2. In this case the values 0 to 99 will be stored at indices 2 to 101. Given a value, say 11, you must compute the hash to find it or put it (the hash being 11 + 2 = 13). So the hash function is doing some work to calculate the correct index for a given value (in our case y = x + 2 = 11 + 2 = 13). But the amount of effort that goes into that work has nothing to do with how many data points you have. Whether I need to put 999 or 123, the amount of work for a single put or get is still the same: I only have to add two each time I do a put or a get. That's constant work.
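A toy version of that y = x + 2 scheme might look like this (the array size and names are just for illustration):

public class OffsetIndexStore {
    // Values 0 to 99 are stored at indices 2 to 101, as in the example above.
    private final Integer[] slots = new Integer[102];

    private int hash(int value) {
        return value + 2;                // constant work, regardless of how many values are stored
    }

    public void put(int value)          { slots[hash(value)] = value; }
    public boolean contains(int value)  { return slots[hash(value)] != null; }
}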

Your confusion may be that you want to put n values at once. In that case each put is still constant, c, but c multiplied by n is c*n = O(n). So the n has nothing to do with the put itself, but rather with the fact that you are making n puts all at once. I hope this explanation helps.

What is the time complexity of HashMap insertion and retrieval inside of a for loop?

It's O(n) for traversing the for loop over the elements and O(1) for each HashMap operation, so the final answer is O(n).
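For example, assuming a loop over n elements with one HashMap operation per iteration:

import java.util.HashMap;
import java.util.Map;

public class LoopOverMap {
    public static void main(String[] args) {
        int[] elements = {4, 8, 15, 16, 23, 42};
        Map<Integer, Integer> counts = new HashMap<>();

        // The loop runs n times; each put/get is expected O(1),
        // so the whole loop is O(n), not O(n^2).
        for (int e : elements) {
            counts.put(e, counts.getOrDefault(e, 0) + 1);
        }
        System.out.println(counts);
    }
}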

Why can we say the complexity of a HashMap is O(1)?

First off, a hashmap doesn't have complexity. Inserting into a hashmap does. Reading from a hashmap does. Operations have time complexity; objects do not. Objects can have memory complexity, but that's not what we're talking about here.

Secondly, a hashmap doesn't always give O(1), even for reads. It has O(1) average time. The actual time can be up to O(n) for a single read, depending on how you resolve conflicts. For example, if you use linked-list conflict resolution, writes are always O(1) but reads can be up to O(n) if your hash function is bad. If you resolve conflicts by resizing, reads are always O(1), but writes can be O(n). Other solutions strike other balances.

Third, this isn't a hash map; it is a hash function. It turns a complex value into a numerical one for comparison (more formally, it maps objects from a space of size N to a space of size M, where N > M). That carries no promise of being O(1); it's a completely separate concept from a hash map. A hash map uses a hash function to insert objects into a very large array, and thus gets O(1) time for reads and writes if the hash function is good enough that collisions are rare. The hash function itself can be of any complexity, depending on the data and how it works. String hashes are generally O(n) in the length of the string, because you want the hash to be as unique as possible (if you stopped after, say, 4 characters, all strings sharing those first 4 characters would collide).
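For instance, a simple polynomial string hash (similar in spirit to String.hashCode()) has to visit every character, so computing the hash is O(n) in the string's length even though the table lookup that follows is expected O(1):

public class StringHashCost {
    // Walks the whole string, so the hash function itself is O(n) in the string length.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("hash tables"));
    }
}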


