Big-O Summary for Java Collections Framework Implementations

This website is pretty good but not specific to Java: http://bigocheatsheet.com/

Where can I find performance metrics (big-Oh notation) for different java containers?

Java Generics and Collections contains such data for all collection implementations.

Which Java Collection should I use?

Since I couldn't find a similar flowchart, I decided to make one myself.

This flowchart does not try to cover things like synchronized access, thread safety, or the legacy collections, but it does cover the three standard Sets, three standard Maps, and two standard Lists.

[Flowchart image: choosing among the standard Sets, Maps, and Lists]

This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.

Other resources

Probably the most useful other reference is the following page from the Oracle documentation, which describes each Collection.

HashSet vs TreeSet

There is a detailed discussion of when to use HashSet or TreeSet here:
Hashset vs Treeset
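The short version of that discussion: a TreeSet keeps its elements sorted at O(log n) per operation, while a HashSet offers O(1) average-case operations with no ordering guarantee. A minimal sketch (the class name SetOrderDemo is made up for illustration):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    // Returns the set's elements in its iteration order.
    static String order(Set<String> set) {
        return String.join(",", set);
    }

    public static void main(String[] args) {
        // TreeSet iterates in sorted order: O(log n) per add/contains/remove.
        Set<String> tree = new TreeSet<>(Arrays.asList("banana", "apple", "cherry"));
        System.out.println(order(tree)); // apple,banana,cherry

        // HashSet: O(1) average-case operations, iteration order unspecified.
        Set<String> hash = new HashSet<>(Arrays.asList("banana", "apple", "cherry"));
        System.out.println(hash.contains("apple")); // true
    }
}
```

If you only need membership tests, HashSet is usually the better default; reach for TreeSet when you need sorted iteration or range queries.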

ArrayList vs LinkedList

Detailed discussion: When to use LinkedList over ArrayList?

big O notation for removing an element from a linked list

The time required to remove an item from a linked list depends on exactly how we are going to do it. Here are the possibilities:

  • Remove by value. In this case we need to find the item (O(n)) and then unlink it (O(1)). The total time is O(n).
  • Remove by index. In this case we need to traverse the list (O(index)) and then unlink the item (O(1)). The total time is O(index), which is O(n) for an arbitrary index.
  • Remove by a reference to the linked-list node that contains the item: O(1).

In Java, the last case applies when you use Iterator.remove or ListIterator.remove.
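A small sketch of the difference (the class name LinkedListRemoveDemo is made up): removal by value scans the list, while removal through the iterator unlinks the node the iterator is already positioned at.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class LinkedListRemoveDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5));

        // Remove by value: O(n) search + O(1) unlink.
        list.remove(Integer.valueOf(3));

        // Remove via the iterator: the iterator already holds a reference
        // to the current node, so each unlink itself is O(1).
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes 2 and 4 without re-scanning the list
            }
        }
        System.out.println(list); // [1, 5]
    }
}
```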

Why isn't TreeSet's first method implemented in O(1) time?

iterator() returns an object that implements the Iterator interface. This can be accomplished in constant time, of course.

E.g., the iterator might maintain a pointer to the current element. At the beginning this pointer might be null, so the first call to hasNext() would have to walk down the tree to the first (leftmost) element, which has a complexity of O(log n).
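In practice, both first() and the iterator's first step descend to the leftmost node of the red-black tree, an O(log n) walk, since (as far as I know) TreeMap does not cache the leftmost node. A sketch to make the equivalence concrete (class name made up):

```java
import java.util.TreeSet;

public class TreeSetFirstDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int i = 100; i >= 1; i--) {
            set.add(i); // each insert is O(log n)
        }
        // Both of these descend from the root to the leftmost node: O(log n).
        System.out.println(set.first());           // 1
        System.out.println(set.iterator().next()); // 1
    }
}
```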

What is the time complexity of Collections#sort method in Java?

This depends on the version of Java you use. But in the end, the Big-O time complexity is still O(N*log(N)).

For Java 6, it's a modified version of mergesort. Check the description here: Collections#sort for Java 6

The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n log(n) performance. The specified list must be modifiable, but need not be resizable. This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n² log(n) performance that would result from attempting to sort a linked list in place.

For Java 7, it was improved: Collections#sort for Java 7 switched to TimSort. Note that TimSort has a best case of O(N) and proves faster in practice than the previous implementation.

Implementation note: This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted, while offering the performance of a traditional mergesort when the input array is randomly ordered. If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.

The implementation takes equal advantage of ascending and descending order in its input array, and can take advantage of ascending and descending order in different parts of the same input array. It is well-suited to merging two or more sorted arrays: simply concatenate the arrays and sort the resulting array.

The implementation was adapted from Tim Peters's list sort for Python (TimSort). It uses techniques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.

This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n² log(n) performance that would result from attempting to sort a linked list in place.
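Basic usage is straightforward; a small sketch (class name made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(List.of(5, 3, 8, 1, 9, 2));

        // TimSort: O(n log n) worst case, approaching O(n) on nearly sorted input.
        Collections.sort(data);
        System.out.println(data); // [1, 2, 3, 5, 8, 9]

        // A Comparator selects a different order without changing the complexity.
        data.sort(Collections.reverseOrder());
        System.out.println(data); // [9, 8, 5, 3, 2, 1]
    }
}
```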



Is this a good method for sorting an ArrayList of 10^6 elements?

In theory, it is good enough for that size. But this makes me wonder why you would have to sort the data in memory at all. If the data comes from a database, sort it there using an indexed column/field. Otherwise, check whether you know some characteristics of the field you will sort by, and whether you can use an O(N) algorithm like Bucket Sort or Radix Sort. When there's no other way, use Collections#sort.
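As an illustration of the O(N) option: when the sort keys are known to lie in a small range [0, k), a counting sort (the simplest bucket-sort variant) runs in O(n + k). The sketch below (class and method names made up) sorts 10^6 integers in the range [0, 100); for general data without such a bound, Collections#sort remains the right tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CountingSortDemo {
    // O(n + k) counting sort for values known to lie in [0, k).
    static List<Integer> countingSort(List<Integer> input, int k) {
        int[] counts = new int[k];
        for (int v : input) counts[v]++; // tally each value
        List<Integer> sorted = new ArrayList<>(input.size());
        for (int v = 0; v < k; v++) {
            for (int c = 0; c < counts[v]; c++) sorted.add(v);
        }
        return sorted;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) data.add(rnd.nextInt(100));

        List<Integer> sorted = countingSort(data, 100);
        System.out.println(sorted.size()); // 1000000
    }
}
```

Note that this only beats O(n log n) comparison sorting when k is small relative to n.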


