Java: Hashset Vs TreeSet - when should I use over the other?
HashSet is Implemented using a hash table. Elements are not ordered. The add, remove,
and contains methods have constant time complexity O(1).
TreeSet is implemented using a tree structure(red-black tree in algorithm book). The elements in a set are sorted, but the add, remove, and contains methods has time complexity O(log (n)). It offers several methods to deal with the ordered set like first(), last(), headSet(), tailSet()
, etc.
1) First major difference between HashSet
and TreeSet
is performance. HashSet
is faster than TreeSet
and should be preferred choice if sorting of element is not required.
2) Second difference between HashSet
and TreeSet
is that HashSet
allows null object but TreeSet
doesn't allow null Object and throw NullPointerException
, Why, because TreeSet
uses compareTo()
method to compare keys and compareTo()
will throw java.lang.NullPointerException
.
3) Another significant difference between HashSet
and TreeSet
is that , HashSet
is backed by HashMap
while TreeSet
is backed by NavigableMap in Java.
4) One more difference between HashSet
and TreeSet
which is worth remembering is that HashSet uses equals()
method to compare two object in Set and for detecting duplicates while TreeSet
uses compareTo()
method for same purpose. if equals()
and compareTo()
are not consistent, i.e. for two equal object equals should return true while compareTo()
should return zero, than it will break contract of Set interface and will allow duplicates in Set implementations like TreeSet
5) Now most important difference between HashSet
and TreeSet
is ordering. HashSet
doesn't guaranteed any order while TreeSet
maintains objects in Sorted order defined by either Comparable
or Comparator
method in Java.
6) TreeSet
does not allow to insert Heterogeneous
objects. It will throw classCastException
at Runtime
if trying to add hetrogeneous objects, whereas HashSet
allows hetrogeneous objects.
Hashset vs Treeset
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
HashSet
- the class offers constant time performance for the basic operations (add, remove, contains and size).
- it does not guarantee that the order of elements will remain constant over time
- iteration performance depends on the initial capacity and the load factor of the HashSet.
- It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.
TreeSet
- guarantees log(n) time cost for the basic operations (add, remove and contains)
- guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements
SortedSet
) - doesn't offer any tuning parameters for iteration performance
- offers a few handy methods to deal with the ordered set like
first()
,last()
,headSet()
, andtailSet()
etc
Important points:
- Both guarantee duplicate-free collection of elements
- It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
- None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
- LinkedHashSet is in some sense intermediate between
HashSet
andTreeSet
. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.
So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.
- e.g.
SortedSet<String> s = new TreeSet<String>(hashSet);
HashSet vs TreeSet vs LinkedHashSet on basis of adding duplicate value
TreeSet, LinkedHashSet and HashSet in Java are three Set implementation in collection framework and like many others they are also used to store objects. Main feature of TreeSet is sorting, LinkedHashSet is insertion order and HashSet is just general purpose collection for storing object. HashSet is implemented using HashMap in Java while TreeSet is implemented using TreeMap. TreeSet is a SortedSet implementation which allows it to keep elements in the sorted order defined by either Comparable or Comparator interface. Comparable is used for natural order sorting and Comparator for custom order sorting of objects, which can be provided while creating instance of TreeSet. Anyway before seeing difference between TreeSet, LinkedHashSet and HashSet, let's see some similarities between them:
1) Duplicates : All three implements Set interface means they are not allowed to store duplicates.
2) Thread safety : HashSet, TreeSet and LinkedHashSet are not thread-safe, if you use them in multi-threading environment where at least one Thread modifies Set you need to externally synchronize them.
3) Fail-Fast Iterator : Iterator returned by TreeSet, LinkedHashSet and HashSet are fail-fast Iterator. i.e. If Iterator is modified after its creation by any way other than Iterators remove() method, it will throw ConcurrentModificationException with best of effort. read more about fail-fast vs fail-safe Iterator here
Now let’s see difference between HashSet, LinkedHashSet and TreeSet in Java :
Performance and Speed : First difference between them comes in terms of speed. HashSet is fastest, LinkedHashSet is second on performance or almost similar to HashSet but TreeSet is bit slower because of sorting operation it needs to perform on each insertion. TreeSet provides guaranteed O(log(n)) time for common operations like add, remove and contains, while HashSet and LinkedHashSet offer constant time performance e.g. O(1) for add, contains and remove given hash function uniformly distribute elements in bucket.
Ordering : HashSet does not maintain any order while LinkedHashSet maintains insertion order of elements much like List interface and TreeSet maintains sorting order or elements.
Internal Implementation : HashSet is backed by an HashMap instance, LinkedHashSet is implemented using HashSet and LinkedList while TreeSet is backed up by NavigableMap in Java and by default it uses TreeMap.
null : Both HashSet and LinkedHashSet allows null but TreeSet doesn't allow null and throw java.lang.NullPointerException when you will insert null into TreeSet. Since TreeSet uses compareTo() method of respective elements to compare them which throws NullPointerException while comparing with null, here is an example:
TreeSet cities
Exception in thread "main" java.lang.NullPointerException
at java.lang.String.compareTo(String.java:1167)
at java.lang.String.compareTo(String.java:92)
at java.util.TreeMap.put(TreeMap.java:545)
at java.util.TreeSet.add(TreeSet.java:238)
Comparison : HashSet and LinkedHashSet uses equals() method in Java for comparison but TreeSet uses compareTo() method for maintaining ordering. That's why compareTo() should be consistent to equals in Java. failing to do so break general contact of Set interface i.e. it can permit duplicates.
Use can use below link to see internal implementation
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashSet.java#HashSet.add%28java.lang.Object%29
From the source code
Hashset hases Hashmap to store the data and LinkedHashSet extends Hashset and hence uses same add method of Hashset But TreeSet uses NavigableMap to store the data
Source: http://javarevisited.blogspot.com/2012/11/difference-between-treeset-hashset-vs-linkedhashset-java.html#ixzz2lGo6Y9mm
TreeSet and HashSet
HashSet
is unordered, which specifically means that Java makes no guarantee about the order in which keys (and therefore values) are stored. On the other hand, TreeSet
stores its elements in a natural order.
From the Javadoc for TreeSet
:
The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
and If the objects can not
be sorted in natural order than use compareTo() method to sort the elements of TreeSet object but hashmap is using equals() method to compare the elements.
Is it possible that TreeSet equals HashSet but not HashSet equals TreeSet
Your interviewer is right, they do not hold equivalence relation for some specific cases. It is possible that TreeSet
can be equal to HashSet
and not vice-versa. Here is an example:
TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.addAll(List.of("A", "b"));
hashSet.addAll(List.of("A", "B"));
System.out.println(hashSet.equals(treeSet)); // false
System.out.println(treeSet.equals(hashSet)); // true
The reason for this is that a TreeSet
uses comparator to determine if an element is duplicate while HashSet
uses equals
.
Quoting TreeSet
:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.
TreeSet vs HashSet speed for small set size, when O(1) vs O(log n) doesn't matter
and hence I'm puzzled on how to best measure/simulate the difference.
Use a profiler. If the Set operations don't dominate the results (CPU time, memory footprint, allocation rates) then your choice will not make a difference in practice due to amdahl's law..
The biggest advantage of the TreeSet is ordering.
And neither implementation is particularly memory-efficient, there are better Sets out there, depending on which performance measure you care most about. They are wrappers around the corresponding Map implementations and the Maps themselves are not particularly efficient either.
They're more designed for flexibility, providing the vast set of APIs they do than optimizing any particular performance aspect.
Should I use a `HashSet` or a `TreeSet` for a very large dataset?
When we tried to store 50 million records in HashMap with proper initialization parameters, insertion started to slowdown, especially after 35 million records. Changing to TreeMap gave a constant insertion and retrieval performance.
Observation : TreeMap will give better performance than a HashMap for large input set. For a smaller set, of course HashMap will give better performance.
Related Topics
Connecting to Remote Url Which Requires Authentication Using Java
Java Error: Implicit Super Constructor Is Undefined for Default Constructor
Illegalmonitorstateexception on Wait() Call
Why Does Inetaddress.Isreachable Return False, When I Can Ping the Ip Address
Why Are Filenames in Java the Same as the Public Class Name
How to Map an Entity Field Whose Name Is a Reserved Word in JPA
How to Switch Between Frames in Selenium Webdriver Using Java
Tomcat 10.0.4 Doesn't Load Servlets (@Webservlet Classes) with 404 Error
Add Image Thumbnails to a Layout in a Grid
What Exactly Is Field Injection and How to Avoid It
String, Stringbuffer, and Stringbuilder
What Is Suppresswarnings ("Unchecked") in Java
Why Is There No Multiple Inheritance in Java, But Implementing Multiple Interfaces Is Allowed
How to Connect to Oracle Using Service Name Instead of Sid
How to Resolve the "Java.Net.Bindexception: Address Already in Use: Jvm_Bind" Error