When Should I Use the Hashset<T> Type

When should I use the HashSetT type?

The important thing about HashSet<T> is right there in the name: it's a set. The only things you can do with a single set is to establish what its members are, and to check whether an item is a member.

Asking if you can retrieve a single element (e.g. set[45]) is misunderstanding the concept of the set. There's no such thing as the 45th element of a set. Items in a set have no ordering. The sets {1, 2, 3} and {2, 3, 1} are identical in every respect because they have the same membership, and membership is all that matters.

It's somewhat dangerous to iterate over a HashSet<T> because doing so imposes an order on the items in the set. That order is not really a property of the set. You should not rely on it. If ordering of the items in a collection is important to you, that collection isn't a set.

Sets are really limited and with unique members. On the other hand, they're really fast.

When do we use HashSet

i want to know when we use HashSet<> , Dictionary<> or <List>

They all have different purpose and used in different scenarios

HashSet

Is used when you want to have a collection with unique elements. HashSet stores list of unique elements and won't allow duplicates in it.

Dictionary

Is used when you want to have a value against a unique key. Each element in Dictionary has two parts a (unique) key and a value. You can store a unique key in it (just like Hashset) in addition you can store a value against that unique key.

List

Is just a simple collection of elements. You can have duplicates in it.

What's the use of specifying the type of a HashSet when working with the generic Set interface?

If you say this:

Set<T> set = new HashSet();

you'll get an unchecked conversion warning. The static type of new HashSet() is a raw HashSet, and converting from that to a generified type is potentially unsafe.

There are other circumstances where doing an assignment will cause type inference to take place. Calling a static method is the canonical example. eg:

Set<T> set = Collections.emptySet();

Java doesn't do inference on the new operator, however, as this would introduce an ambiguity into the language.

If you don't like the redundancy you can use a wrapper static method, and inference will take place. Google's Guava does this, so you can say:

Set<T> set = Sets.newhashSet();

What is the difference between HashSetT and ListT?

Unlike a List<> ...

  1. A HashSet is a List with no duplicate members.

  2. Because a HashSet is constrained to contain only unique entries, the internal structure is optimised for searching (compared with a list) - it is considerably faster

  3. Adding to a HashSet returns a boolean - false if addition fails due to already existing in Set

  4. Can perform mathematical set operations against a Set: Union/Intersection/IsSubsetOf etc.

  5. HashSet doesn't implement IList only ICollection

  6. You cannot use indices with a HashSet, only enumerators.

The main reason to use a HashSet would be if you are interested in performing Set operations.

Given 2 sets: hashSet1 and hashSet2

 //returns a list of distinct items in both sets
HashSet set3 = set1.Union( set2 );

flies in comparison with an equivalent operation using LINQ. It's also neater to write!

HashSetT fundamental things

I'm not sure where you quoted that from, it doesn't appear in the documentation at all. To actually quote from the documentation:

The HashSet<T> class is based on the model of mathematical sets and provides high-performance set operations similar to accessing the keys of the Dictionary<TKey, TValue> or Hashtable collections. In simple terms, the HashSet<T> class can be thought of as a Dictionary<TKey, TValue> collection without values.

So it is a collection, containing actual values. The sentence you quoted there is false.

That being said, you could create an ISet<T> implementation that works that way, e.g. representing numbers in a set as ranges. But trying to do that with a vanilla HashSet<T> will quickly break down.

HashSet vs. List performance

A lot of people are saying that once you get to the size where speed is actually a concern that HashSet<T> will always beat List<T>, but that depends on what you are doing.

Let's say you have a List<T> that will only ever have on average 5 items in it. Over a large number of cycles, if a single item is added or removed each cycle, you may well be better off using a List<T>.

I did a test for this on my machine, and, well, it has to be very very small to get an advantage from List<T>. For a list of short strings, the advantage went away after size 5, for objects after size 20.

1 item LIST strs time: 617ms
1 item HASHSET strs time: 1332ms

2 item LIST strs time: 781ms
2 item HASHSET strs time: 1354ms

3 item LIST strs time: 950ms
3 item HASHSET strs time: 1405ms

4 item LIST strs time: 1126ms
4 item HASHSET strs time: 1441ms

5 item LIST strs time: 1370ms
5 item HASHSET strs time: 1452ms

6 item LIST strs time: 1481ms
6 item HASHSET strs time: 1418ms

7 item LIST strs time: 1581ms
7 item HASHSET strs time: 1464ms

8 item LIST strs time: 1726ms
8 item HASHSET strs time: 1398ms

9 item LIST strs time: 1901ms
9 item HASHSET strs time: 1433ms

1 item LIST objs time: 614ms
1 item HASHSET objs time: 1993ms

4 item LIST objs time: 837ms
4 item HASHSET objs time: 1914ms

7 item LIST objs time: 1070ms
7 item HASHSET objs time: 1900ms

10 item LIST objs time: 1267ms
10 item HASHSET objs time: 1904ms

13 item LIST objs time: 1494ms
13 item HASHSET objs time: 1893ms

16 item LIST objs time: 1695ms
16 item HASHSET objs time: 1879ms

19 item LIST objs time: 1902ms
19 item HASHSET objs time: 1950ms

22 item LIST objs time: 2136ms
22 item HASHSET objs time: 1893ms

25 item LIST objs time: 2357ms
25 item HASHSET objs time: 1826ms

28 item LIST objs time: 2555ms
28 item HASHSET objs time: 1865ms

31 item LIST objs time: 2755ms
31 item HASHSET objs time: 1963ms

34 item LIST objs time: 3025ms
34 item HASHSET objs time: 1874ms

37 item LIST objs time: 3195ms
37 item HASHSET objs time: 1958ms

40 item LIST objs time: 3401ms
40 item HASHSET objs time: 1855ms

43 item LIST objs time: 3618ms
43 item HASHSET objs time: 1869ms

46 item LIST objs time: 3883ms
46 item HASHSET objs time: 2046ms

49 item LIST objs time: 4218ms
49 item HASHSET objs time: 1873ms

Here is that data displayed as a graph:

Sample Image

Here's the code:

static void Main(string[] args)
{
int times = 10000000;

for (int listSize = 1; listSize < 10; listSize++)
{
List<string> list = new List<string>();
HashSet<string> hashset = new HashSet<string>();

for (int i = 0; i < listSize; i++)
{
list.Add("string" + i.ToString());
hashset.Add("string" + i.ToString());
}

Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
list.Remove("string0");
list.Add("string0");
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item LIST strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");

timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
hashset.Remove("string0");
hashset.Add("string0");
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item HASHSET strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine();
}

for (int listSize = 1; listSize < 50; listSize+=3)
{
List<object> list = new List<object>();
HashSet<object> hashset = new HashSet<object>();

for (int i = 0; i < listSize; i++)
{
list.Add(new object());
hashset.Add(new object());
}

object objToAddRem = list[0];

Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
list.Remove(objToAddRem);
list.Add(objToAddRem);
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item LIST objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");

timer = new Stopwatch();
timer.Start();
for (int i = 0; i < times; i++)
{
hashset.Remove(objToAddRem);
hashset.Add(objToAddRem);
}
timer.Stop();
Console.WriteLine(listSize.ToString() + " item HASHSET objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");
Console.WriteLine();
}

Console.ReadLine();
}

ListT vs HashSetT - dynamic collection choice is efficient or not?

Hashtables like Hashset<T> and Dictionary<K,T> are faster at searching and inserting items in any order.

Arrays T[] are best used if you always have a fixed size and a lot of indexing operations. Adding items to a array is slower than adding into a list due to the covariance of arrays in c#.

List<T> are best used for dynamic sized collections whith indexing operations.

I don't think it is a good idea to write something like the hybrid collection better use a collection dependent on your requirements. If you have a buffer with a lof of index based operations i would not suggest a Hashtable, as somebody already quoted a Hashtable by design uses more memory

HashSetT versus DictionaryK, V w.r.t searching time to find if an item exists

HashSet vs List vs Dictionary performance test, taken from here.

Add 1000000 objects (without checking duplicates)

Sample Image

Contains check for half the objects of a collection of 10000

Sample Image

Remove half the objects of a collection of 10000

Sample Image

Define: What is a HashSet?


    1. A HashSet holds a set of objects, but in a way that allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hashcode of the object. Take a look here
  1. HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference. Take a look here

  2. There are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.

The HashSet class in C# goes for the first approach, thus not preserving the order of elements. It is much faster than a regular List. Some basic benchmarks showed that HashSet is decently faster when dealing with primary types (int, double, bool, etc.). It is a lot faster when working with class objects. So the point is that HashSet is fast.

The only catch of HashSet is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet into a List and iterate through that. Take a look here



Related Topics



Leave a reply



Submit