How Is Set() Implemented

How is set() implemented?

According to this thread:

Indeed, CPython's sets are implemented as something like dictionaries
with dummy values (the keys being the members of the set), with some
optimization(s) that exploit this lack of values

So basically a set uses a hashtable as its underlying data structure. This explains the O(1) membership checking, since looking up an item in a hashtable is an O(1) operation, on average.

If you are so inclined you can even browse the CPython source code for set which, according to Achim Domma, was originally mostly a cut-and-paste from the dict implementation.

Note: Nowadays, set and dict's implementations have diverged significantly, so the precise behaviors (e.g. arbitrary order vs. insertion order) and performance in various use cases differ; they're still implemented in terms of hashtables, so average-case lookup and insertion remain O(1), but set is no longer just "dict, but with dummy/omitted values".
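
A small experiment can make the hashtable behavior visible from Python itself. The sketch below uses a hypothetical helper class, Probe (not part of the thread or of CPython), that counts how often it is hashed and compared; the counts it records are what CPython happens to do, not something the language guarantees.

```python
class Probe:
    """Hypothetical helper that counts how often it is hashed and compared."""
    hash_calls = 0
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        Probe.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        Probe.eq_calls += 1
        return isinstance(other, Probe) and self.value == other.value


# Build a set of 1000 distinct members.
members = {Probe(i) for i in range(1000)}

# Reset the counters so that only the lookup below is measured.
Probe.hash_calls = 0
Probe.eq_calls = 0

# Membership test with a new object that is equal to an existing member.
found = Probe(500) in members
lookup_hash_calls = Probe.hash_calls
lookup_eq_calls = Probe.eq_calls
```

The lookup hashes the probe once and needs only a handful of equality comparisons, rather than one per member, which is the practical meaning of "O(1) on average".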

C - How to implement Set data structure?

There are multiple ways of implementing set (and map) functionality, for example:

  • tree-based approach (ordered traversal)
  • hash-based approach (unordered traversal)

Since you mentioned value-indexed arrays, let's try the hash-based approach which builds naturally on top of the value-indexed array technique.

Be aware of the advantages and disadvantages of hash-based vs. tree-based approaches.

You can design a hash-set (a special case of hash-tables) of pointers to hashable PODs, with chaining, internally represented as a fixed-size array of buckets of hashables, where:

  • all hashables in a bucket map to the same bucket index (their hash value reduced to the size of the array)
  • a bucket can be implemented as a dynamic array or linked list of hashables
  • a hashable's hash value is used to index into the array of buckets (hash-value-indexed array)
  • one or more of the hashables contained in the hash-set could be (a pointer to) another hash-set, or even to the hash-set itself (i.e. self-inclusion is possible)

With large amounts of memory at your disposal, you can size your array of buckets generously and, in combination with a good hash method, drastically reduce the probability of collision, achieving virtually constant-time performance.

You would have to implement:

  • the hash function for the type being hashed
  • an equality function for the type being used to test whether two hashables are equal or not
  • the hash-set contains/insert/remove functionality.
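
The chained design described above can be sketched as follows. This is a minimal illustration in Python rather than C, under stated assumptions: the class name ChainedHashSet, its method names, and the default bucket count are all made up for the example, and the caller supplies the hash and equality functions, as the list above requires.

```python
class ChainedHashSet:
    """Hash set with separate chaining over a fixed-size array of buckets."""

    def __init__(self, hash_fn, eq_fn, num_buckets=64):
        self._hash_fn = hash_fn
        self._eq_fn = eq_fn
        # Each bucket is a Python list standing in for a dynamic array.
        self._buckets = [[] for _ in range(num_buckets)]
        self._size = 0

    def _bucket_for(self, item):
        # The hash value, reduced modulo the bucket count, indexes the array.
        return self._buckets[self._hash_fn(item) % len(self._buckets)]

    def contains(self, item):
        return any(self._eq_fn(item, other) for other in self._bucket_for(item))

    def insert(self, item):
        """Add item; return False if an equal item was already present."""
        bucket = self._bucket_for(item)
        if any(self._eq_fn(item, other) for other in bucket):
            return False
        bucket.append(item)
        self._size += 1
        return True

    def remove(self, item):
        """Remove item; return False if it was not present."""
        bucket = self._bucket_for(item)
        for i, other in enumerate(bucket):
            if self._eq_fn(item, other):
                del bucket[i]
                self._size -= 1
                return True
        return False

    def __len__(self):
        return self._size


# Usage with 2D points; the hash and equality functions are illustrative only.
points = ChainedHashSet(
    hash_fn=lambda p: p[0] * 31 + p[1],
    eq_fn=lambda a, b: a == b,
    num_buckets=8,
)
points.insert((1, 2))
points.insert((1, 2))  # duplicate, ignored
points.insert((3, 4))
```

With a generously sized bucket array and a well-distributed hash function, most buckets hold zero or one item, so each operation scans only a very short list.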

You can also use open addressing as an alternative to maintaining and managing buckets.

Set.of(E... elements) - which Set implementation is it using? and what is its relation to new HashSet ()?

Don’t know, don’t care

The concrete class used by Set.of (and List.of, Map.of) is not documented. All we know is that the object returned (a) implements the interface, and (b) is unmodifiable.

That is all we need to know. We should not care about the particular concrete class used under the covers.

Being of unknown concrete class gives freedom to the implementors of the of methods.

  • Those programmers are free to optimize according to the nature of your arguments. For example, if you are passing enum arguments, the highly optimized EnumSet class might be used behind the scenes.
  • Those programmers are free to change their concrete class between versions of Java. For example, Java 17 implementation might return one concrete class while Java 18 returns another.

So you should never depend on a particular concrete class being utilized by the of/copyOf methods.

You asked:

What is the real difference between those two?

In your first one, we know the concrete class. And the resulting set is modifiable.

In your second one, we do not know the concrete class, nor do we care about the concrete class. And the resulting set is unmodifiable.

Code | Concrete class | Modifiable
new HashSet<>() {{ add("a"); add("b"); add("c"); }} | known | modifiable
Set.of( "a", "b", "c" ) | unknown | unmodifiable

How to implement a Set Data Structure in Java?

In Java, a HashSet is backed internally by a map (a HashMap). Each value in the set is just a key in that map, which is how uniqueness is maintained.

Here is the link, so that you get a clear idea of how a set works internally.
Also, a few Stack Overflow answers:
First, Second

What is the underlying data structure of a STL set in C++?

You could implement a binary search tree by first defining a Node struct:

struct Node
{
    void *nodeData;    // payload stored in this node
    Node *leftChild;   // subtree with smaller elements
    Node *rightChild;  // subtree with larger elements
};

Then, you could define a root of the tree with another Node *rootNode;

The Wikipedia entry on Binary Search Tree has a pretty good example of how to implement an insert method, so I would also recommend checking that out.

In terms of duplicates, they are generally not allowed in sets, so you could either just discard that input, throw an exception, etc, depending on your specification.
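
As a rough illustration of the insert step and of discarding duplicates, here is a minimal sketch in Python rather than C++. It assumes the elements are mutually comparable with < and ==; the names Node, insert, and in_order are chosen for the example and the tree is not balanced (a production implementation such as std::set typically uses a balanced tree).

```python
class Node:
    """One node of an (unbalanced) binary search tree."""

    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(root, data):
    """Insert data and return the root of the tree; duplicates are discarded."""
    if root is None:
        return Node(data)
    node = root
    while True:
        if data == node.data:
            return root  # already present: a set keeps only one copy
        if data < node.data:
            if node.left is None:
                node.left = Node(data)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(data)
                return root
            node = node.right


def in_order(root):
    """Yield the elements in ascending order."""
    if root is not None:
        yield from in_order(root.left)
        yield root.data
        yield from in_order(root.right)


root = None
for value in [5, 3, 8, 3, 1, 8]:
    root = insert(root, value)

sorted_values = list(in_order(root))  # [1, 3, 5, 8]
```

The in-order traversal is what gives a tree-based set its sorted iteration order.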

Set implementation using hash table

Let's make sure you understand the try-catch first.

try {
    // some code
    assert(false);
}
catch (exception& e) {
    assert(true);
}

This is a way of saying, "I expect some code to fail by throwing an exception." So if it doesn't throw an exception, we flag it by asserting false. If it does throw an exception, we catch it so our program doesn't crash, and assert true and move on.

Let's take the first test as an example.

Given an iterator it for an empty set, the test expects it.next() to throw an exception. However, as you found, it does not.

So first, let's figure out where your code should throw an exception. Well, it's obvious, here:

void SetIterator::next()
{
    this->pos++;

    while (this->set.elems[this->pos] == INT_MIN && this->pos < this->set.m)
        this->pos++;
}

But how do we know that the iterator was built for an empty set? How would you do it? I don't know what this->set.m refers to since I don't have the rest of your code, but let's say it means the number of items in your set. Then this being zero would mean it's an empty set, right? So how about this:

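void SetIterator::next()
{
    // Assumes set.m is the number of items; requires <stdexcept>.
    if (this->set.m == 0)
        throw std::runtime_error("next() called on an empty set");

    this->pos++;

    // Check the bound first so elems is never read out of range.
    while (this->pos < this->set.m && this->set.elems[this->pos] == INT_MIN)
        this->pos++;
}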

I'm not saying this is the best solution. In fact, it's not. But it's okay to start with not-the-best solutions while you work your way through the rest of the problem, and eventually things will "click" and you'll realize better solutions.

Is there any Python implementation of set() with key argument?

Isn't that basically a dictionary, where the key is the result of applying the function to the value?

y = {math.floor(i): i for i in iterable}
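
One detail worth knowing with this approach: when two values produce the same key, a dict comprehension keeps the last one. The sketch below (the sample values are made up) shows that behavior next to a variant using dict.setdefault that keeps the first value instead.

```python
import math

values = [1.2, 1.7, 2.5, 3.1, 3.9]

# Dict comprehension: a later value overwrites an earlier one with the same key.
last_wins = {math.floor(v): v for v in values}

# setdefault only stores a value if the key is not present yet,
# so the first value seen for each key is kept.
first_wins = {}
for v in values:
    first_wins.setdefault(math.floor(v), v)

unique_last = list(last_wins.values())    # [1.7, 2.5, 3.9]
unique_first = list(first_wins.values())  # [1.2, 2.5, 3.1]
```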

How is Set.toString() implemented?

Set is an interface.

It cannot override methods.

You're using the HashSet class, which inherits AbstractCollection.toString()

Writing a Set implementation with a randomElement() method

The only thing I can think of is to replace your set with a HashMap that maps from your element to its position in the ArrayList. size, contains, add and random will be the same. For remove you would do as follows:

  • Find the element in the HashMap
  • Retrieve its position in the ArrayList
  • Remove the element from the HashMap
  • Swap the deleted element with the last element in the array
  • Update the swapped element's position in the HashMap
  • Delete the last element from the array (this is now O(1))

What makes this work is that you don't need any particular order in your array. You just need random-access storage, so changing the order of the data will not cause any problems as long as you keep it in sync with your HashMap.
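
The swap-with-last removal above can be sketched as follows. This is an illustration in Python rather than Java, with a list standing in for the ArrayList and a dict for the HashMap; the class name RandomizedSet and its method names are assumptions made for the example.

```python
import random


class RandomizedSet:
    """Set with O(1) average add, remove, contains, and random_element."""

    def __init__(self):
        self._items = []  # random-access storage, in no particular order
        self._index = {}  # element -> its position in self._items

    def __len__(self):
        return len(self._items)

    def __contains__(self, item):
        return item in self._index

    def add(self, item):
        if item in self._index:
            return False
        self._index[item] = len(self._items)
        self._items.append(item)
        return True

    def remove(self, item):
        # Look up and drop the element's position in one step.
        pos = self._index.pop(item, None)
        if pos is None:
            return False
        # Take the last element off the list (O(1)).
        last = self._items.pop()
        # Unless the removed element was itself the last one, move the
        # former last element into the vacated slot and update its position.
        if pos < len(self._items):
            self._items[pos] = last
            self._index[last] = pos
        return True

    def random_element(self):
        # Raises IndexError if the set is empty.
        return random.choice(self._items)


letters = RandomizedSet()
for letter in ["a", "b", "c"]:
    letters.add(letter)
letters.remove("a")  # "c" is moved into the slot "a" occupied
```

After the removal, the list holds "c" and "b" and the dict maps each of them to its current position, so the two structures stay in sync.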


