Understanding Boost::Disjoint_Sets

Understanding boost::disjoint_sets

What I can understand from the documentation :

Disjoint need to associate a rank and a parent (in the forest tree) to each element. Since you might want to work with any kind of data you may,for example, not always want to use a map for the parent: with integer an array is sufficient. You also need a rank foe each element (the rank needed for the union-find).

You'll need two "properties" :

  • one to associate an integer to each element (first template argument), the rank
  • one to associate an element to an other one (second template argument), the fathers

On an example :

std::vector<int>  rank (100);
std::vector<int> parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);

Arrays are used &rank[0], &parent[0] to the type in the template is int*

For a more complex example (using maps) you can look at Ugo's answer.

You are just giving to the algorithm two structures to store the data (rank/parent) he needs.

Boost Disjoint sets: How to retrieve sets?

I am confused about what the code in question is attempting to do exactly, but I can answer the general question "How do you retrieve the sets from boost's implementation of disjoint_sets?" Basically you use the the parent map that was constructed by the data structure.

The disjoints_sets data structures represents each set as the "children" of an arbitrary member of the set, call these arbitrary members the set representative. The parent property map built by the Boost library associates each element with the representative of the set it belongs to. To build the actual sets then, we need to essentially invert the parent map. We iterate over the parent map constructing a map from representative to elements, which is a one-to-many relationship so this map has vectors of elements as the values. The values of this map will be the sets we are looking for.

Code below. The following finds the connected components in
Sample Image

I use strings as elements just to get an example on StackOverflow that does not use integer items and that is complete. Note the get_unique_vertices function is just a artifact of the design of this code which builds the graph forest solely using the edges as input. You could do the below without this step if you already know the vertices or by using the disjoint_sets data structure itself to keep track of them. I did it this way to keep the actual usage of disjoint_sets as concise as possible:

#include "boost/pending/disjoint_sets.hpp"
#include "boost/property_map/property_map.hpp"
#include <iostream>
#include <tuple>
#include <unordered_set>
#include <unordered_map>

template<typename T>
using assoc_map = boost::associative_property_map<T>;
using rank_map = std::unordered_map<std::string, int>;
using parent_map = std::unordered_map<std::string, std::string>;
using disjoint_sets = boost::disjoint_sets<assoc_map<rank_map>, assoc_map<parent_map>>;

std::vector<std::string> get_unique_vertices(const std::vector<std::tuple<std::string, std::string>>& edges)
{
std::unordered_set<std::string> vertex_set;
std::vector<std::string> vertices;
vertices.reserve(2 * edges.size());
for (const auto [u, v] : edges) {
if (vertex_set.find(u) == vertex_set.end()) {
vertex_set.insert(u);
vertices.push_back(u);
}
if (vertex_set.find(v) == vertex_set.end()) {
vertex_set.insert(v);
vertices.push_back(v);
}
}
return vertices;
}

std::vector<std::vector<std::string>> find_connected_components(const std::vector<std::tuple<std::string, std::string>>& edges)
{
rank_map rank;
parent_map parent;
disjoint_sets ds( boost::make_assoc_property_map(rank), boost::make_assoc_property_map(parent));

// insert all the vertices as single sets
auto vertices = get_unique_vertices(edges);
for (const auto& v : vertices) {
ds.make_set(v);
}

// add each graph edge to the data structure
for (const auto [u, v] : edges) {
ds.link(u, v);
}

// build a map mapping representatives to set elements...
std::unordered_map<std::string, std::vector<std::string>> sets;
for (const auto& v : vertices) {
auto parent = ds.find_set(v);
sets[parent].push_back(v);
}

// return just the values from the above
std::vector<std::vector<std::string>> output(sets.size());
std::transform(sets.begin(), sets.end(), output.begin(),
[](const auto& key_val) {
return key_val.second;
}
);

return output;
}

int main()
{
std::vector<std::tuple<std::string, std::string>> edges = {
{"A" , "B"},
{"D" , "E"},
{"H" , "I"},
{"K" , "J"},
{"E" , "F"},
{"B" , "C"},
{"H" , "K"},
{"E" , "G"},
{"I" , "J"}
};

auto connected_components = find_connected_components(edges);
for (const auto& cc : connected_components) {
for (const auto& vertex : cc)
std::cout << vertex;
std::cout << "\n";
}
}

which yields the following output:

HIKJ
ABC
DEFG

Is the Union-Find (or Disjoint Set) data structure in STL?

It is not, but there is one in boost: http://www.boost.org/doc/libs/1_64_0/libs/disjoint_sets/disjoint_sets.html, so if you want an off-the-shelf implementation I'd recommend this.

Understanding a boost executor example

The function schedule_one_or_yield() was removed from the current source code of boost because it implemented busy waiting. This is in

https://github.com/boostorg/thread/issues/117

loop_executor::loop is currently:

void loop()
{
while (!closed())
{
schedule_one_or_yield();
}
while (try_executing_one())
{
}
}

The first loop repeatedly calls schedule_one_or_yield() which is simply

void schedule_one_or_yield()
{
if ( ! try_executing_one())
{
this_thread::yield();
}
}

current implementation loop_executor::loop of is

/**
* The main loop of the worker thread
*/
void loop()
{
while (execute_one(/*wait:*/true))
{
}
BOOST_ASSERT(closed());
while (try_executing_one())
{
}
}

source : https://github.com/boostorg/thread/blob/develop/include/boost/thread/executors/loop_executor.hpp

It was also removed in the example user_scheduler, the old version is in

https://github.com/mongodb/mongo/blob/master/src/third_party/boost-1.60.0/boost/thread/user_scheduler.hpp with schedule_one_or_yield() in line 63

the new version without schedule_one_or_yield() is in https://github.com/boostorg/thread/blob/develop/example/user_scheduler.cpp



Related Topics



Leave a reply



Submit