Vector Intersection in C++

Vector intersection in C++

Try std::set_intersection, for example:

#include <algorithm> //std::sort
#include <iostream> //std::cout
#include <string> //std::string
#include <vector> //std::vector

std::vector<std::string> intersection(std::vector<std::string> v1,
std::vector<std::string> v2){
std::vector<std::string> v3;

std::sort(v1.begin(), v1.end());
std::sort(v2.begin(), v2.end());

std::set_intersection(v1.begin(),v1.end(),
v2.begin(),v2.end(),
back_inserter(v3));
return v3;
}

int main(){
std::vector<std::string> v1 {"a","b","c"};
std::vector<std::string> v2 {"b","c"};

auto v3 = intersection(v1, v2);

for(std::string n : v3)
std::cout << n << ' ';
}

performing vector intersection in C++

I don't have a framework to profile the operations but I'd certainly change the code to reuse the readily allocated vector. In addition, I'd hoist the initial intersection out of the loop. Also, std::back_inserter() should make sure that elements are added in the correct location rather than in the beginning:

int func()
{
vector<vector<unsigned> > t = some_initialization();
if (t.empty()) {
return;
}
vector<unsigned> intersectedValues(t[0]);
vector<unsigned> tempIntersectedSubjects;
for (std::vector<std::vector<unsigned>>::size_type i(1u);
i < t.size() && !intersectedValues.empty(); ++i) {
std::set_intersection(t[i].begin(), t[i].end(),
intersectedValues.begin(), intersectedValues.end(),
std::back_inserter(tempIntersectedSubjects);
std::swap(intersectedValues, tempIntersectedSubjects);
tempIntersectedSubjects.clear();
}
}

I think this code has a fair chance to be faster. It may also be reasonable to intersect the sets different: instead of keeping one set and intersecting with that you could create a new intersection for pairs of adjacent sets and then intersect the first sets with their respect adjacent ones:

std::vector<std::vector<unsigned>> intersections(
std::vector<std::vector<unsigned>> const& t) {
std::vector<std::vector<unsigned>> r;
std::vector<std::vector<unsignned>>::size_type i(0);
for (; i + 1 < t.size(); i += 2) {
r.push_back(intersect(t[i], t[i + 1]));
}
if (i < t.size()) {
r.push_back(t[i]);
}
return r;
}

std::vector<unsigned> func(std::vector<std::vector<unsigned>> const& t) {
if (t.empty()) { /* deal with t being empty... */ }
std::vector<std::vector<unsigned>> r(intersections(t))
return r.size() == 1? r[0]: func(r);
}

Of course, you wouldn't really implement it like this: you'd use Stepanov's binary counter to keep the intermediate sets. This approach assumes that the result is most likely non-empty. If the expectation is that the result will be empty that may not be an improvement.

intersection of n vectors

Fortunately, I think a much tighter bound can be placed on the
complexity of your algorithm.

The complexity of std::set_intersection on input sets of size n1 and n2 is
O(n1 + n2).
You could take your original vectors and intersect them in single-elimination
tournament style, that is, on the first round you intersect the 1st and 2nd
vectors, the 3rd and 4th, the 5th and 6th, and so forth; on the
second round you intersect the 1st and 2nd intersections, the 3rd and 4th,
and so forth; repeat until the final round produces just one intersection.
The sum of the sizes of all the vectors surviving each round is no more than
half the sum of the sizes of the vectors at the start of the round,
so this algorithm takes O(N) time (also O(N) space) altogether
where N is the sum of the sizes of all the original vectors in your input.
(It's O(N) because N + N/2 + N/4 + ... < 2N.)

So, given an input consisting of already-sorted vectors,
the complexity of the algorithm is O(N).

Your algorithm merges the vectors in a very different sequence,
but while I'm not 100% sure it is also O(N), I strongly suspect that it is.


Edit:
Concerning how to actually implement the "tournament" algorithm in C++,
it depends on how hard you want to work to optimize this,
and somewhat on the nature of your input.

The easiest approach would be to make a new list of vectors; take two vectors from the old list, push a vector onto the new list, merge the two old vectors onto the new vector, destroy the old vectors, hope the library manages the memory efficiently.

If you want to reduce the allocation of new vectors, then re-using vectors
(as you already thought to do) might help. If the input data structure is
an std::list<std::vector<int> >, for example, you could start by pushing one empty vector onto the front of this list. Make three iterators, one to the new vector, and one to each of the original first two vectors in the list.
Take the intersection of the vectors at the last two iterators,
writing the result to the first iterator, then clear the vectors at the
last two iterators. Move the last two iterators forward two places each,
move the first iterator forward one place. Repeat. If you reach a state where
one of the last two iterators has reached end() but the other has not,
erase all the list elements between the first iterator and the other iterator.
Now you have a list of vectors again and can repeat as long as there is
more than one vector in the list.

If the input is std::vector<std::vector<int> > then pushing an element
onto the front of the list is relatively expensive, so you might want a
slightly more complicated algorithm. There are lots of choices, no really
obvious winners I can think of.

Intersection of vectors

  1. Sort the vectors.
  2. Use std::set_intersection() three times (as many vectors you have - 1).

Time Complexity Analysis:

  • 4 * O(nlogn) = O(nlogn)
  • linear in 2 * (firstSize + secondSize) - 1
  • linear in 2 * (firstSecondInterSize + thirdSize) - 1
  • linear in 2 * (firstSecondThirdInterSize + fourthSize) - 1

which is bounded by O(nlogn), meaning that sorting is the bottleneck of the algorithm.


Full code example:

#include <iostream>     // std::cout
#include <algorithm> // std::set_intersection, std::sort
#include <vector> // std::vector

int main () {
std::vector<int> first = {5,10,15,20,25};
std::vector<int> second = {50,40,30,20,10};
std::vector<int> third = {10,20,3,4,0};
std::vector<int> fourth = {4,20,10,3,6};
std::vector<int> v(first.size() + second.size() + third.size() + fourth.size());
std::vector<int>::iterator it;

std::sort (first.begin(),first.end());
std::sort (second.begin(),second.end());
std::sort (third.begin(),third.end());
std::sort (fourth.begin(),fourth.end());

it=std::set_intersection (first.begin(), first.end(), second.begin(), second.end(), v.begin());
it=std::set_intersection (v.begin(), v.end(), third.begin(), third.end(), v.begin());
it=std::set_intersection (v.begin(), v.end(), fourth.begin(), fourth.end(), v.begin());
v.resize(it-v.begin());

std::cout << "The intersection has " << (v.size()) << " elements: ";
for (it=v.begin(); it!=v.end(); ++it)
std::cout << ' ' << *it;
std::cout << '\n';

return 0;
}

Output:

The intersection has 2 elements: 10 20

Finding the Intersection between two Vectors

I'm assuming you do not want to use the C++ standard algorithm.

Couple of things.

1. Your algorithm shall work only if both vectors are initially sorted. Are they?

2. You shouldn't access vector[vector.size()] element as it is out of bounds.

My second point means:

while(i <= n1 && j <= n2)

Change this to

while(i < n1 && j < n2)

And change the following

for(unsigned int i = 0; i <= intersection.size(); i++)

to

for(unsigned int i = 0; i < intersection.size(); i++)

Also, MOST IMPORTANT ERROR!!!!

for(size_t k = 0; k < v.at(i).size(); k++)
{
vec.push_back(k);

}

Change it to:

for(size_t k = 0; k < v[i].size(); k++)
{
vec.push_back(v[i][k]);

}

C++: How to use set_intersection on two vectors containing user-defined structs?

First of all, when you are programming in C++, you can just use:

struct Face {
uint a,b,c;
};

Here's a simple strategy for implementing operator< that works for the algorithms and containers in the standard library.

struct Face {
uint a,b,c;

bool operator<(Face const& rhs) const
{
if ( a != rhs.a )
{
return ( a < rhs.a);
}
if ( b != rhs.b )
{
return ( b < rhs.b);
}
return ( c < rhs.c);
}
};

or, as suggested by @Praetorian,

struct Face {
uint a,b,c;

bool operator<(Face const& rhs) const
{
return std::tie(a, b, c) < std::tie(rhs.a, rhs.b, rhs.c);
}
};

Efficient way to find intersection of two vectors with respect to two members of vector objects

Your two vectors are sorted already – perfect!

First, assuming a comparison function (with up-coming C++20, this would get the space-ship operator...):

int compare(foo const& l, foo const& r)
{
return l.x != r.x ? l.x - r.x : l.y - r.y;
}

Now you can use it in the algorithm:

auto i1 = v1.begin();
auto i2 = v2.begin();

auto end1 = i1;
auto end2 = i2;

while(i1 != v1.end() && i2 != v2.end())
{
int cmp = compare(*i1, *i2);
if(cmp < 0)
{
// skip element
++i1;
}
else if(cmp > 0)
{
++i2;
}
else
{
// matching element found, keep in both vectors...
if(i1 != end1)
*end1 = std::move(*i1);
++i1;
++end1;
if(i2 != end2)
*end2 = std::move(*i2);
++i2;
++end2;

// if you can rely on move (or fallback copy) assignment
// checking for self assignment, the following simpler
// alternative can be used instead:

//*end1++ = std::move(*i1++);
//*end2++ = std::move(*i2++);
}
}
v1.erase(end1, v1.end());
v2.erase(end2, v2.end());

Linear in both vectors...

The algorithm just moves the elements to be kept to front and finally drops all the overdue ones – similarly as would std::remove_if do...



Related Topics



Leave a reply



Submit