How to Find Common Elements from Multiple Vectors

How to find common elements from multiple vectors?

There might be a cleverer way to go about this, but

intersect(intersect(a,b),c)

will do the job.

EDIT: More cleverly, and more conveniently if you have a lot of arguments:

Reduce(intersect, list(a,b,c))
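If you need the same fold outside R, here is a minimal C++ sketch of the idea. The name common_elements is made up for illustration. It sorts and de-duplicates each vector, then folds std::set_intersection across them, much as Reduce(intersect, ...) does. Unlike R's intersect, the result comes back sorted rather than in the order of the first vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: elements present in every input vector.
// Returns them sorted ascending, without duplicates.
std::vector<int> common_elements(std::vector<std::vector<int>> vectors) {
    if (vectors.empty()) return {};
    // std::set_intersection needs sorted input, so sort and de-duplicate first.
    for (auto& v : vectors) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }
    std::vector<int> result = vectors.front();
    // Fold: intersect the running result with each remaining vector.
    for (std::size_t i = 1; i < vectors.size(); ++i) {
        std::vector<int> next;
        std::set_intersection(result.begin(), result.end(),
                              vectors[i].begin(), vectors[i].end(),
                              std::back_inserter(next));
        result = std::move(next);
    }
    return result;
}
```

The vectors are taken by value so the caller's data is left unsorted and untouched.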

Find common element between multiple vectors (no integer elements)

You could put the elements into a hash table and count them. Each time an element turns up again, bump its counter. If the counter for a particular item equals the number of vectors, you have an element of the intersection (provided no vector contains the same element twice). There is no need to pre-sort the vectors of pairs or to define a strict weak ordering for them.

Along these lines:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>
#include <vector>

using Qpair = std::uint32_t; // should be std::pair<int, int> or similar
using Qpairs = std::vector<Qpair>;

// Counts the elements present in every vector.
// Assumes no vector contains the same element twice.
int intersections(const std::list<Qpairs>& allpairs) {
    std::unordered_map<Qpair, std::size_t> m; // element vs counter

    const auto count = allpairs.size(); // number of vectors to scan

    for (const auto& pairs : allpairs) { // loop over all vectors
        for (const auto& p : pairs) {    // loop over elements in particular vector
            m[p] += 1;                   // and count them
        }
    }

    int total_count = 0; // how many common elements are here
    for (const auto& e : m) {
        if (e.second == count) {
            ++total_count;
            // you could add e.first to output vector as well
        }
    }
    return total_count;
}

int main() {
    Qpairs v1{ 4, 2, 6, 8, 9 };
    Qpairs v2{ 1, 3, 8, 9, 4 };
    Qpairs v3{ 2, 8, 9, 5, 0 };

    std::list<Qpairs> l{ v1, v2, v3 };

    auto q = intersections(l);

    std::cout << q << '\n';

    return 0;
}
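The comment in the code says the element type should really be std::pair<int, int>. The standard library has no std::hash specialization for std::pair, so std::unordered_map needs a hash functor supplied by you. Below is a rough sketch of how that could look. PairHash and common_pairs are names invented here, and the hash (packing both ints into one 64-bit value) is just one reasonable choice that assumes int is at most 32 bits wide. This variant also counts each pair at most once per vector, so in-vector duplicates do not distort the counters.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Hypothetical hash functor: packs both ints into one 64-bit value
// (assumes int is at most 32 bits wide).
struct PairHash {
    std::size_t operator()(const Pair& p) const noexcept {
        const std::uint64_t hi = static_cast<std::uint32_t>(p.first);
        const std::uint64_t lo = static_cast<std::uint32_t>(p.second);
        return std::hash<std::uint64_t>{}((hi << 32) | lo);
    }
};

// Hypothetical helper: pairs that occur in every vector (order unspecified).
std::vector<Pair> common_pairs(const std::vector<std::vector<Pair>>& all) {
    std::unordered_map<Pair, std::size_t, PairHash> counts;
    for (const auto& vec : all) {
        // De-duplicate first so a repeated pair counts once per vector.
        const std::unordered_set<Pair, PairHash> seen(vec.begin(), vec.end());
        for (const auto& p : seen) ++counts[p];
    }
    std::vector<Pair> result;
    for (const auto& entry : counts) {
        if (entry.second == all.size()) result.push_back(entry.first);
    }
    return result;
}
```

Sort the result afterwards if you need a deterministic order; std::pair already provides operator< for that.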

R: how to find common elements with the same indices in multiple vectors

We can do this with a simple element-wise comparison:

x == y

and by subsetting x with the result: x[x == y]. The question then is how best to apply this to every combination of vectors.

Here, I'll use outer to apply a Vectorized anonymous function to every pair of vectors in the list.

v1 <- c(1, 99, 10, 11, 23)
v2 <- c(1, 99, 10, 23, 11)
v3 <- c(2, 4, 10, 13, 23)

l <- list(v1, v2, v3)

output <- outer(l, l, Vectorize(function(x, y) x[x == y]))
output

     [,1]      [,2]      [,3]
[1,] Numeric,5 Numeric,3 Numeric,2
[2,] Numeric,3 Numeric,5 10
[3,] Numeric,2 10        Numeric,5

If you look at the output matrix, each cell holds the elements that the two indexed vectors share at the same positions:

output[1,2]
[[1]]
[1]  1 99 10
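For comparison, the x[x == y] step can be written out as an explicit loop. This is only a sketch in C++ with a made-up function name, same_index_matches. Where the lengths differ it compares up to the shorter vector, which is a simplification and not what R's recycling does.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: values that are equal at the same index in x and y.
std::vector<double> same_index_matches(const std::vector<double>& x,
                                       const std::vector<double>& y) {
    std::vector<double> out;
    // Only compare positions that exist in both vectors.
    const std::size_t n = std::min(x.size(), y.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (x[i] == y[i]) out.push_back(x[i]);
    }
    return out;
}
```

Calling it for every pair of vectors in a nested loop gives the same all-by-all table as the outer call.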

Find the common elements from multiple vectors which appear at least in percentage of them

I think this would work. We use the table function to do most of the heavy lifting.

find_perc <- function(..., perc = .75){
  list_len <- length(list(...))            # how many vectors
  tab_it <- table(c(...))                  # tabulate all the names
  tab_it_perc <- tab_it / list_len         # calculate the frequencies
  names(tab_it_perc[tab_it_perc >= perc])  # return those with freq >= perc
}


> find_perc(a, b, c, d)
[1] "Greg" "Mark" "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg" "Igor" "Kate" "Mark" "Mary" "Mathew" "Robin" "Tobias"
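The same tabulate-and-threshold idea could be sketched in C++ roughly as follows. find_fraction is a hypothetical name, not part of any library. One deliberate difference from find_perc: each name is counted at most once per vector, so a name repeated inside a single vector does not inflate its frequency.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: names present in at least `frac` of the vectors.
// Returns them in sorted order.
std::vector<std::string> find_fraction(
        const std::vector<std::vector<std::string>>& vectors, double frac) {
    std::map<std::string, std::size_t> counts;  // ordered map, so output is sorted
    for (const auto& v : vectors) {
        // Count each name once per vector.
        const std::set<std::string> unique_names(v.begin(), v.end());
        for (const auto& name : unique_names) ++counts[name];
    }
    const double needed = frac * static_cast<double>(vectors.size());
    std::vector<std::string> out;
    for (const auto& entry : counts) {
        // Comparing against frac * size avoids a division by zero for empty input.
        if (static_cast<double>(entry.second) >= needed) out.push_back(entry.first);
    }
    return out;
}
```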

How to find common elements from multiple vectors and from a matrix?

We can use Map to intersect each row of the matrix c with the elements common to a and b:

Map(intersect, split(c, row(c)), list(intersect(a,b)))

Finding Index of Common Elements in Multiple Vectors in R

a <- c(5,2); b <- c(5,3); d <- c(4,5)
mylist = list(a = a, b = b, d = d) #OR mylist = mget(c("a", "b", "d"))
common_values = Reduce(intersect, mylist)
lapply(mylist, function(x) which(x %in% common_values))
#$a
#[1] 1

#$b
#[1] 1

#$d
#[1] 2

It is not clear how you want to handle the case where there is more than one common value, but here is one way:

a = 1:3
b = 2:4
d = c(2, 7, 3, 5)
mylist = mget(c("a", "b", "d"))
common_values = Reduce(intersect, mylist)
lapply(mylist, function(x)
  sapply(setNames(common_values, common_values), function(y)
    which(x %in% y)))
#$a
#2 3
#2 3

#$b
#2 3
#1 2

#$d
#2 3
#1 3
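In case it helps to see the which(x %in% common_values) step spelled out, here is a small C++ sketch. The function name positions_of_common is invented for illustration, and the positions it returns are 0-based, whereas R's which is 1-based.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: 0-based positions in x whose value appears in `common`.
std::vector<std::size_t> positions_of_common(const std::vector<int>& x,
                                             const std::vector<int>& common) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Linear search is fine for a short list of common values.
        if (std::find(common.begin(), common.end(), x[i]) != common.end()) {
            idx.push_back(i);
        }
    }
    return idx;
}
```

For a long list of common values, a std::unordered_set lookup would likely be the better choice than std::find.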

How to find elements common in at least 2 vectors?

It is much simpler than a lot of people are making it look. This should be very efficient.

  1. Put everything into a vector:

    x <- unlist(list(a, b, c, d, e))

  2. Look for duplicates:

    unique(x[duplicated(x)])
    # [1] 2 3 1 4 8

and sort if needed.

Note: If there can be duplicates within a list element (which your example does not seem to imply), then build x with x <- unlist(lapply(list(a, b, c, d, e), unique)) instead.
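The duplicate-detection idea carries over to other languages too. A minimal C++ sketch, with the hypothetical name in_at_least_two, keeps one set of values already seen and one of values seen again. It de-duplicates each vector first, as in the note above, and returns the result sorted.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: values that occur in at least two of the vectors.
// Returns them sorted ascending.
std::vector<int> in_at_least_two(const std::vector<std::vector<int>>& vectors) {
    std::set<int> seen;      // values met in an earlier vector
    std::set<int> repeated;  // values met in two or more vectors
    for (const auto& v : vectors) {
        // Ignore repeats inside a single vector.
        const std::set<int> unique_v(v.begin(), v.end());
        for (int value : unique_v) {
            // insert().second is false when the value was already present.
            if (!seen.insert(value).second) repeated.insert(value);
        }
    }
    return std::vector<int>(repeated.begin(), repeated.end());
}
```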


Edit: as the OP has expressed interest in a more general solution (elements common to at least n vectors, where n >= 2), I would do:

which(tabulate(x) >= n)

if the data consists only of positive integers (1, 2, etc.) as in the example. If not:

f <- table(x)
names(f)[f >= n]

This is now not too far from James's solution, but it avoids the costly-ish sort. And it is miles faster than computing all possible combinations.


