How to Perform Pairwise Operations Like `%in%` and Set Operations for a List of Vectors

We could use outer(x, y, FUN). x and y need not be "numeric" input such as a numeric vector or matrix; a "list" or "matrix list" is also allowed.

For example, to apply "%in%" pairwise, we use

z <- outer(lst, lst, FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))
#      vec1      vec2      vec3      vec4
# vec1 Logical,2 Logical,2 Logical,2 Logical,2
# vec2 Logical,3 Logical,3 Logical,3 Logical,3
# vec3 Logical,4 Logical,4 Logical,4 Logical,4
# vec4 Logical,5 Logical,5 Logical,5 Logical,5

Since "%in%" is not vectorized over the elements of a list, we wrap it as Vectorize("%in%"). We also need SIMPLIFY = FALSE, so that FUN returns a length-1 list for each pair (x[[i]], y[[j]]). This is important, as outer works like:

y[[4]] | FUN(x[[1]], y[[4]])  FUN(x[[2]], y[[4]])  FUN(x[[3]], y[[4]])  FUN(x[[4]], y[[4]])
y[[3]] | FUN(x[[1]], y[[3]])  FUN(x[[2]], y[[3]])  FUN(x[[3]], y[[3]])  FUN(x[[4]], y[[3]])
y[[2]] | FUN(x[[1]], y[[2]])  FUN(x[[2]], y[[2]])  FUN(x[[3]], y[[2]])  FUN(x[[4]], y[[2]])
y[[1]] | FUN(x[[1]], y[[1]])  FUN(x[[2]], y[[1]])  FUN(x[[3]], y[[1]])  FUN(x[[4]], y[[1]])
         -------------------  -------------------  -------------------  -------------------
         x[[1]]               x[[2]]               x[[3]]               x[[4]]

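outer requires that length(FUN(x, y)) == length(x) * length(y). With the default SIMPLIFY = TRUE this does not necessarily hold, because results of equal length may be collapsed into a matrix; with SIMPLIFY = FALSE the result is always a list with one element per pair.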
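A minimal sketch of why the length condition matters, using a hypothetical two-element list `lst2` (not part of the original answer) whose vectors have equal length. With the default simplification the vectorized function hands back a matrix of 8 values rather than a list of 4, so outer cannot reshape it to 2 x 2:

```r
# Hypothetical input: two vectors of the same length
lst2 <- list(a = 1:2, b = 2:3)

# Default SIMPLIFY = TRUE: the four length-2 results are collapsed into
# one matrix, so the length condition of outer() is violated
bad <- tryCatch(
  outer(lst2, lst2, FUN = Vectorize("%in%")),
  error = function(e) e
)
inherits(bad, "error")

# SIMPLIFY = FALSE: one list element per pair, so the reshape succeeds
ok <- outer(lst2, lst2, FUN = Vectorize("%in%", SIMPLIFY = FALSE))
dim(ok)
ok[[1, 2]]   # a %in% b
```

When the vectors have different lengths, the results cannot be simplified anyway, which is why the problem can go unnoticed until the inputs happen to line up.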

The result z above is a "matrix list": class(z) is "matrix" (c("matrix", "array") in R >= 4.0.0), but typeof(z) is "list". See the question "Why is this matrix not numeric?" for more.
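The original `lst` is not shown on this page. As a runnable sketch, the list below is an assumed stand-in with the same element lengths (2, 3, 4, 5); its values are invented for illustration only:

```r
# Hypothetical stand-in for `lst`; the original data is not shown
lst <- list(vec1 = c(1, 2),
            vec2 = c(3, 9, 10),
            vec3 = c(1, 2, 3, 4),
            vec4 = c(5, 6, 1, 7, 8))

z <- outer(lst, lst, FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))

is.matrix(z)   # a matrix in shape ...
typeof(z)      # ... but stored as a list
dim(z)

# Cell [i, j] holds lst[[i]] %in% lst[[j]]; use [[i, j]] to extract it
z[[3, 1]]
```

Single brackets, as in z[3, 1], would return a length-1 list rather than the logical vector itself.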


If we want to further apply some summary function to each element of z, we could use lapply. Here I would offer two examples.

Example 1: Apply any()

Since any(a %in% b) is the same as any(b %in% a), i.e., the operation is symmetric, we only need to work with the lower triangle of z:

lz <- z[lower.tri(z)]

lapply returns an unnamed list, but for readability we want a named list. We may use the matrix index (i, j) as the name:

ind <- which(lower.tri(z), arr.ind = TRUE)
NAME <- paste(ind[,1], ind[,2], sep = ":")
any_lz <- setNames(lapply(lz, any), NAME)

#List of 6
# $ 2:1: logi FALSE
# $ 3:1: logi TRUE
# $ 4:1: logi TRUE
# $ 3:2: logi TRUE
# $ 4:2: logi FALSE
# $ 4:3: logi TRUE

Set operations like intersect, union and setequal are also symmetric operations which we can work with similarly.
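As a sketch of the symmetric case with intersect, again on an assumed stand-in for `lst` (the values are invented; only the element lengths follow the output above):

```r
# Hypothetical stand-in for `lst`; the original data is not shown
lst <- list(vec1 = c(1, 2),
            vec2 = c(3, 9, 10),
            vec3 = c(1, 2, 3, 4),
            vec4 = c(5, 6, 1, 7, 8))

# Pairwise intersections as a matrix list
zi <- outer(lst, lst, FUN = Vectorize(intersect, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# The set of common elements does not depend on argument order,
# so the lower triangle is enough
ind <- which(lower.tri(zi), arr.ind = TRUE)
int_lz <- setNames(zi[lower.tri(zi)], paste(ind[, 1], ind[, 2], sep = ":"))

str(int_lz)
```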

Example 2: Apply which()

which(a %in% b) is not a symmetric operation, so we have to work with the full matrix.

NAME <- paste(1:nrow(z), rep(1:nrow(z), each = ncol(z)), sep = ":")
which_z <- setNames(lapply(z, which), NAME)

# List of 16
# $ 1:1: int [1:2] 1 2
# $ 2:1: int(0)
# $ 3:1: int [1:2] 1 2
# $ 4:1: int 3
# $ 1:2: int(0)
# $ 2:2: int [1:3] 1 2 3
# ...

Set operations like setdiff are also asymmetric and can be dealt with similarly.
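A sketch of the asymmetric case with setdiff, on an assumed stand-in for `lst`. The names are built here with row() and col(), an alternative to the paste(1:nrow(z), rep(...)) construction above that also works when the matrix is not square:

```r
# Hypothetical stand-in for `lst`; the original data is not shown
lst <- list(vec1 = c(1, 2),
            vec2 = c(3, 9, 10),
            vec3 = c(1, 2, 3, 4),
            vec4 = c(5, 6, 1, 7, 8))

# Cell [i, j] holds setdiff(lst[[i]], lst[[j]])
zd <- outer(lst, lst, FUN = Vectorize(setdiff, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# row() and col() give the (i, j) index of every cell in column-major
# order, the same order in which lapply() walks the matrix list
NAME <- paste(row(zd), col(zd), sep = ":")

# Summary per cell: how many elements of lst[[i]] are missing from lst[[j]]
n_only <- setNames(lapply(zd, length), NAME)

n_only[["3:1"]]   # vec3 has elements that vec1 lacks
n_only[["1:3"]]   # vec1 is contained in vec3
```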


Alternatives

Apart from using outer(), we could also use R expressions to obtain the z above. Again, I take binary operation "%in%" as an example:

op <- "'%in%'"    ## operator

lst_name <- names(lst)
op_call <- paste0(op, "(", lst_name, ", ", rep(lst_name, each = length(lst)), ")")
# [1] "'%in%'(vec1, vec1)" "'%in%'(vec2, vec1)" "'%in%'(vec3, vec1)"
# [4] "'%in%'(vec4, vec1)" "'%in%'(vec1, vec2)" "'%in%'(vec2, vec2)"
# ...

Then we can parse and evaluate these expressions within lst. We may use the combination index as the names of the resulting list:

NAME <- paste(1:length(lst), rep(1:length(lst), each = length(lst)), sep = ":")
z <- setNames(lapply(parse(text = op_call), eval, lst), NAME)

# List of 16
# $ 1:1: logi [1:2] TRUE TRUE
# $ 2:1: logi [1:3] FALSE FALSE FALSE
# $ 3:1: logi [1:4] TRUE TRUE FALSE FALSE
# $ 4:1: logi [1:5] FALSE FALSE TRUE FALSE FALSE
# $ 1:2: logi [1:2] FALSE FALSE
# ...
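If building and parsing strings feels fragile, the same flat list can be produced from an index grid instead. This is a different technique from the one above, not the answer's own method; it is sketched on an assumed stand-in for `lst`, and `idx` is a name introduced here:

```r
# Hypothetical stand-in for `lst`; the original data is not shown
lst <- list(vec1 = c(1, 2),
            vec2 = c(3, 9, 10),
            vec3 = c(1, 2, 3, 4),
            vec4 = c(5, 6, 1, 7, 8))

# All (i, j) index pairs; expand.grid varies the first column fastest,
# which matches the "1:1", "2:1", ... naming used above
idx <- expand.grid(i = seq_along(lst), j = seq_along(lst))

z2 <- setNames(
  Map(function(i, j) lst[[i]] %in% lst[[j]], idx$i, idx$j),
  paste(idx$i, idx$j, sep = ":")
)

str(z2[1:5])
```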

Apply a function over combinations of elements in a list

Let's just do

oo <- outer(matrix.list, matrix.list, Vectorize(crossprod, SIMPLIFY = FALSE))

which gives you a matrix list. Accessing the result is handy: oo[1, 2] returns a length-1 list, and oo[[1, 2]] (or oo[1, 2][[1]]) extracts the cross product between matrix 1 and matrix 2.

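Note, matrix cross product is not %*%: crossprod(a, b) computes t(a) %*% b. If you insist on %*%, use Vectorize(function(a, b) a %*% b, SIMPLIFY = FALSE); "%*%" is a primitive, and Vectorize cannot be applied to primitives directly. As the operation is not symmetric, oo[1,2] is different from oo[2,1].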
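A small runnable sketch, assuming two made-up 2 x 2 matrices in place of the asker's `matrix.list`. The wrapper `cp` is a name introduced here; it is used so the example does not depend on whether crossprod is a closure or a primitive in the running R version, since Vectorize works from formals(), which primitives lack:

```r
# Hypothetical 2 x 2 matrices standing in for `matrix.list`
matrix.list <- list(A = matrix(1:4, 2), B = matrix(5:8, 2))

# Plain closure around crossprod, safe to pass to Vectorize()
cp <- function(a, b) crossprod(a, b)
oo <- outer(matrix.list, matrix.list, Vectorize(cp, SIMPLIFY = FALSE))

# Cell [1, 2] is crossprod(A, B), i.e. t(A) %*% B
all.equal(oo[[1, 2]], t(matrix.list$A) %*% matrix.list$B)

# Cell [2, 1] is its transpose, not the same matrix
all.equal(oo[[2, 1]], t(oo[[1, 2]]))
```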

See "How to perform pairwise operation like `%in%` and set operations for a list of vectors" for the original idea behind this.


Thanks for your response. To clarify: I want to calculate the dot product, not the cross product, as I stated in my initial question. However, the function itself doesn't matter - I want to know how to use combinations of elements from a list as inputs for any R function.

I'm not sure what you want. Here are a few other options; pick the one that fits. They all do different things.

This?

combn(matrix.list, 2, function (u) u[[1]] %*% u[[2]])

This?

mapply("%*%", matrix.list[-length(matrix.list)], matrix.list[-1], SIMPLIFY = FALSE)
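To make the difference between these two options concrete, here is a sketch on three invented 2 x 2 matrices (the asker's `matrix.list` is not shown). combn visits every unordered pair, while the mapply call only multiplies each matrix with its successor:

```r
# Hypothetical matrices standing in for `matrix.list`
matrix.list <- list(M1 = diag(2), M2 = matrix(1:4, 2), M3 = matrix(5:8, 2))

# All unordered pairs: (1,2), (1,3), (2,3)
all_pairs <- combn(matrix.list, 2, function(u) u[[1]] %*% u[[2]],
                   simplify = FALSE)

# Consecutive pairs only: (1,2), (2,3)
chain <- mapply("%*%", matrix.list[-length(matrix.list)], matrix.list[-1],
                SIMPLIFY = FALSE)

length(all_pairs)
length(chain)
```

With simplify = FALSE, combn returns a plain list; the default would stack the products into a 3-d array.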

R pairwise operations

combn is the workhorse here; it can be used to generate the unique pairwise combinations:

combn(as.character(dat$Name), 2, simplify=FALSE)
#[[1]]
#[1] "A" "B"
#
#[[2]]
#[1] "A" "C"
#
#[[3]]
#[1] "B" "C"

You can also pass each of these pairwise combinations to a function:

# set.seed(1)  ## for reproducibility

combn(
  as.character(dat$Name),
  2,
  FUN = function(x) do.call(`-`, dat[dat$Name == x[1], -1] / dat[dat$Name == x[2], -1])
)
#[1] -8.2526585 2.6940335 0.1818427

AB3
#[1] -8.252659
AC3
#[1] 2.694033
BC3
#[1] 0.1818427
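The `dat` used above is not shown on this page. The sketch below assumes a made-up data frame with a Name column and two numeric columns (X1 and X2 are invented names) and labels each result with its pair, which makes the output easier to match against values such as AB3, AC3 and BC3:

```r
# Hypothetical data; the original `dat` is not shown
dat <- data.frame(Name = c("A", "B", "C"),
                  X1 = c(2, 4, 8),
                  X2 = c(1, 4, 2))

# Unique pairs of names
pairs <- combn(as.character(dat$Name), 2, simplify = FALSE)

# For each pair: divide the rows column by column, then subtract the ratios
res <- sapply(pairs, function(p) {
  r <- dat[dat$Name == p[1], -1] / dat[dat$Name == p[2], -1]
  r$X1 - r$X2
})
names(res) <- sapply(pairs, paste, collapse = "")

res
```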

Pairwise operations (distance) on two lists in numpy

Here is a quick performance analysis of the four methods presented so far:

import numpy
import scipy
from itertools import product
from scipy.spatial.distance import cdist
from scipy.spatial import cKDTree as KDTree

n = 100
l1 = numpy.random.randint(0, 100, size=(n,3))
l2 = numpy.random.randint(0, 100, size=(n,3))

# by @Phillip
def a(l1, l2):
    return min(numpy.linalg.norm(l1_element - l2_element)
               for l1_element, l2_element in product(l1, l2))

# by @Kasra
def b(l1, l2):
    return numpy.min(numpy.apply_along_axis(
        numpy.linalg.norm,
        2,
        l1[:, None, :] - l2[None, :, :]
    ))

# mine
def c(l1, l2):
    return numpy.min(scipy.spatial.distance.cdist(l1, l2))

# just checking that numpy.min is indeed faster.
def c2(l1, l2):
    return min(scipy.spatial.distance.cdist(l1, l2).reshape(-1))

# by @BrianLarsen
def d(l1, l2):
    # make KDTrees for both sets of points
    t1 = KDTree(l1)
    t2 = KDTree(l2)
    # we need a distance to not look beyond; if you have real knowledge use it, otherwise guess
    maxD = numpy.linalg.norm(l1[0] - l2[0])  # this could be the closest, but anything further is certainly not
    # get a sparse matrix of all the distances
    ans = t1.sparse_distance_matrix(t2, maxD)
    # get the minimum distance
    minD = min(ans.values())
    return minD

# run in IPython / Jupyter: %timeit is an IPython magic
for x in (a, b, c, c2, d):
    print("Timing variant", x.__name__, ':', flush=True)
    print(x(l1, l2), flush=True)
    %timeit x(l1, l2)
    print(flush=True)

For n=100

Timing variant a :
2.2360679775
10 loops, best of 3: 90.3 ms per loop

Timing variant b :
2.2360679775
10 loops, best of 3: 151 ms per loop

Timing variant c :
2.2360679775
10000 loops, best of 3: 136 µs per loop

Timing variant c2 :
2.2360679775
1000 loops, best of 3: 844 µs per loop

Timing variant d :
2.2360679775
100 loops, best of 3: 3.62 ms per loop

For n=1000

Timing variant a :
0.0
1 loops, best of 3: 9.16 s per loop

Timing variant b :
0.0
1 loops, best of 3: 14.9 s per loop

Timing variant c :
0.0
100 loops, best of 3: 11 ms per loop

Timing variant c2 :
0.0
10 loops, best of 3: 80.3 ms per loop

Timing variant d :
0.0
1 loops, best of 3: 933 ms per loop

How do I perform a pairwise binary operation between the elements of two containers?

A lambda should do the trick:

#include <algorithm>
#include <cstdint>
#include <iterator>

std::transform(a.begin(), a.end(),     // first
               b.begin(),              // second
               std::back_inserter(c),  // output
               [](uint32_t n, uint32_t m) { return n & m; });

Even better, thanks to @Pavel and entirely C++98:

#include <functional>

std::transform(a.begin(), a.end(), b.begin(),
               std::back_inserter(c), std::bit_and<uint32_t>());

Equivalent of `outer` with list or vector output

Just use FUN = Vectorize(rep.int, SIMPLIFY = FALSE) inside outer, to get a matrix list.
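A minimal sketch of that call, with small integer inputs chosen here purely for illustration:

```r
# Each cell [i, j] holds rep.int(x[i], y[j]) as its own vector
m <- outer(1:2, 1:3, FUN = Vectorize(rep.int, SIMPLIFY = FALSE))

dim(m)      # 2 rows (x) by 3 columns (y)
typeof(m)   # stored as a list
m[[2, 3]]   # the value 2 repeated 3 times
```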

Related: How to perform pairwise operation like `%in%` and set operations for a list of vectors


