Combinations of Multiple Vectors in R

Combinations of multiple vectors in R

This is a useful case for storing the vectors in a list and using do.call() to arrange for an appropriate function call for you. expand.grid() is the standard function you want. But so you don't have to type out or name individual vectors, try:

> l <- list(a = 1:2, b = 3:4, c = 2:3)
> do.call(expand.grid, l)
a b c
1 1 3 2
2 2 3 2
3 1 4 2
4 2 4 2
5 1 3 3
6 2 3 3
7 1 4 3
8 2 4 3

However, for all my cleverness, it turns out that expand.grid() accepts a list:

> expand.grid(l)
a b c
1 1 3 2
2 2 3 2
3 1 4 2
4 2 4 2
5 1 3 3
6 2 3 3
7 1 4 3
8 2 4 3

R function to see if combinations of multiple vectors match a target vector

The solutions shown in the question consist of non-overlapping vectors so we assume that that is a requirement so that we are looking to partition the target into disjoint vectors that cover it. If the vectors may overlap then instead of using = or == in the constraints involving A below use >=.

The assumed problem is known as a set partitioning problem and the problem with overlaps is known as a set covering problem.

Assuming the list of vectors L and the target shown in the Note at the end form the objective (all one's), incidence matrix A of vectors, animals and the right hand of the constraint equations rhs derived from the target and run the linear program shown.

If a solution is found then we add a constraint that will eliminate it in the next iteration by insisting that at least one of its zeros be one. We iterate 5 times (i.e. up to 5 solutions) or until we can find no more solutions.

We show a solution using the lpSolveAPI package and then in the section after that repeat it using the CVXR package.

lpSolveAPI

library(lpSolveAPI)

animals <- sort(unique(unlist(L)))
A <- +outer(animals, L, Vectorize(`%in%`))
rownames(A) <- animals
nr <- nrow(A)
nc <- ncol(A)

rhs <- rownames(A) %in% target

lp <- make.lp(nr, nc)
set.objfn(lp, rep(1, nc))
for(i in 1:nr) add.constraint(lp, A[i, ], "=", rhs[i])
for(j in 1:nc) set.type(lp, j, type = "binary")

soln <- solns <- NULL
for(s in 1:5) {
if (!is.null(soln)) add.constraint(lp, 1-soln, ">=", 1)
if (solve(lp) != 0) break
soln <- get.variables(lp)
solns <- c(solns, list(names(L)[soln == 1]))
}
solns
## [[1]]
## [1] "e" "f"
##
## [[2]]
## [1] "b" "d" "e"

CVXR

An alternative to lpSolve is CVXR. We use nc, A and rhs from above. Below we find up to 5 solutions.

library(CVXR)

x <- Variable(nc, boolean = TRUE)
objective <- Minimize(sum(x))
constraints <- list(A %*% x == matrix(rhs))

solns <- soln <- NULL
for(i in 1:5) {
if (!is.null(soln)) constraints <- c(constraints, sum((1 - soln) * x) >= 1)
prob <- Problem(objective, constraints)
result <- solve(prob)
if (result$status != "optimal") break
soln <- result$getValue(x)
solns <- c(solns, list(names(L)[soln == 1]))
}
solns
## [[1]]
## [1] "e" "f"
##
## [[2]]
## [1] "b" "d" "e"

Note

L <- within(list(), {
a <- c("giraffe", "dolphin", "pig")
b <- c("elephant" , "pig")
c <- c("zebra","cobra","spider","porcupine")
d <- c("porcupine")
e <- c("spider","cobra")
f <- c("elephant","pig","porcupine")
})
L <- L[order(names(L))]
target<- c("elephant" , "pig","cobra","spider","porcupine")

Find a combination of multiple vectors to fill a bigger vector element-wisely in R (according to their indexes)

This is a difficult problem. Let us start by programmatically generating your vector indexes:

indexes <- list(
vec1.a = 2:4, vec1.b = 10:12, vec1.c = 17:20,
vec2.a = 8:10, vec2.b = 18:19,
vec3.a = 4:6, vec3.b =11:15 )

By using which along with diff and lapply, we have the following:

myOnes <- do.call(c, lapply(vec_list, function(x) {
temp <- which(x == 1)
ind <- c(1, which(diff(temp) > 1) + 1, length(temp) + 1)
lapply(1:(length(ind) - 1), function(y) {
temp[ind[y]:(ind[y + 1] - 1)]
})
}))

myOnes
$vec11
[1] 2 3 4

$vec12
[1] 10 11 12

$vec13
[1] 17 18 19 20

$vec21
[1] 8 9 10

$vec22
[1] 18 19

$vec31
[1] 4 5 6

$vec32
[1] 11 12 13 14 15

Now, we need to determine which vectors have no overlap with the other vectors. Since we are comparing these vectors in increasing order there is no need to recheck against an earlier vector as we will have already checked it (E.g. checking vec22 against vec13 is redundant as we will have already checked this combination when vec13 is the first vector being compared). Again, using functions from base R we have:

lenOnes <- length(myOnes)

noOverLap <- lapply(1:(lenOnes - 1), function(x) {
which(sapply((x + 1):lenOnes, function(y) {
length(intersect(myOnes[[x]], myOnes[[y]])) == 0
})) + x
})

noOverLap
[[1]] ## The first vector above i.e. vec11 only
[1] 2 3 4 5 7 ## overlaps the sixth vector i.e. vec31

[[2]]
[1] 3 5 6

[[3]]
[1] 4 6 7

[[4]] ## The fourth vector above i.e. vec21 doesn't
[1] 5 6 7 ## overlap any vector beyond the fourth one

[[5]]
[1] 6 7

[[6]]
[1] 7

This looks promising and it wasn't that bad. Now we need to somehow leverage this newly created list to find all networks of non overlapping vectors. This has recursion written all over it.

myList <- vector("list")
n <- 0

## helper function for adding elements to our list
## ... this keeps our recursion function cleaner
addToList <- function(v) {
if (n == 0) {
myList[[n <<- n + 1]] <<- v
} else if (!isTRUE(all.equal(v, myList[[n]]))) {
myList[[n <<- n + 1]] <<- v
}
}

recurse <- function(v, x, z) {
if (x <= length(noOverLap)) {
b <- intersect(z, noOverLap[[x]])
if (length(b) > 0)
for (i in b)
recurse(c(v, i), i, b)
} else {
addToList(v)
}
addToList(v)
}

You will also note that we make use of the <<- (i.e. scoping assignment) in order to update our list.

Running our recursion function over every index of our list noOverLap, we obtain every combination of vectors that do not overlap:

for (i in seq_along(noOverLap))
recurse(i, i, noOverLap[[i]])

head(myList)
[[1]]
[1] 1 2 3

[[2]]
[1] 1 2 5

[[3]]
[1] 1 2

[[4]]
[1] 1 3 4 7

[[5]]
[1] 1 3 4

[[6]]
[1] 1 3 7

## looking at some of the middle elements
myList[21:25] ## length(myList) is 43
[[1]]
[1] 2

[[2]]
[1] 3 4 6 7

[[3]]
[1] 3 4 6

[[4]]
[1] 3 4 7

[[5]]
[1] 3 4

The output above tells us every combination of vectors from myOnes that don't overlap.

Let's take a closer look at myList[[4]] to get an idea of what's going on:

myList[[4]]
[1] 1 3 4 7

myOnes[myList[[4]]] ## the 1st, 3rd, 4th, and 7th vectors of myOnes
$vec11
[1] 2 3 4

$vec13
[1] 17 18 19 20

$vec21
[1] 8 9 10

$vec32
[1] 11 12 13 14 15

To prove that there is no overlap, we can concatenate these vectors, sort them, take the diff and make sure there are no zero values.

unlist(myOnes[myList[[4]]])
vec111 vec112 vec113 vec131 vec132 vec133 vec134 vec211 vec212 vec213 vec321 vec322 vec323 vec324 vec325
2 3 4 17 18 19 20 8 9 10 11 12 13 14 15

sort(unlist(myOnes[myList[[4]]]))
vec111 vec112 vec113 vec211 vec212 vec213 vec321 vec322 vec323 vec324 vec325 vec131 vec132 vec133 vec134
2 3 4 8 9 10 11 12 13 14 15 17 18 19 20

diff(sort(unlist(myOnes[myList[[4]]])))
vec112 vec113 vec211 vec212 vec213 vec321 vec322 vec323 vec324 vec325 vec131 vec132 vec133 vec134
1 1 4 1 1 1 1 1 1 1 2 1 1 1

any(diff(sort(unlist(myOnes[myList[[4]]]))) == 0)
[1] FALSE

And finally to get our combinations of indices, we can do the following:

indCombos <- lapply(myList, function(x) unlist(myOnes[x]))

head(indCombos)
[[1]]
vec111 vec112 vec113 vec121 vec122 vec123 vec131 vec132 vec133 vec134
2 3 4 10 11 12 17 18 19 20

[[2]]
vec111 vec112 vec113 vec121 vec122 vec123 vec221 vec222
2 3 4 10 11 12 18 19

[[3]]
vec111 vec112 vec113 vec121 vec122 vec123
2 3 4 10 11 12

[[4]]
vec111 vec112 vec113 vec131 vec132 vec133 vec134 vec211 vec212 vec213 vec321 vec322 vec323 vec324 vec325
2 3 4 17 18 19 20 8 9 10 11 12 13 14 15

[[5]]
vec111 vec112 vec113 vec131 vec132 vec133 vec134 vec211 vec212 vec213
2 3 4 17 18 19 20 8 9 10

[[6]]
vec111 vec112 vec113 vec131 vec132 vec133 vec134 vec321 vec322 vec323 vec324 vec325
2 3 4 17 18 19 20 11 12 13 14 15

R: How to Intersect multiple vectors that gives all possible combination

I would suggest saving the sets in a list, then you could iterate over the elements of the list, e.g.:

sets2intersect <- list(set1, set2, set3,set4,set5)

lapply(unique(unlist(sets2intersect)), function(i){
which(sapply(sets2intersect, function(x) any(i == x)))
})

[1]]
[1] 1 2 4

[[2]]
[1] 1 2

[[3]]
[1] 1 4

[[4]]
[1] 1 3

[[5]]
[1] 2

[[6]]
[1] 3

[[7]]
[1] 5

If you want to rename your list, to know which element was used, you can do:

result<- lapply(unique(unlist(sets2intersect)), function(i){
which(sapply(sets2intersect, function(x) any(i == x)))
})
names(result) <- unique(unlist(sets2intersect))

$g1
[1] 1 2 4

$g2
[1] 1 2

$g3
[1] 1 4

$g4
[1] 1 3

$g8
[1] 2

$g17
[1] 3

$g5
[1] 5

Combinations of varying number of elements from two vectors

A completely general solution that allows for any input vectors and the maximum / minimum numbers of characters taken from each would be:

comb <- function(a, b, min_a = 1, max_a = 1, min_b = 1, max_b = 2) {
as <- do.call(c, lapply(min_a:max_a, \(i) combn(a, i, simplify = FALSE)))
bs <- do.call(c, lapply(min_b:max_b, \(i) combn(b, i, simplify = FALSE)))
apply(expand.grid(as, bs), 1, unlist, use.names = FALSE)
}

By default this would give:

comb(group1, group2)
#> [[1]]
#> [1] "a" "c"
#>
#> [[2]]
#> [1] "b" "c"
#>
#> [[3]]
#> [1] "a" "d"
#>
#> [[4]]
#> [1] "b" "d"
#>
#> [[5]]
#> [1] "a" "e"
#>
#> [[6]]
#> [1] "b" "e"
#>
#> [[7]]
#> [1] "a" "c" "d"
#>
#> [[8]]
#> [1] "b" "c" "d"
#>
#> [[9]]
#> [1] "a" "c" "e"
#>
#> [[10]]
#> [1] "b" "c" "e"
#>
#> [[11]]
#> [1] "a" "d" "e"
#>
#> [[12]]
#> [1] "b" "d" "e"

Created on 2022-09-03 with reprex v2.0.2

Generate all combinations of vector with consecutive occurrences is considered as single occurrence

Here's an approach using the very fast arrangements package for permutations. We calculate the permutations of integers corresponding to the unique elements of the input and then do some clever indexing to output the corresponding swaps. This is extremely fast on small examples and does pretty well on larger example - on my computer it took a little less than 7 seconds to generate the 10! = 3628800 swaps on input of size 30 with 10 unique elements. The results are conveniently returned in a list.

library(arrangements)

all_swaps = function(x) {
ux = unique(x)
xi = as.integer(factor(x))
perm = permutations(seq_along(ux))
apply(perm, MARGIN = 1, FUN = \(p) ux[p][xi], simplify = FALSE)
}

Test cases from the question:

# n = 2
all_swaps(c("a","a","a","b","b","b","a","a","b","b"))
# [[1]]
# [1] "a" "a" "a" "b" "b" "b" "a" "a" "b" "b"
#
# [[2]]
# [1] "b" "b" "b" "a" "a" "a" "b" "b" "a" "a"

## n = 3
all_swaps(c("a","a","a","b","b","b","c","c","c"))
# [[1]]
# [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"
#
# [[2]]
# [1] "a" "a" "a" "c" "c" "c" "b" "b" "b"
#
# [[3]]
# [1] "b" "b" "b" "a" "a" "a" "c" "c" "c"
#
# [[4]]
# [1] "b" "b" "b" "c" "c" "c" "a" "a" "a"
#
# [[5]]
# [1] "c" "c" "c" "a" "a" "a" "b" "b" "b"
#
# [[6]]
# [1] "c" "c" "c" "b" "b" "b" "a" "a" "a"

A shorter demo with 3 unique elements in a "complex" case where the elements are not all consecutive:

all_swaps(c("a", "b", "b", "c", "b"))
# [[1]]
# [1] "a" "b" "b" "c" "b"
#
# [[2]]
# [1] "a" "c" "c" "b" "c"
#
# [[3]]
# [1] "b" "a" "a" "c" "a"
#
# [[4]]
# [1] "b" "c" "c" "a" "c"
#
# [[5]]
# [1] "c" "a" "a" "b" "a"
#
# [[6]]
# [1] "c" "b" "b" "a" "b"

A larger case:

# n = 10
set.seed(47)
start_t = Sys.time()
n10 = all_swaps(sample(letters[1:10], size = 30, replace = TRUE))
end_t = Sys.time()
end_t - start_t
# Time difference of 6.711215 secs
length(n10)
# [1] 3628800


Benchmarking

Benchmarking my answer with Maël's and ThomasIsCoding's, my method relying on the arrangements package is quick and memory efficient. ThomasIsCoding's answer can be improved by changing from pracma::perms to arrangements::permutations--the memory usage is especially improved--but my version still performs better. Maël's uses a lot of time and memory. I'll lead with results, code to reproduce is below.

## 5 Unique Elements
arrange(b5, desc(`itr/sec`))
# # A tibble: 4 × 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
# 1 GregorThomas 2.31ms 12.6ms 77.5 5.77KB 0 40 0 516ms
# 2 ThomasIsCodingArr(in5) 9.3ms 20.5ms 47.4 19.55KB 0 24 0 506ms
# 3 ThomasIsCoding(in5) 12.57ms 22.7ms 41.2 45.41KB 0 22 0 534ms
# 4 Mael 963.64ms 963.6ms 1.04 1.24MB 0 1 0 964ms
# # … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>

## 9 Unique Elements - memory allocation is important
arrange(b9, desc(`itr/sec`))
# # A tibble: 2 × 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list>
# 1 GregorThomas 1.8s 1.8s 0.556 27.7MB 0 1 0 1.8s <NULL>
# 2 ThomasIsCoding(in9) 2.5s 2.5s 0.400 230.8MB 0.400 1 1 2.5s <NULL>
# # … with 3 more variables: memory <list>, time <list>, gc <list>

Benchmarking code:

## Functions
library(arrangements)
library(pracma)
ThomasIsCoding <- function(x) {
idx <- match(x, unique(x))
m <- asplit(matrix(unique(x)[perms(1:max(idx))], ncol = max(idx)), 1)
Map(`[`, m, list(idx))
}
ThomasIsCodingArr <- function(x) {
idx <- match(x, unique(x))
m <- asplit(matrix(unique(x)[permutations(1:max(idx))], ncol = max(idx)), 1)
Map(`[`, m, list(idx))
}
Mael <- function(vec){
uni <- unique(vec)
size <- length(uni)
pVec <- paste(uni, collapse = "")
grid <- expand.grid(rep(list(uni), size))
expanded <- grid[apply(grid, 1, function(x) length(unique(x))) == size,]
p <- unname(apply(expanded, 1, paste0, collapse = ""))

lapply(p, function(x) chartr(pVec, x, vec))
}
all_swaps = function(x) {
ux = unique(x)
xi = as.integer(factor(x))
perm = permutations(seq_along(ux))
apply(perm, MARGIN = 1, FUN = \(p) ux[p][xi], simplify = FALSE)
}

set.seed(47)
in5 = c(sample(letters[1:5], 5), sample(letters[1:5], 5, replace = TRUE))

b5 = bench::mark(
GregorThomas = all_swaps(in5),
Mael = Mael(in5),
ThomasIsCoding(in5),
ThomasIsCodingArr(in5),
check = FALSE
)

All Possible Pairs between Two Vectors in R Without Replacement

It seems a permutation problem, which might be solved like below

> library(pracma)

> paste0(v1, t(perms(v2)))
[1] "AZ" "BY" "CX" "AZ" "BX" "CY" "AY" "BZ" "CX" "AY" "BX" "CZ" "AX" "BY" "CZ"
[16] "AX" "BZ" "CY"

or

out <- data.frame(
Var1 = v1,
Var2 = c(t(perms(v2))),
Match = ceiling(seq(factorial(length(v2)) * length(v2)) / length(v1))
)

which gives

> out
Var1 Var2 Match
1 A Z 1
2 B Y 1
3 C X 1
4 A Z 2
5 B X 2
6 C Y 2
7 A Y 3
8 B Z 3
9 C X 3
10 A Y 4
11 B X 4
12 C Z 4
13 A X 5
14 B Y 5
15 C Z 5
16 A X 6
17 B Z 6
18 C Y 6


Related Topics



Leave a reply



Submit