How do I find all possible subsets of a set iteratively in R?
No need for loops (lapply
or otherwise):
combn(1:4,2)
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 2 2 3
# [2,] 2 3 4 3 4 4
Example with calculating the sums of combinations:
combn(1:4,2,FUN=sum)
# [1] 3 4 5 5 6 7
An example with a user defined function:
x <- 11:14
combn(1:4,2,FUN=function(i,a) sum(a[i]),a=x)
#[1] 23 24 25 25 26 27
Here (in the anonymous function) i
is the combination used as index and argument a
is a vector to which I pass x
.
And the same with a user-defined named function:
fun <- function(i,a) sum(a[i])
combn(1:4,2,FUN=fun,a=x)
finding all possible subsets of a dataframe
You can use split
and paste
like so:
split(z, paste(z$b, z$c, z$d))
But the tricky part of your question is how to programmatically combine the variables in columns 2:end without knowing beforehand the number of columns, their names or values. We can use a function like below to paste the values by row in columns 2:end
apply(df, 1, function(i) paste(i[-1], collapse=""))
Now combine with split
split(z, apply(z, 1, function(i) paste(i[-1], collapse="")))
How to get all subsets of a set? (powerset)
The Python itertools
page has exactly a powerset
recipe for this:
from itertools import chain, combinations
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
Output:
>>> list(powerset("abcd"))
[(), ('a',), ('b',), ('c',), ('d',), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd'), ('a', 'b', 'c', 'd')]
If you don't like that empty tuple at the beginning, you can just change the range
statement to range(1, len(s)+1)
to avoid a 0-length combination.
How to iterate column values to find out all possible combinations in R?
This won't (can't) be fast for many IDs. If it is too slow, you need to parallelize or implement it in a compiled language (e.g., using Rcpp).
We sort vals
. We can then create all combination of two items grouped by ID. We exclude ID's with 1 item. Finally we tabulate the result.
library(data.table)
setDT(example)
setorder(example, id, vals)
example[, if (.N > 1) split(combn(vals, 2), 1:2), by = id][, .N, by = c("1", "2")]
# 1 2 N
# 1: a b 3
# 2: a c 2
# 3: a d 3
# 4: a e 1
# 5: b c 2
# 6: b d 2
# 7: b e 1
#<...>
All N Combinations of All Subsets
This is what I would do, with, e.g., s=1:2
:
1) Represent subsets with a 0/1 matrix for each element's membership.
subsets = as.matrix(do.call(expand.grid,replicate(length(s),0:1,simplify=FALSE)))
which gives
Var1 Var2
[1,] 0 0
[2,] 1 0
[3,] 0 1
[4,] 1 1
Here, the first row is the empty subset; the second, {1}; the third, {2}; and the fourth, {1,2}. To get the subset itself, use mysubset = s[subsets[row,]]
, where row
is the row of the subset you want.
2) Represent pairs of subsets as pairs of rows of the matrix:
pairs <- expand.grid(Row1=1:nrow(subsets),Row2=1:nrow(subsets))
which gives
Row1 Row2
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 4 2
9 1 3
10 2 3
11 3 3
12 4 3
13 1 4
14 2 4
15 3 4
16 4 4
Here, the fourteenth row corresponds to the second and fourth rows of subsets
, so {1} & {1,2}. This assumes the order of the pair matters (which is implicit in taking the Cartesian product). To recover the subsets, use mypairosubsets=lapply(pairs[p,],function(r) s[subsets[r,]])
where p
is the row of the pair you want.
Expanding beyond pairs to the P(s)^n
case (where P(s)
is the power set of s
) would look like
setsosets = as.matrix(do.call(expand.grid,replicate(n,1:nrow(subsets),simplify=FALSE)))
Here, each row will have a vector of numbers. Each number corresponds to a row in the subsets
matrix.
Making copies of the elements of s
is probably not necessary for whatever you are doing after this. However, you could do it from here by using lapply(1:nrow(pairs),function(p)lapply(pairs[p,],function(r) s[subsets[r,]]))
, which starts like...
[[1]]
[[1]]$Row1
integer(0)
[[1]]$Row2
integer(0)
[[2]]
[[2]]$Row1
[1] 1
[[2]]$Row2
integer(0)
How to find all subsets of a set in JavaScript? (Powerset of array)
Here is one more very elegant solution with no loops or recursion, only using the map and reduce array native functions.
const getAllSubsets = theArray => theArray.reduce( (subsets, value) => subsets.concat( subsets.map(set => [value,...set]) ), [[]] );
console.log(getAllSubsets([1,2,3]));
Finding all the subsets of a set
It's very simple to do this recursively. The basic idea is that for each element, the set of subsets can be divided equally into those that contain that element and those that don't, and those two sets are otherwise equal.
- For n=1, the set of subsets is {{}, {1}}
- For n>1, find the set of subsets of 1,...,n-1 and make two copies of it. For one of them, add n to each subset. Then take the union of the two copies.
Edit To make it crystal clear:
- The set of subsets of {1} is {{}, {1}}
- For {1, 2}, take {{}, {1}}, add 2 to each subset to get {{2}, {1, 2}} and take the union with {{}, {1}} to get {{}, {1}, {2}, {1, 2}}
- Repeat till you reach n
Related Topics
Existing Function to Combine Standard Deviations in R
The Fastest Way to Convert Numeric to Character in R
Adding Row to a Data Frame with Missing Values
Applying Function (Ks.Test) Between Two Data Frames Column-Wise in R
How to Uninstall R Completely from Os X
How to Manually Set Colours to a Categorical Variables Using Ggplot()
Run R Interactively from Rscript
R - Insert Row for Missing Monthly Data and Interpolate
How to Define "Hidden Global Variables" Inside R Packages
Trouble Getting Latest Version of Gdal on Ubuntu Running R
How to Calculate Euclidean Distance Between Two Matrices in R
Using If Else on a Dataframe Across Multiple Columns
Error in Xj[I]: Invalid Subscript Type 'List'
R:Binary Matrix for All Possible Unique Results
How to Create a Continuous Legend (Color Bar Style) for Scale_Alpha
How to Remove Certain Columns in Multiple Data Frames in R