How to Get All Possible Subsets of a Character Vector in R

How to get all possible subsets of a character vector in R?

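You can use combn() to generate the combinations of every size, then pad each one with NA so that they all fit into one matrix: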

res <- unlist(lapply(1:3, combn,
                     x = c("test1", "test2", "test3"), simplify = FALSE),
              recursive = FALSE)
res <- sapply(res, `length<-`, 3)
#     [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
#[1,] "test1" "test2" "test3" "test1" "test1" "test2" "test1"
#[2,] NA      NA      NA      "test2" "test3" "test3" "test2"
#[3,] NA      NA      NA      NA      NA      NA      "test3"
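If a list is more convenient than an NA-padded matrix, the same idea can be wrapped in a small helper. This is only a sketch, not part of the answer above: all_subsets is a hypothetical name, and it assumes x is a character vector (combn() treats a single positive number as 1:x).

```r
# Hypothetical helper: every non-empty subset of x, as a flat list of vectors.
all_subsets <- function(x) {
  # For each subset size m, combn() returns a list of the size-m subsets;
  # unlist(recursive = FALSE) flattens one level so the result is one list.
  unlist(
    lapply(seq_along(x), function(m) combn(x, m, simplify = FALSE)),
    recursive = FALSE
  )
}

subsets <- all_subsets(c("test1", "test2", "test3"))
length(subsets)   # 2^3 - 1 = 7 non-empty subsets
lengths(subsets)  # subset sizes: 1 1 1 2 2 2 3
```

Keeping the result as a list avoids having to strip the NA padding again before working with an individual subset.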

R: subset of character vector

To extract the elements that lie between two known values (here "b" and "e"), you can index by position:

vector <- c("a", "", "b", "c", "", "d", "e")
vector[seq(which(vector == "b") + 1, which(vector == "e") - 1)]
#[1] "c" "" "d"
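The one-liner assumes both markers exist and that something lies between them. If the markers are adjacent, seq() counts backwards and returns the wrong elements, and if a marker is absent the call fails. A more defensive sketch, where between is a hypothetical helper name and only the first occurrence of each marker is used:

```r
# Hypothetical helper: elements strictly between the first `from` and the first `to`.
between <- function(v, from, to) {
  i <- match(from, v)  # position of the first `from`, or NA if absent
  j <- match(to, v)    # position of the first `to`, or NA if absent
  # Return an empty vector if a marker is missing or nothing lies between them
  if (is.na(i) || is.na(j) || j - i < 2) return(v[0])
  v[(i + 1):(j - 1)]
}

vector <- c("a", "", "b", "c", "", "d", "e")
between(vector, "b", "e")  # "c" "" "d"
between(vector, "b", "c")  # character(0): the markers are adjacent
between(vector, "x", "e")  # character(0): "x" is not in the vector
```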

All possible combinations in a vector

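Try the following, where vec is the input vector (here c("A", "B", "C"), matching the output below):

vec <- c("A", "B", "C")
unlist(sapply(0:length(vec), function(n) apply(combn(vec, n), 2, function(v) paste0(v, collapse = "+"))))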

[1] "" "A" "B" "C" "A+B" "A+C" "B+C" "A+B+C"
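A slightly shorter variant passes paste() to combn() through its FUN argument, which avoids the extra apply() call. This is a sketch under the assumption that vec is c("A", "B", "C"); it starts at size 1, so the empty combination "" is not included.

```r
vec <- c("A", "B", "C")  # assumed input, matching the output shown above

# combn() applies FUN to each combination, so paste() can join the elements directly
labels <- unlist(lapply(seq_along(vec), function(n) {
  combn(vec, n, FUN = paste, collapse = "+")
}))
labels
# "A" "B" "C" "A+B" "A+C" "B+C" "A+B+C"
```

If the empty combination is needed as well, c("", labels) adds it explicitly.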

How do I find all possible subsets of a set iteratively in R?

No need for loops (lapply or otherwise):

combn(1:4,2)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    1    1    1    2    2    3
# [2,]    2    3    4    3    4    4

Example with calculating the sums of combinations:

combn(1:4,2,FUN=sum)
# [1] 3 4 5 5 6 7

An example with a user-defined function:

x <- 11:14
combn(1:4,2,FUN=function(i,a) sum(a[i]),a=x)
#[1] 23 24 25 25 26 27

Here, in the anonymous function, i is the current combination, used as an index, and a is an extra argument through which x is passed.

And the same with a user-defined named function:

fun <- function(i,a) sum(a[i])
combn(1:4,2,FUN=fun,a=x)
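When the values themselves are what you want to combine, FUN can also be applied to x directly rather than to an index. The sketch below labels each sum with the pair that produced it; pair_sums, pair_names and result are names made up for this illustration.

```r
x <- 11:14

# Sum of each pair of values, computed directly on x
pair_sums <- combn(x, 2, FUN = sum)

# Matching labels built from the same combinations, in the same order
pair_names <- combn(x, 2, FUN = paste, collapse = "+")

# as.vector() drops any dim attribute before the names are attached
result <- setNames(as.vector(pair_sums), as.vector(pair_names))
result
# 11+12 11+13 11+14 12+13 12+14 13+14
#    23    24    25    25    26    27
```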

How to subset a matrix using a large character vector

My guess at the problem: your gene names are stored as a factor, and a factor used as a matrix index is treated as its integer codes rather than its labels:

genes = c('EPHX1','HSPB1', 'CLU', 'GAMT','PICK1', 'NR3C1','SIRT1', 'NPAS2',
'SPRY4', 'MAP3K1', 'SOS1', 'SALL4','GRIP1', 'PUM2', 'SOX9', 'RIPK4', 'CHD7',
'BCOR','CCNB1','NFE2L2', 'CHD2', 'CYP1B1', 'MDM2', 'CREBBP', 'ICK', 'ZFY',
'SIN3A', 'GATA4')

class(genes)
[1] "character"

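# stringsAsFactors = TRUE was the default before R 4.0.0; it is set explicitly here to reproduce the issue
infertility = data.frame(V1=genes, stringsAsFactors = TRUE)
vector_infertility_genes <- infertility$V1

class(vector_infertility_genes)
[1] "factor"

Before R 4.0.0, data.frame() converted character vectors to factors by default. Below I make a matrix with some random gene names and insert the chosen genes into rows 101-128: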

my_genomic_matrix = matrix(runif(1000*3),ncol=3)
rownames(my_genomic_matrix) = paste0("gene",1:1000)
rownames(my_genomic_matrix)[101:128] = genes

Indexing with the factor returns the wrong rows, because its integer codes are used as row positions:

head(my_genomic_matrix[vector_infertility_genes,])
            [,1]       [,2]       [,3]
gene8  0.6705400 0.92836211 0.39245031
gene12 0.6550523 0.87094037 0.08309788
gene5  0.3737798 0.94779178 0.44279510
gene9  0.4544450 0.77939541 0.13901245
gene19 0.6284895 0.47871950 0.60837784
gene18 0.2369957 0.01336282 0.10390174
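The row numbers in that output (8, 12, 5, ...) are the positions of the genes among the alphabetically sorted factor levels. A toy sketch with three genes makes the mechanism easier to see; toy_genes, f and m are names invented for this illustration.

```r
toy_genes <- c("EPHX1", "HSPB1", "CLU")
f <- factor(toy_genes)

levels(f)        # levels are sorted: "CLU" "EPHX1" "HSPB1"
as.integer(f)    # codes index into the levels: 2 3 1
as.character(f)  # the original labels: "EPHX1" "HSPB1" "CLU"

m <- matrix(1:6, ncol = 2, dimnames = list(toy_genes, NULL))

# Indexing by the factor uses the codes 2, 3, 1 as row positions
rownames(m[f, ])                # "HSPB1" "CLU" "EPHX1"
# Indexing by the labels selects rows by name
rownames(m[as.character(f), ])  # "EPHX1" "HSPB1" "CLU"
```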

Converting the factor to character selects the rows by name. This works as long as every gene in vector_infertility_genes is among the row names of my_genomic_matrix; otherwise R raises a "subscript out of bounds" error:

head(my_genomic_matrix[as.character(vector_infertility_genes),])
           [,1]       [,2]      [,3]
EPHX1 0.1380852 0.91638593 0.5155086
HSPB1 0.4828377 0.44798223 0.6011990
CLU   0.7974677 0.84083760 0.4378384
GAMT  0.9654133 0.04167125 0.6087020
PICK1 0.1958134 0.22254847 0.5157768
NR3C1 0.4228220 0.14512706 0.6136789

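If some genes may be missing from the matrix, select the rows whose names appear in the vector instead (the rows come back in matrix order, not in the order of the vector):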

vector_infertility_genes = as.character(vector_infertility_genes)
my_genomic_matrix[rownames(my_genomic_matrix) %in% vector_infertility_genes,]
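If the order of the query matters, or you want to know which genes were not found, intersect() and setdiff() are one option. The sketch below uses a small toy matrix; m, wanted, present and absent are hypothetical names, and "NOT_THERE" is a deliberately missing entry.

```r
set.seed(1)  # only to make the toy matrix reproducible
m <- matrix(runif(12), ncol = 3,
            dimnames = list(c("gene1", "CLU", "gene3", "EPHX1"), NULL))
wanted <- c("EPHX1", "NOT_THERE", "CLU")  # toy query with one absent name

present <- intersect(wanted, rownames(m))  # keeps the order of `wanted`
absent  <- setdiff(wanted, rownames(m))    # names with no matching row

sub <- m[present, , drop = FALSE]  # drop = FALSE keeps a matrix even for one row
rownames(sub)  # "EPHX1" "CLU"
absent         # "NOT_THERE"
```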

subsets of different vectors R

We can build the strings with expand.grid and combn. Use combn to pick every group of 2 or 3 elements from the list ('lst'), looping over the group size with lapply; expand each group into a data.frame of all element combinations with expand.grid; then paste the columns together with do.call, specifying the sep as " & ".

# q, w, and t are the vectors from the question (not shown here)
lst <- list(q, w, t)
unlist(lapply(2:3, function(i) combn(lst, i,
    FUN = function(x) do.call(paste, c(expand.grid(x), sep = " & ")),
    simplify = FALSE)))
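Since the input vectors are not shown, here is a runnable sketch with made-up values for q, w and t. These values are assumptions for illustration only, not the data from the question.

```r
# Hypothetical inputs; the vectors from the original question are not shown
q <- c("a", "b")
w <- c("c")
t <- c("d", "e")
lst <- list(q, w, t)

res <- unlist(lapply(2:3, function(i) {
  combn(lst, i,
        # expand.grid() crosses the chosen vectors; paste() joins each row
        FUN = function(x) do.call(paste, c(expand.grid(x), sep = " & ")),
        simplify = FALSE)
}))
length(res)  # 2 + 4 + 2 pairs and 4 triples: 12 strings
res[1]       # "a & c"
```

Note that q and t are also the names of base R functions; assigning vectors to those names works, but different names are less confusing in real code.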

Unordered combinations of all lengths

You can call combn() once for each value of m from 1 to length(x) and concatenate the results:

x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
#
# [[2]]
# [1] "blue"
#
# [[3]]
# [1] "black"
#
# [[4]]
# [1] "red" "blue"
#
# [[5]]
# [1] "red" "black"
#
# [[6]]
# [1] "blue" "black"
#
# [[7]]
# [1] "red" "blue" "black"

If you prefer a matrix result, then you can apply stringi::stri_list2matrix() to the list above.

stringi::stri_list2matrix(
    do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE)),
    byrow = TRUE
)
#      [,1]    [,2]    [,3]
# [1,] "red"   NA      NA
# [2,] "blue"  NA      NA
# [3,] "black" NA      NA
# [4,] "red"   "blue"  NA
# [5,] "red"   "black" NA
# [6,] "blue"  "black" NA
# [7,] "red"   "blue"  "black"
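If you would rather not depend on stringi, a base R sketch can pad each combination with NA and bind the results by hand. The names combos and mat are made up for this illustration.

```r
x <- c("red", "blue", "black")
combos <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))

# `length<-` pads each vector with NA up to the longest length;
# sapply() binds them as columns, so t() puts one combination per row
mat <- t(sapply(combos, `length<-`, max(lengths(combos))))
dim(mat)   # 7 3
mat[4, ]   # "red" "blue" NA
```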

