How to get all possible subsets of a character vector in R?
You can use combn:
res <- unlist(lapply(1:3, combn,
x = c("test1","test2","test3"), simplify = FALSE),
recursive = FALSE)
res <- sapply(res, `length<-`, 3)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "test1" "test2" "test3" "test1" "test1" "test2" "test1"
#[2,] NA NA NA "test2" "test3" "test3" "test2"
#[3,] NA NA NA NA NA NA "test3"
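If you want the full power set, the empty subset has to be added by hand, because the call above starts at size 1. A minimal sketch, assuming a helper name (power_set) that is not part of the original answer:

```r
# Hypothetical helper: every subset of x, including the empty one.
power_set <- function(x) {
  # All non-empty subsets, grouped by size and then flattened one level
  non_empty <- unlist(lapply(seq_along(x), combn, x = x, simplify = FALSE),
                      recursive = FALSE)
  # Prepend the empty subset, keeping the element type of x
  c(list(x[0]), non_empty)
}

ps <- power_set(c("test1", "test2", "test3"))
length(ps) == 2^3
# [1] TRUE
```

Keeping the result as a list avoids the NA padding that the matrix form needs.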
R: subset of character vector
You can also do something like this:
vector <- c("a", "", "b", "c","","d", "e")
vector[seq(which(vector=="b")+1,which(vector=="e")-1)]
#[1] "c" "" "d"
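The one-liner assumes both markers exist and that at least one element sits between them. A guarded sketch of the same idea follows; the helper name between() is made up for illustration:

```r
# Hypothetical helper: elements strictly between the first `from` and the first `to`.
between <- function(v, from, to) {
  i <- match(from, v)  # position of the first `from`, NA if absent
  j <- match(to, v)    # position of the first `to`, NA if absent
  # Missing marker, wrong order, or adjacent markers: nothing in between
  if (is.na(i) || is.na(j) || j - i < 2) return(v[0])
  v[(i + 1):(j - 1)]
}

vector <- c("a", "", "b", "c", "", "d", "e")
between(vector, "b", "e")
# [1] "c" ""  "d"
between(vector, "d", "e")
# character(0)
```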
All possible combinations in a vector
Try:
vec <- c("A", "B", "C")  # example input, assumed from the output below
unlist(sapply(0:length(vec),function(n) apply(combn(vec,n),2,function(v) paste0(v,collapse="+"))))
[1] "" "A" "B" "C" "A+B" "A+C" "B+C" "A+B+C"
How do I find all possible subsets of a set iteratively in R?
No need for loops (lapply or otherwise):
combn(1:4,2)
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 2 2 3
# [2,] 2 3 4 3 4 4
Example with calculating the sums of combinations:
combn(1:4,2,FUN=sum)
# [1] 3 4 5 5 6 7
An example with a user defined function:
x <- 11:14
combn(1:4,2,FUN=function(i,a) sum(a[i]),a=x)
#[1] 23 24 25 25 26 27
Here (in the anonymous function) i is the combination used as an index, and the argument a is a vector to which I pass x.
And the same with a user-defined named function:
fun <- function(i,a) sum(a[i])
combn(1:4,2,FUN=fun,a=x)
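When the function only needs the values, not their positions, you can probably skip the index step and run combn on the vector itself. A small sketch comparing the two routes (the variable names by_index and by_value are just for illustration):

```r
x <- 11:14

# Index-based, as above: combine the positions, then look the values up in `a`
by_index <- combn(seq_along(x), 2, FUN = function(i, a) sum(a[i]), a = x)

# Value-based: combine the values directly
by_value <- combn(x, 2, FUN = sum)

all(by_index == by_value)
# [1] TRUE
```

The index-based form is still useful when the same combination has to pick from several parallel vectors.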
How to subset a matrix using a large character vector
I am making a guess at the problem. Your genes are factors, and when you use them to subset a matrix, they are converted to numeric:
genes = c('EPHX1','HSPB1', 'CLU', 'GAMT','PICK1', 'NR3C1','SIRT1', 'NPAS2',
'SPRY4', 'MAP3K1', 'SOS1', 'SALL4','GRIP1', 'PUM2', 'SOX9', 'RIPK4', 'CHD7',
'BCOR','CCNB1','NFE2L2', 'CHD2', 'CYP1B1', 'MDM2', 'CREBBP', 'ICK', 'ZFY',
'SIN3A', 'GATA4')
class(genes)
[1] "character"
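infertility = data.frame(V1=genes, stringsAsFactors=TRUE)
vector_infertility_genes <- infertility$V1
class(vector_infertility_genes)
[1] "factor"
Before R 4.0.0, data.frame() converted character vectors to factors by default; from 4.0.0 on you get a factor only if you ask for it with stringsAsFactors=TRUE, as above. Below I make a matrix with some random gene names and insert the chosen genes in rows 101-128: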
my_genomic_matrix = matrix(runif(1000*3),ncol=3)
rownames(my_genomic_matrix) = paste0("gene",1:1000)
rownames(my_genomic_matrix)[101:128] = genes
Subsetting with the factor uses its integer codes as row positions, so you get the wrong rows:
head(my_genomic_matrix[vector_infertility_genes,])
[,1] [,2] [,3]
gene8 0.6705400 0.92836211 0.39245031
gene12 0.6550523 0.87094037 0.08309788
gene5 0.3737798 0.94779178 0.44279510
gene9 0.4544450 0.77939541 0.13901245
gene19 0.6284895 0.47871950 0.60837784
gene18 0.2369957 0.01336282 0.10390174
This should work in most cases, as long as you are sure your vector_infertility_genes are in the row names of my_genomic_matrix:
head(my_genomic_matrix[as.character(vector_infertility_genes),])
[,1] [,2] [,3]
EPHX1 0.1380852 0.91638593 0.5155086
HSPB1 0.4828377 0.44798223 0.6011990
CLU 0.7974677 0.84083760 0.4378384
GAMT 0.9654133 0.04167125 0.6087020
PICK1 0.1958134 0.22254847 0.5157768
NR3C1 0.4228220 0.14512706 0.6136789
If some are missing you can also do:
vector_infertility_genes = as.character(vector_infertility_genes)
my_genomic_matrix[rownames(my_genomic_matrix) %in% vector_infertility_genes,]
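The same effect is easier to see on a tiny matrix. This is only an illustrative sketch; the names m, f and wanted are made up and do not come from the question:

```r
m <- matrix(1:6, ncol = 2, dimnames = list(c("gA", "gB", "gC"), NULL))
f <- factor(c("gC", "gA"))       # levels are sorted, so "gA" = 1 and "gC" = 2

rownames(m[f, ])                 # factor codes 2, 1 are used as row positions
# [1] "gB" "gA"
rownames(m[as.character(f), ])   # labels are matched against the row names
# [1] "gC" "gA"

# If some names may be missing, intersect() keeps the requested order
wanted <- c("gC", "gX")
rownames(m[intersect(wanted, rownames(m)), , drop = FALSE])
# [1] "gC"
```

Note that the %in% version returns rows in the order of the matrix, while intersect() returns them in the order of the query vector.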
subsets of different vectors R
We can create strings using expand.grid and combn. Create a combn of list ('lst') elements picking 2 or 3 in a list (using lapply), expand the list elements into a data.frame and paste with do.call (specifying the sep as " & ").
lst <- list(q, w, t)  # q, w and t are the vectors from the question
unlist( lapply(2:3, function(i) combn(lst, i,
FUN = function(x) do.call(paste, c(expand.grid(x), sep = " & ")),
simplify = FALSE)))
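To see what this produces, here is a run with made-up inputs. The values of q, w and t below are assumptions for illustration only, since the question's vectors are not shown here:

```r
# Hypothetical inputs standing in for the question's vectors
q <- c("a", "b")
w <- c("x")
t <- c("1", "2")
lst <- list(q, w, t)

res <- unlist(lapply(2:3, function(i) combn(lst, i,
  # For each pick of 2 or 3 vectors, cross their elements and join each row
  FUN = function(x) do.call(paste, c(expand.grid(x), sep = " & ")),
  simplify = FALSE)))

length(res)   # pairs: 2 + 4 + 2, triple: 4
# [1] 12
res[1:2]
# [1] "a & x" "b & x"
```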
Unordered combinations of all lengths
You could apply a sequence the length of x over the m argument of the combn() function.
x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
#
# [[2]]
# [1] "blue"
#
# [[3]]
# [1] "black"
#
# [[4]]
# [1] "red" "blue"
#
# [[5]]
# [1] "red" "black"
#
# [[6]]
# [1] "blue" "black"
#
# [[7]]
# [1] "red" "blue" "black"
If you prefer a matrix result, then you can apply stringi::stri_list2matrix() to the list above.
stringi::stri_list2matrix(
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE)),
byrow = TRUE
)
# [,1] [,2] [,3]
# [1,] "red" NA NA
# [2,] "blue" NA NA
# [3,] "black" NA NA
# [4,] "red" "blue" NA
# [5,] "red" "black" NA
# [6,] "blue" "black" NA
# [7,] "red" "blue" "black"
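If you would rather not depend on stringi, a base R sketch along the following lines should give the same padded layout. The names subsets, width and mat are assumptions made for this example:

```r
x <- c("red", "blue", "black")
subsets <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))

# Indexing past the end of a vector yields NA, which pads each subset
width <- max(lengths(subsets))
mat <- t(vapply(subsets, function(s) s[seq_len(width)], character(width)))

dim(mat)
# [1] 7 3
mat[4, ]
# [1] "red"  "blue" NA
```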