R dplyr select_if with multiple conditions
If we have a data frame, x:
x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'), stringsAsFactors=FALSE)
## V1 V2 V3 V4 V5
##1 1 10 a x l
##2 2 11 b y m
##3 3 12 c z n
where V1 and V2 are actually numeric and the rest of the columns are not factors, then we can do:
library(dplyr)
y <- x %>% select_if(function(col) is.numeric(col) |
all(col == .$V4) |
all(col == .$V5))
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n
This may not be the best approach, but it does what you want. The catch is that select_if applies its predicate function to one column at a time and expects a single TRUE or FALSE for each column, so every condition has to be written in terms of the current column.
Another way is to use select:
y <- x %>% select(which(sapply(.,class)=="numeric"),V4,V5)
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n
which is probably better.
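With dplyr 1.0.0 or later, the same selection can probably be written with where() instead of which(sapply(...)). The following is a minimal sketch that rebuilds the data frame x from above; the name y2 is only illustrative.

```r
library(dplyr)

# Same data frame as in the example above
x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# Numeric columns chosen by a predicate, plus V4 and V5 chosen by name
y2 <- x %>% select(where(is.numeric), V4, V5)
names(y2)
```

Mixing a predicate with bare column names in one select call avoids having to express "is V4" or "is V5" as a test on column contents.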
Select columns based on multiple attribute conditions
The tidyverse formula syntax, where ~ introduces an anonymous function, can be helpful when using the select_if function:
require(tidyverse)
# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.))
# all numeric AND the name column
starwars %>% select(name, where(is.numeric))
Inside select, predicate functions such as is.numeric should be wrapped in where(), as recommended by the tidyverse authors. This makes it explicit that the function is applied to the column contents rather than treated as a column name.
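The select_if call with two type conditions can be expressed with where() as well. This is a small sketch on a hypothetical data frame (df_demo is not part of the original question); it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data frame with an integer, a character and a logical column
df_demo <- data.frame(id = 1:3,
                      label = c("a", "b", "c"),
                      flag = c(TRUE, FALSE, TRUE),
                      stringsAsFactors = FALSE)

# Keep columns that are numeric OR character; the logical column is dropped
out <- df_demo %>% select(where(~ is.numeric(.x) | is.character(.x)))
names(out)
```

The same selection could also be written as select(where(is.numeric) | where(is.character)), which keeps each predicate separate.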
Using multiple conditions in select (dplyr)
You can use select_if and grepl:
library(dplyr)
df %>%
select_if(grepl("col", names(.)) & grepl("1", names(.)))
# column1
#1 1
#2 2
#3 3
#4 4
#5 5
#6 6
#7 7
#8 8
#9 9
#10 10
If you want to use select with contains, you could do something like this:
df %>%
select(intersect(contains("col"), contains("1")))
This can be combined in other ways, as mentioned in the comments:
df %>%
select(intersect(contains("1"), starts_with("c")))
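Recent versions of tidyselect (1.0.0 and later, as far as I know) also accept the boolean operators & and | between selection helpers, which reads a little more directly than intersect(). A minimal sketch follows; the original df is not shown, so df_demo here is a made-up stand-in.

```r
library(dplyr)

# Hypothetical stand-in for df, which is not shown in the original answer
df_demo <- data.frame(column1 = 1:3, column2 = 4:6, other1 = 7:9)

# Columns whose name contains "col" AND contains "1"
out <- df_demo %>% select(contains("col") & contains("1"))
names(out)
```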
How to select columns depending on multiple conditions in dplyr
Inside where, we need to supply functions that return logical results.
library(dplyr)
select(df1, \(x) all(x < 5))
# or this, which might be more semantically correct
select(df1, where(\(x) is.numeric(x) & all(x < 5)))
a1
1 1
2 0
3 3
4 0
Data
df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA,
-4L))
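One possible refinement, offered as a sketch rather than as the original answer's method: using && instead of & short-circuits the test, so all(x < 5) is never evaluated on the character column X. The data frame below is the same df1 built with data.frame().

```r
library(dplyr)

# Same data as df1 above
df1 <- data.frame(X = c("B", "C", "D", "E"),
                  a1 = c(1, 0, 3, 0),
                  a2 = c(235, 270, 100, 1),
                  a3 = c(3, 1000, 900, 2),
                  stringsAsFactors = FALSE)

# && stops at is.numeric(x) for non-numeric columns,
# so the comparison only runs on numeric data
out <- select(df1, where(function(x) is.numeric(x) && all(x < 5)))
names(out)
```

Both operands here are single logical values, which is what && expects.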
How to select columns in r by multiple condition (not using select_if)?
We can use select with where:
library(dplyr) # // version 1.0.4
library(ggplot2movies)
out2 <- movies %>%
select(where(~ mean(is.na(.)) > .25))
Checking against the OP's code:
out1 <- movies %>%
select_if(~ sum(is.na(.))/length(.) > .25)
identical(out1, out2)
#[1] TRUE
Note that sum(...)/n() can be written more simply as mean(...).
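To see why the two forms agree, here is a small sketch on a hypothetical data frame (na_demo and its columns are invented for illustration): the mean of a logical vector is the share of TRUE values, which is the same as the count of TRUE values divided by the length.

```r
library(dplyr)

# Hypothetical data: a has no NA, b is 50% NA, c is 25% NA
na_demo <- data.frame(a = c(1, 2, 3, 4),
                      b = c(NA, NA, 3, 4),
                      c = c(NA, 2, 3, 4))

# Fraction of missing values per column, computed both ways
frac_mean <- sapply(na_demo, function(col) mean(is.na(col)))
frac_sum  <- sapply(na_demo, function(col) sum(is.na(col)) / length(col))

# Keep only columns with more than 25% missing values
out <- na_demo %>% select(where(~ mean(is.na(.x)) > 0.25))
names(out)
```

Column c sits exactly at 25% and is therefore not selected by the strict > comparison.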
dplyr: conditional column selection using select_if()
The simplest and clearest way to do this is to pipe together two select functions:
iris %>%
select_if(is.numeric) %>% # Select all numeric columns
select(-contains('Width')) %>% # Then drop 'Width' column(s)
head
Sepal.Length Petal.Length
1 5.1 1.4
2 4.9 1.4
3 4.7 1.3
4 4.6 1.5
5 5.0 1.4
6 5.4 1.7
This works even inside a map function:
iris %>%
group_by(Species) %>%
nest %>%
mutate(data = map(data, ~ .x %>%
select_if(is.numeric) %>%
select(-contains('Width')) %>%
mutate(count = sum(rowSums(.))))) %>%
mutate(data = map(data, ~ .x %>%
select_if(is.numeric) %>%
select(-contains('Width')) %>%
mutate_all(~ (. / count) * 100))) %>%
unnest
# A tibble: 150 x 4
Species Sepal.Length Petal.Length count
<fct> <dbl> <dbl> <dbl>
1 setosa 1.58 0.433 100
2 setosa 1.52 0.433 100
3 setosa 1.45 0.402 100
4 setosa 1.42 0.464 100
5 setosa 1.55 0.433 100
6 setosa 1.67 0.526 100
7 setosa 1.42 0.433 100
8 setosa 1.55 0.464 100
9 setosa 1.36 0.433 100
10 setosa 1.52 0.464 100
# ... with 140 more rows
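The two-step selection used in this answer (numeric columns first, then dropping the Width columns) can probably be collapsed into a single select call in dplyr 1.0.0 or later, by combining where() with a negated helper. A minimal sketch, assuming a tidyselect version that supports & and ! between selections:

```r
library(dplyr)

# Numeric columns whose name does NOT contain "Width"
out <- iris %>% select(where(is.numeric) & !contains("Width"))
head(out)
```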
Dplyr _if verbs with predicate function referring to the column names & multiple conditions?
You can do:
btest %>%
select_if(str_detect(names(.), "jcr") & sapply(., is.numeric))
jcr_fourth
1 6
2 7
3 8
4 9
5 10
6 11
dplyr::select_if can use colnames and their values at the same time?
A workaround that is not too complicated is:
d %>% select_if(stringr::str_detect(names(.), "Petal") | sapply(., mean) > 5)
# or
d %>% select_if(grepl("Petal",names(.)) | sapply(., mean) > 5)
Which gives:
# A tibble: 150 x 3
Sepal.Length Petal.Length Petal.Width
<dbl> <dbl> <dbl>
1 5.1 1.4 0.2
2 4.9 1.4 0.2
3 4.7 1.3 0.2
4 4.6 1.5 0.2
5 5.0 1.4 0.2
6 5.4 1.7 0.4
7 4.6 1.4 0.3
8 5.0 1.5 0.2
9 4.4 1.4 0.2
10 4.9 1.5 0.1
# ... with 140 more rows
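In dplyr 1.0.0 or later, a name condition and a value condition can be combined directly inside select, without sapply. The sketch below uses the built-in iris data as a stand-in, since d is not defined in the original answer; the is.numeric() guard is an addition so that the factor column Species never reaches mean().

```r
library(dplyr)

# Name contains "Petal" OR the column is numeric with mean above 5
out <- iris %>%
  select(contains("Petal") | where(~ is.numeric(.x) && mean(.x) > 5))
names(out)
```

The column order of the result may differ from the select_if version, because a union of selections follows the order of the operands rather than the order of the columns in the data.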