R Dplyr Select_If with Multiple Conditions

R dplyr select_if with multiple conditions

If we have a data frame, x:

x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'), stringsAsFactors=FALSE)
## V1 V2 V3 V4 V5
##1 1 10 a x l
##2 2 11 b y m
##3 3 12 c z n

where V1 and V2 are actually numeric and the rest of the columns are not factors, then we can do:

library(dplyr)
y <- x %>% select_if(function(col) is.numeric(col) |
                       all(col == .$V4) |
                       all(col == .$V5))
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n

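I'm not saying this is the best approach, but it does what you want. The catch is that select_if calls its predicate once per column, passing the column's values (not its name), and the predicate must return a single TRUE or FALSE. That is why V4 and V5 are matched here by comparing each column's values against .$V4 and .$V5.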

Another way is to use select:

y <- x %>% select(which(sapply(.,class)=="numeric"),V4,V5)
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n

which is probably better.
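For comparison, here is a minimal sketch of the same selection written with where(), assuming dplyr 1.0.0 or later is installed. It rebuilds the x data frame from above so that it runs on its own; the numeric columns are picked by type and V4 and V5 by name.

```r
library(dplyr)

x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# where() applies is.numeric to each column; V4 and V5 are named directly
y <- x %>% select(where(is.numeric), V4, V5)
names(y)
```

Unlike the sapply(., class) == "numeric" test, is.numeric also matches integer columns, which may or may not be what you want.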

Select columns based on multiple attribute conditions

The tidyverse formula shorthand, where ~ defines an anonymous function and . stands for the current column, can make select_if calls more compact:

library(tidyverse)

# numeric and character columns
starwars %>% select_if(~ is.numeric(.) | is.character(.))

# all numeric AND the name column
starwars %>% select(name, where(is.numeric))

Since dplyr 1.0.0, predicate functions such as is.numeric should be wrapped in where() when used inside select(); passing a bare predicate is deprecated.
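As a rough illustration of combining predicates with where(), the sketch below uses a small made-up data frame (toy is hypothetical, not from the question) and assumes dplyr 1.0.0 or later. It shows two equivalent spellings: a single lambda, and two where() calls joined with |.

```r
library(dplyr)

# Hypothetical toy data: one numeric, one character, one logical column
toy <- data.frame(n = c(1.5, 2.5), s = c("a", "b"), flag = c(TRUE, FALSE),
                  stringsAsFactors = FALSE)

# One lambda that performs both checks for each column
by_lambda <- toy %>% select(where(~ is.numeric(.x) || is.character(.x)))

# Two where() calls combined with the | operator
by_union <- toy %>% select(where(is.numeric) | where(is.character))

names(by_lambda)
names(by_union)
```

Both forms keep n and s and drop the logical column flag.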

Using multiple conditions in select (dplyr)

You can use select_if with grepl, passing a logical vector that has one element per column:

library(dplyr)

df %>%
  select_if(grepl("col", names(.)) & grepl("1", names(.)))

# column1
#1 1
#2 2
#3 3
#4 4
#5 5
#6 6
#7 7
#8 8
#9 9
#10 10

If you want to use select with contains you could do something like this:

df %>%
  select(intersect(contains("col"), contains("1")))

This can be combined in other ways, as mentioned in the comments:

df %>%
  select(intersect(contains("1"), starts_with("c")))
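Recent versions of dplyr (1.0.0 or later, via tidyselect) also let selection helpers be combined directly with & and !, which may read more naturally than intersect(). A minimal sketch, using a hypothetical stand-in for df since the original data is not shown:

```r
library(dplyr)

# Hypothetical data frame; the df from the question is not shown
df <- data.frame(column1 = 1:3, column2 = 4:6, other1 = 7:9)

# Columns whose name contains both "col" and "1"
both <- df %>% select(contains("col") & contains("1"))

# Columns that start with "c" but do not end in "2"
not_two <- df %>% select(starts_with("c") & !ends_with("2"))

names(both)
names(not_two)
```

With this toy data, both calls keep only column1.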

How to select columns depending on multiple conditions in dplyr

Inside where(), we need to supply a function that returns a single logical value (TRUE or FALSE) for each column.

library(dplyr)

select(df1, where(\(x) all(x < 5)))

# or this, which might be more semantically correct
select(df1, where(\(x) is.numeric(x) && all(x < 5)))

a1
1 1
2 0
3 3
4 0

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
                      a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)),
                 class = "data.frame", row.names = c(NA, -4L))
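When the condition grows, it can help to give the predicate a name. The sketch below is one possible way to do that (small_numeric is a hypothetical helper name, not part of dplyr), again assuming dplyr 1.0.0 or later. It rebuilds df1 with data.frame() so it can be run on its own.

```r
library(dplyr)

df1 <- data.frame(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
                  a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2),
                  stringsAsFactors = FALSE)

# Hypothetical helper: && short-circuits, so all(x < 5) is only
# evaluated for columns that are numeric
small_numeric <- function(x) is.numeric(x) && all(x < 5)

res <- select(df1, where(small_numeric))
names(res)
```

Only a1 satisfies both conditions, so it is the only column kept.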

How to select columns in R by multiple conditions (not using select_if)?

We can use select with where:

library(dplyr) # // version 1.0.4
library(ggplot2movies)
out2 <- movies %>%
  select(where(~ mean(is.na(.)) > .25))

Checking against the OP's code:

out1 <- movies %>%
  select_if(~ sum(is.na(.))/length(.) > .25)

identical(out1, out2)
#[1] TRUE

Note that sum(is.na(.))/length(.) can be written more compactly as mean(is.na(.)).
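To see why the two expressions agree without installing ggplot2movies, here is a small sketch on made-up data (na_toy is hypothetical), assuming dplyr 1.0.0 or later. is.na() returns a logical vector, and the mean of a logical vector is the proportion of TRUE values.

```r
library(dplyr)

# Hypothetical data with 50%, 25% and 75% missing values per column
na_toy <- data.frame(p = c(1, NA, NA, 4),
                     q = c(1, 2, 3, NA),
                     r = c(NA, NA, NA, 1))

v <- na_toy$p
share_sum  <- sum(is.na(v)) / length(v)
share_mean <- mean(is.na(v))

# Keep columns where more than a quarter of the values are missing
out <- na_toy %>% select(where(~ mean(is.na(.x)) > 0.25))
names(out)
```

Column q sits exactly at 25% missing, so the strict > comparison drops it and only p and r are kept.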

dplyr: conditional column selection using select_if()

The simplest and clearest way to do this is to pipe together 2 select functions:

iris %>%
  select_if(is.numeric) %>%        # Select all numeric columns
  select(-contains('Width')) %>%   # Then drop 'Width' column(s)
  head()

Sepal.Length Petal.Length
1 5.1 1.4
2 4.9 1.4
3 4.7 1.3
4 4.6 1.5
5 5.0 1.4
6 5.4 1.7
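If a single call is preferred, the two steps can probably be folded into one select() by combining where() with a negated helper. This is a sketch assuming dplyr 1.0.0 or later, not the approach from the original answer:

```r
library(dplyr)

# Numeric columns whose name does not contain "Width"
narrow <- iris %>%
  select(where(is.numeric) & !contains("Width"))

head(narrow)
```

Piping two select calls remains a perfectly readable alternative; the combined form mainly saves a step.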

This works even inside a map function:

library(purrr)  # map()
library(tidyr)  # nest(), unnest()

iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(data = map(data, ~ .x %>%
    select_if(is.numeric) %>%
    select(-contains('Width')) %>%
    mutate(count = sum(rowSums(.))))) %>%
  mutate(data = map(data, ~ .x %>%
    select_if(is.numeric) %>%
    select(-contains('Width')) %>%
    # funs() is deprecated; a formula lambda does the same job
    mutate_all(~ (. / count) * 100))) %>%
  unnest(data)

# A tibble: 150 x 4
Species Sepal.Length Petal.Length count
<fct> <dbl> <dbl> <dbl>
1 setosa 1.58 0.433 100
2 setosa 1.52 0.433 100
3 setosa 1.45 0.402 100
4 setosa 1.42 0.464 100
5 setosa 1.55 0.433 100
6 setosa 1.67 0.526 100
7 setosa 1.42 0.433 100
8 setosa 1.55 0.464 100
9 setosa 1.36 0.433 100
10 setosa 1.52 0.464 100
# ... with 140 more rows

Dplyr _if verbs with predicate function referring to the column names & multiple conditions?

You can do:

btest %>%
  select_if(str_detect(names(.), "jcr") & sapply(., is.numeric))

jcr_fourth
1 6
2 7
3 8
4 9
5 10
6 11
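The reason this works is that select_if also accepts a plain logical vector with one element per column, so name-based and type-based tests can be computed separately and combined with &. A minimal sketch with a hypothetical stand-in for btest (the real data is not shown), using base grepl in place of str_detect:

```r
library(dplyr)

# Hypothetical stand-in for btest
btest_toy <- data.frame(jcr_first = c("a", "b"),
                        jcr_fourth = c(6, 7),
                        other = c(1, 2),
                        stringsAsFactors = FALSE)

by_name <- grepl("jcr", names(btest_toy))   # tests the column names
by_type <- sapply(btest_toy, is.numeric)    # tests the column contents
keep <- by_name & by_type                   # one TRUE/FALSE per column

res <- btest_toy %>% select_if(keep)
names(res)
```

Here only jcr_fourth passes both tests. The same keep vector could also be used for base R subsetting, as in btest_toy[, keep, drop = FALSE].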

dplyr::select_if can use colnames and their values at the same time?

A workaround that is not too complicated is:

d %>% select_if(stringr::str_detect(names(.), "Petal") | sapply(., mean) > 5)

# or
d %>% select_if(grepl("Petal",names(.)) | sapply(., mean) > 5)

Which gives:

# A tibble: 150 x 3
Sepal.Length Petal.Length Petal.Width
<dbl> <dbl> <dbl>
1 5.1 1.4 0.2
2 4.9 1.4 0.2
3 4.7 1.3 0.2
4 4.6 1.5 0.2
5 5.0 1.4 0.2
6 5.4 1.7 0.4
7 4.6 1.4 0.3
8 5.0 1.5 0.2
9 4.4 1.4 0.2
10 4.9 1.5 0.1
# ... with 140 more rows
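With dplyr 1.0.0 or later, a name condition and a value condition can also be combined inside a single select() using | between a helper and where(). The sketch below assumes d is the built-in iris data, which the answer does not state explicitly; the is.numeric() guard keeps mean() from being called on the Species factor.

```r
library(dplyr)

# Assumption: d is the built-in iris data set
d <- iris

# Name contains "Petal", OR the column is numeric with mean above 5
res <- d %>%
  select(contains("Petal") | where(~ is.numeric(.x) && mean(.x) > 5))

names(res)
```

This keeps the same three columns as the workaround above, though possibly in a different order.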

