How to select_if in dplyr, where the logical condition is negated

A predicate function can be negated with the dedicated Negate() or purrr::negate() functions (rather than with the ! operator, which negates a vector):

library(dplyr)

mtcars %>%
  mutate(foo = "bar") %>%
  select_if(Negate(is.numeric)) %>%
  head()

# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar

Or use purrr::negate() (lower-case), which has slightly different behavior; see the respective help pages:

library(purrr)
library(dplyr)

mtcars %>%
  mutate(foo = "bar") %>%
  select_if(negate(is.numeric)) %>%
  head()

# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
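The difference between the two helpers mostly shows up in what they accept as input. A minimal sketch, assuming base R's Negate() resolves its argument with match.fun() (so a function name given as a string works) and purrr's negate() converts its argument with as_mapper() (so a one-sided formula works); check ?Negate and ?purrr::negate for your installed versions:

```r
library(purrr)

# Base R: accepts a function, or the name of a function as a string
not_num_base <- Negate("is.numeric")

# purrr: accepts a function, or a one-sided formula
not_num_purrr <- negate(~ is.numeric(.x))

not_num_base("a")    # TRUE
not_num_base(1)      # FALSE
not_num_purrr("a")   # TRUE
not_num_purrr(1)     # FALSE
```

For a plain predicate such as is.numeric, either helper can be passed to select_if() and the result should be the same.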

Use dplyr's _if() functions like mutate_if() with a negative predicate function

We can use the ~ shorthand notation for anonymous functions in the tidyverse:

library(dplyr)
iris %>%
  mutate_if(~ !is.numeric(.), as.character)

Or, without an anonymous function, use negate() from purrr:

library(purrr)
iris %>%
  mutate_if(negate(is.numeric), as.character)

In addition to negate(), Negate() from base R also works:

iris %>%
  mutate_if(Negate(is.numeric), as.character)

The same notation works with select_if() and arrange_if():

iris %>%
  select_if(negate(is.numeric)) %>%
  head(2)
# Species
#1 setosa
#2 setosa
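In dplyr 1.0.0 and later, the scoped _if() verbs are superseded by where() and across(). As a rough sketch (assuming a dplyr version that provides both), the same negated predicates can be written as follows; the variable names non_numeric and converted are only illustrative:

```r
library(dplyr)

# Keep only the non-numeric columns,
# comparable to select_if(Negate(is.numeric))
non_numeric <- iris %>%
  select(where(Negate(is.numeric)))

# Convert only the non-numeric columns,
# comparable to mutate_if(Negate(is.numeric), as.character)
converted <- iris %>%
  mutate(across(where(Negate(is.numeric)), as.character))

names(non_numeric)         # "Species"
class(converted$Species)   # "character"
```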

R dplyr select_if with multiple conditions

If we have a data frame, x:

x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'), stringsAsFactors=FALSE)
## V1 V2 V3 V4 V5
##1 1 10 a x l
##2 2 11 b y m
##3 3 12 c z n

where V1 and V2 are numeric and the remaining columns are character vectors (not factors), then we can do:

library(dplyr)
y <- x %>% select_if(function(col) is.numeric(col) |
                       all(col == .$V4) |
                       all(col == .$V5))
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n

Not saying that this is the best thing to do, but it does do what you want. The issue here is that select_if applies its predicate to each column in turn and expects a single TRUE or FALSE back for each one, so every condition has to be written as a test on the column's contents.

Another way is to use select:

y <- x %>% select(which(sapply(.,class)=="numeric"),V4,V5)
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n

which is probably better.
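With a dplyr version that supports where() (1.0.0 or later, as far as I know), the type test and the named columns can be combined directly in select(). This is only a sketch on the same example data, not necessarily what the original answer had in mind:

```r
library(dplyr)

x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# All numeric columns, plus V4 and V5 selected by name
y <- x %>% select(where(is.numeric), V4, V5)

names(y)   # "V1" "V2" "V4" "V5"
```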

How to select non-numeric columns using dplyr::select_if

You can use purrr's negate(), which is attached when you use library(tidyverse) rather than just library(dplyr):

library(tidyverse)
iris %>% select_if(negate(is.numeric))

dplyr select using logical

My answers would be:

  • no ("Can select in dplyr be used with a logical vector?")

evidence: (1) your example, (2) the help page:

...: Comma separated list of unquoted expressions. You can treat
variable names like they are positions. Use positive values
to select variables; use negative values to drop variables.

Doesn't say anything about logical vectors. Sorry.

  • I don't know ("why not a logical?") -- 'just because' (I don't think anyone but the developer could really answer this). You could put in a feature request ...

It's a little clunky, but

select_(dat,.dots=names(isNum)[isNum])

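works (note that you need the select_ variant to allow using a character vector; select_() has since been deprecated, and newer versions accept a character vector through select() with all_of()). But good old-fashioned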

subset(dat,select=isNum)

seems to work fine too (unless it fails to play nicely with dplyr in some other way I haven't thought of).
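For a concrete picture, here is a small sketch. The objects dat and isNum are hypothetical stand-ins for the ones in the question (which are not shown here), and all_of() assumes a reasonably recent dplyr/tidyselect:

```r
library(dplyr)

# Hypothetical data standing in for the question's `dat`
dat <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(1.5, 2.5, 3.5),
                  stringsAsFactors = FALSE)

# Named logical vector: TRUE for numeric columns
isNum <- sapply(dat, is.numeric)

# Base R: index the columns with the logical vector directly
by_base <- dat[isNum]

# dplyr: turn the logical vector into column names first
by_dplyr <- select(dat, all_of(names(dat)[isNum]))

names(by_base)    # "a" "c"
names(by_dplyr)   # "a" "c"
```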

If you look at the code of dplyr:::starts_with, you can see that it returns a vector of positions, not a logical vector:

function (vars, match, ignore.case = TRUE)
{
    stopifnot(is.string(match), !is.na(match), nchar(match) > 0)
    if (ignore.case)
        match <- tolower(match)
    n <- nchar(match)
    if (ignore.case)
        vars <- tolower(vars)
    which(substr(vars, 1, n) == match)
}

I was going to suggest that you try to modify this function to create an is_numeric equivalent, but I don't understand the underlying magic sufficiently well ...

How to use select() only on columns of a certain type without losing columns of other types?

Perhaps an option could be to create your own custom function, and use that as the predicate in the select_if function. Something like this:

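# Keep character columns, and numeric columns whose sum exceeds 12.
# The parentheses make the grouping explicit; || and && short-circuit,
# so sum() is never called on a character column.
check_cond <- function(x) is.character(x) || (is.numeric(x) && sum(x) > 12)

tibbly %>%
  select_if(check_cond)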

y z
<chr> <dbl>
1 a 9
2 b 8
3 c 7
4 d 6
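Since tibbly is not defined in the answer, the following is only a guess at a reproducible version. The column x and its values are assumptions made up for illustration; y and z mirror the output shown above:

```r
library(dplyr)

# Hypothetical data: the original `tibbly` is not shown
tibbly <- tibble(
  x = c(1, 2, 3, 4),            # numeric, sum is 10, so it is dropped
  y = c("a", "b", "c", "d"),    # character, so it is kept
  z = c(9, 8, 7, 6)             # numeric, sum is 30, so it is kept
)

# Character columns, or numeric columns whose sum exceeds 12
check_cond <- function(col) {
  is.character(col) || (is.numeric(col) && sum(col) > 12)
}

result <- tibbly %>% select_if(check_cond)

names(result)   # "y" "z"
```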

How to select columns depending on multiple conditions in dplyr

Inside where(), we need to supply a function that returns a single logical value (TRUE or FALSE) for each column.

library(dplyr)

select(df1, \(x) all(x < 5))

# or this, which might be more semantically correct
select(df1, where(\(x) is.numeric(x) & all(x < 5)))

a1
1 1
2 0
3 3
4 0

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0), 
a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA,
-4L))

How to use the NOT operator with is.numeric

is.numeric is a function. !, by contrast, negates a logical value. Applying ! to a function makes no sense. You need to call the function and negate its result.

In general you’d do the following:

select_if(function (x) ! is.numeric(x))

Or, using the formula (lambda) notation supported across the tidyverse:

select_if(~ ! is.numeric(.x))

But R has a function factory to negate the result of a function:

select_if(Negate(is.numeric))
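A quick base R sketch of the difference: applying ! to the function object itself raises an error, while Negate() returns a new function that negates the result of each call. The names res and not_numeric are only illustrative:

```r
# `!` applied to the function object itself fails with an error
res <- tryCatch(!is.numeric, error = function(e) "error")
res                 # "error"

# Negate() builds a new function around is.numeric
not_numeric <- Negate(is.numeric)
not_numeric("a")    # TRUE
not_numeric(1L)     # FALSE
```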

Negation `!` in a dplyr pipeline `%>%`

You can use backticks around !

df %>%
  `!`
# a b
#[1,] FALSE FALSE
#[2,] TRUE FALSE
#[3,] TRUE FALSE

For !is.na

df$a[2] <- NA
df %>%
  is.na %>%
  `!`
# a b
#[1,] TRUE TRUE
#[2,] FALSE TRUE
#[3,] TRUE TRUE

