How to select_if in dplyr, where the logical condition is negated
Negating a predicate function can be done with the dedicated Negate() or purrr::negate() functions, rather than the ! operator, which negates a vector:
library(dplyr)
mtcars %>%
mutate(foo = "bar") %>%
select_if(Negate(is.numeric)) %>%
head()
# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
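To see what Negate() actually produces, here is a small base R sketch. The data frame `df` is made up for illustration. The vapply() call only approximates conceptually how select_if() evaluates a predicate once per column; it is not dplyr's internal code.

```r
# Negate() (base R) takes a predicate and returns a new function
# that calls the original and flips its logical result.
is_not_numeric <- Negate(is.numeric)

is_not_numeric(1:3)  # FALSE: an integer vector is numeric
is_not_numeric("a")  # TRUE: a character vector is not

# Hypothetical example data
df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)

# Evaluate the negated predicate once per column; each call returns one logical
keep <- vapply(df, is_not_numeric, logical(1))

# Subset the columns with the resulting logical vector
df[keep]
```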
Or with purrr::negate() (lower-case), which behaves slightly differently; see the respective help pages:
library(purrr)
library(dplyr)
mtcars %>%
mutate(foo = "bar") %>%
select_if(negate(is.numeric)) %>%
head()
# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
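One practical difference, as I understand the two help pages: purrr::negate() also accepts a purrr-style formula lambda, while base Negate() resolves its argument with match.fun(), so it also accepts a function name given as a string. A minimal sketch (the names `not_above_two` and `not_numeric` are just for illustration):

```r
library(purrr)

# purrr::negate() accepts a formula (~) lambda as well as a function
not_above_two <- negate(~ .x > 2)
not_above_two(c(1, 5))  # TRUE FALSE

# base::Negate() passes its argument through match.fun(),
# so a function name given as a string also works
not_numeric <- Negate("is.numeric")
not_numeric("a")  # TRUE
```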
Use dplyr's _if() functions like mutate_if() with a negative predicate function
We can use the ~ shorthand notation for an anonymous function in the tidyverse:
library(dplyr)
iris %>%
mutate_if(~ !is.numeric(.), as.character)
Or, without an anonymous function, use negate() from purrr:
library(purrr)
iris %>%
mutate_if(negate(is.numeric), as.character)
In addition to purrr's negate(), Negate() from base R also works:
iris %>%
mutate_if(Negate(is.numeric), as.character)
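As a quick check, the three spellings above should give the same result. The sketch below also shows the dplyr >= 1.0 form, where the superseded *_if() verbs are replaced by across() combined with a where() predicate. The object names are only for illustration.

```r
library(dplyr)
library(purrr)

# The three superseded *_if() spellings from above
res_lambda <- mutate_if(iris, ~ !is.numeric(.), as.character)
res_negate <- mutate_if(iris, negate(is.numeric), as.character)
res_Negate <- mutate_if(iris, Negate(is.numeric), as.character)

# dplyr >= 1.0 equivalent: across() with a where() predicate
res_across <- mutate(iris, across(where(~ !is.numeric(.x)), as.character))

# Species (a factor in iris) is now character in every version
str(res_across$Species)
```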
The same notation works with select_if() and arrange_if():
iris %>%
select_if(negate(is.numeric)) %>%
head(2)
# Species
#1 setosa
#2 setosa
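In dplyr >= 1.0, select_if() is superseded. The same negated predicate can be wrapped in where() inside a plain select(). A minimal sketch comparing the two:

```r
library(dplyr)
library(purrr)

# Superseded scoped form
old_style <- iris %>% select_if(negate(is.numeric))

# Current tidyselect form: wrap the predicate in where()
new_style <- iris %>% select(where(negate(is.numeric)))

head(new_style, 2)
```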
R dplyr select_if with multiple conditions
If we have a data frame x:
x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'), stringsAsFactors=FALSE)
## V1 V2 V3 V4 V5
##1 1 10 a x l
##2 2 11 b y m
##3 3 12 c z n
where V1 and V2 are numeric and the remaining columns are character (not factors), then we can do:
library(dplyr)
y <- x %>% select_if(function(col) is.numeric(col) |
all(col == .$V4) |
all(col == .$V5))
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n
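Not saying that this is the best thing to do, but it does what you want. The catch is that select_if() applies its predicate to each column's values and expects a single TRUE or FALSE per column. The predicate never sees the column names, which is why V4 and V5 have to be matched by comparing values.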
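To make the per-column behaviour visible, the sketch below evaluates a predicate equivalent to the one above on each column of the same `x` with vapply(). It refers to `x` directly instead of magrittr's `.`, and the helper name `keep_col` is hypothetical. Because matching is by value, any other column that happened to equal V4 or V5 would be kept too.

```r
# Same data as above
x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# The predicate receives one column's values at a time (never its name),
# so V4/V5 can only be recognised by comparing values
keep_col <- function(col) is.numeric(col) || all(col == x$V4) || all(col == x$V5)

# One TRUE/FALSE per column
keep <- vapply(x, keep_col, logical(1))
keep
```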
Another way is to use select():
y <- x %>% select(which(sapply(.,class)=="numeric"),V4,V5)
## V1 V2 V4 V5
##1 1 10 x l
##2 2 11 y m
##3 3 12 z n
which is probably better.
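With dplyr >= 1.0, select() can combine a where() predicate with bare column names directly, which avoids both the sapply(class) test and the value comparisons. A minimal sketch on the same data:

```r
library(dplyr)

x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# Numeric columns selected by predicate, V4 and V5 by name
y <- x %>% select(where(is.numeric), V4, V5)
y
```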
How to select non-numeric columns using dplyr::select_if
You can use purrr's negate(), which is attached when you use library(tidyverse) rather than just library(dplyr):
library(tidyverse)
iris %>% select_if(negate(is.numeric))
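If you are on dplyr >= 1.0 and only need dplyr, tidyselect also lets you negate a whole selection with `!`, so no predicate-negating helper from purrr is needed. A minimal sketch:

```r
library(dplyr)

# `!` takes the complement of the where(is.numeric) selection
non_numeric <- iris %>% select(!where(is.numeric))
names(non_numeric)
```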
dplyr select using logical
My answers would be:
- No ("Can select in dplyr be used with a logical vector?").
Evidence: (1) your example, (2) the help page:
...: Comma separated list of unquoted expressions. You can treat
variable names like they are positions. Use positive values
to select variables; use negative values to drop variables.
It doesn't say anything about logical vectors. Sorry.
- I don't know ("why not a logical?"): 'just because' (I don't think anyone but the developers could really answer this). You could put in a feature request ...
It's a little clunky, but
select_(dat, .dots = names(isNum)[isNum])
works (note that, at the time of writing, you needed the select_() variant to use a character vector; the underscore verbs have since been deprecated). But good old-fashioned
subset(dat, select = isNum)
seems to work fine too (unless it fails to play nicely with dplyr in some other way I haven't thought of).
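In current dplyr the underscore verbs are no longer needed: convert the logical vector to names or positions first. In the sketch below `dat` is hypothetical data, and `isNum` is assumed to be a named logical vector with one entry per column, as in the question.

```r
library(dplyr)

# Hypothetical data; isNum mirrors the question's logical vector
dat <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(2.5, 3.5, 4.5),
                  stringsAsFactors = FALSE)
isNum <- vapply(dat, is.numeric, logical(1))

# By name: all_of() takes a character vector of column names
by_name <- select(dat, all_of(names(isNum)[isNum]))

# By position: unname() so tidyselect does not treat names as renames
by_pos <- select(dat, which(unname(isNum)))

# Base R: single-bracket indexing accepts the logical vector directly
by_base <- dat[isNum]
```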
If you look at the code of dplyr:::starts_with, you can see that it returns a vector of positions, not a logical vector:
function (vars, match, ignore.case = TRUE)
{
    stopifnot(is.string(match), !is.na(match), nchar(match) > 0)
    if (ignore.case)
        match <- tolower(match)
    n <- nchar(match)
    if (ignore.case)
        vars <- tolower(vars)
    which(substr(vars, 1, n) == match)
}
I was going to suggest that you try to modify this function to create an is_numeric equivalent, but I don't understand the underlying magic sufficiently well ...
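One way to get the same effect without touching dplyr internals is a plain helper that, like the old starts_with(), returns integer positions, which select() accepts. This is a minimal sketch; `numeric_positions` and `dat` are hypothetical names, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper in the spirit of the old starts_with():
# it returns integer column positions rather than a logical vector
numeric_positions <- function(df) {
  which(unname(vapply(df, is.numeric, logical(1))))
}

dat <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(2.5, 3.5, 4.5),
                  stringsAsFactors = FALSE)

numeric_positions(dat)  # 1 3
selected <- select(dat, numeric_positions(dat))
```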
How to use select() only on columns of a certain type without losing columns of other types?
Perhaps one option is to create your own custom function and use it as the predicate in select_if(). Something like this:
# && binds more tightly than |, so the parentheses only make the intended grouping explicit
check_cond <- function(x) is.character(x) || (is.numeric(x) && sum(x) > 12)
tibbly %>%
select_if(check_cond)
y z
<chr> <dbl>
1 a 9
2 b 8
3 c 7
4 d 6
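Since `tibbly` itself is not shown, here is a self-contained sketch with a hypothetical stand-in, `tibbly_demo`. It also shows why the grouping matters: in R, && has higher precedence than |, so `a | b && c` is read as `a | (b && c)`.

```r
library(dplyr)

# Hypothetical stand-in for `tibbly` (the original data is not shown)
tibbly_demo <- tibble(x = c(1, 1, 1, 1),
                      y = c("a", "b", "c", "d"),
                      z = c(9, 8, 7, 6))

# Operator precedence: parsed as TRUE | (FALSE && FALSE), which is TRUE
precedence_demo <- TRUE | FALSE && FALSE

# Keep character columns, and numeric columns whose sum exceeds 12
check_cond <- function(x) is.character(x) || (is.numeric(x) && sum(x) > 12)

# x sums to 4 and is dropped; y is character; z sums to 30
res <- tibbly_demo %>% select_if(check_cond)
names(res)
```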
How to select columns depending on multiple conditions in dplyr
Inside where(), we need to supply functions that return a single logical value for each column.
library(dplyr)
select(df1, where(\(x) all(x < 5)))
# or this, which might be more semantically correct
select(df1, where(\(x) is.numeric(x) & all(x < 5)))
a1
1 1
2 0
3 3
4 0
Data
df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
    a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)),
    class = "data.frame", row.names = c(NA, -4L))
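The is.numeric() guard is what makes the second version safer. Without it, `x < 5` on the character column X is a string comparison whose outcome depends on collation, not arithmetic. With `&&`, the comparison is skipped for non-numeric columns. A minimal sketch using the same data (`small_numeric` is a hypothetical helper name):

```r
library(dplyr)

# Same data as above, built with data.frame()
df1 <- data.frame(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0),
                  a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2),
                  stringsAsFactors = FALSE)

# && short-circuits, so all(x < 5) is only evaluated for numeric columns
small_numeric <- function(x) is.numeric(x) && all(x < 5)

res <- select(df1, where(small_numeric))
res
```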
How to use the operator NOT with is.numeric
is.numeric is a function. !, by contrast, negates a logical value. Applying ! to a function makes no sense; you need to call the function and negate its result.
In general you’d do the following:
select_if(function (x) ! is.numeric(x))
Or, using the purrr-style formula (~) lambda notation:
select_if(~ ! is.numeric(.x))
But base R also has a function factory, Negate(), that negates the result of a function:
select_if(Negate(is.numeric))
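The sketch below checks both claims: applying `!` to the function object itself raises an error, while the three predicate spellings agree column by column. The data frame `df` is hypothetical, and rlang::as_function() is used here only to turn the ~ lambda into an ordinary function outside of dplyr.

```r
# `!` applied to the function object (not its result) is an error
err <- tryCatch(!is.numeric, error = function(e) conditionMessage(e))

# Hypothetical data
df <- data.frame(n = 1:2, s = c("a", "b"), stringsAsFactors = FALSE)

p1 <- function(x) !is.numeric(x)            # explicit anonymous function
p2 <- Negate(is.numeric)                    # base R function factory
p3 <- rlang::as_function(~ !is.numeric(.x)) # formula lambda

# All three give the same per-column result
r1 <- vapply(df, p1, logical(1))
r2 <- vapply(df, p2, logical(1))
r3 <- vapply(df, p3, logical(1))
r1
```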
Negation `!` in a dplyr pipeline `%>%`
You can use backticks around ! to use it as a pipeline step:
df %>%
`!`
# a b
#[1,] FALSE FALSE
#[2,] TRUE FALSE
#[3,] TRUE FALSE
For !is.na:
df$a[2] <- NA
df %>%
is.na %>%
`!`
# a b
#[1,] TRUE TRUE
#[2,] FALSE TRUE
#[3,] TRUE TRUE
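As an alternative to the backticks, magrittr exports not() as an alias for `!`, which some find easier to read in a pipe. Since not() comes from magrittr, the sketch loads magrittr explicitly. The data frame `df` is hypothetical because the original one is not shown.

```r
library(magrittr)

# Hypothetical data; the original df is not shown
df <- data.frame(a = c(TRUE, NA, FALSE), b = c(TRUE, TRUE, FALSE))

# Backticked `!` used as a pipeline step, as above
res_backtick <- df %>% is.na %>% `!`

# magrittr's alias for `!`
res_not <- df %>% is.na() %>% not()

# Both match the plain, unpiped expression (a logical matrix)
res_plain <- !is.na(df)
```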