How to Select Non-Numeric Columns Using dplyr::select_if

How to select non-numeric columns using dplyr::select_if

You can use purrr's negate(), which is attached when you load library(tidyverse) rather than just library(dplyr):

library(tidyverse)
iris %>% select_if(negate(is.numeric))
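In dplyr 1.0.0 and later, select_if() is superseded by select() combined with the where() helper. A minimal sketch of the same selection in that style, assuming a dplyr version that provides where():

```r
library(dplyr)
library(purrr)  # for negate()

# where() applies the predicate to each column; negate() flips its TRUE/FALSE result
non_num <- iris %>% select(where(negate(is.numeric)))

names(non_num)
```

On iris this leaves only the Species factor column.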

Extract all columns except numeric in R data frame

The purrr package from the tidyverse does exactly what you want with purrr::keep() and purrr::discard().

library(purrr)

x <- iris %>% keep(is.numeric)

With this code, you pass a logical test (a predicate) to keep(), and only the columns that pass the test are kept.

To do the reverse, which is what you want, use purrr's discard():

x <- iris %>% discard(is.numeric)

You can think of discard() as keep() with the test negated (!is.numeric).

Or, alternatively, with dplyr:

x <- iris %>% select_if(~!is.numeric(.))
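A quick check that the three routes agree on iris. This is only a sketch: it compares the selected column names, and it writes "keep() with the test negated" explicitly as keep(negate(is.numeric)).

```r
library(purrr)
library(dplyr)

kept      <- iris %>% keep(negate(is.numeric))    # keep columns where is.numeric() is FALSE
discarded <- iris %>% discard(is.numeric)         # drop columns where is.numeric() is TRUE
via_dplyr <- iris %>% select_if(~ !is.numeric(.)) # dplyr version with a formula lambda

names(discarded)
```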

How to select_if in dplyr, where the logical condition is negated

Negating a predicate function is done with the dedicated Negate() (base R) or purrr::negate() functions, rather than the ! operator, which negates a logical value or vector, not a function:

library(dplyr)

mtcars %>%
  mutate(foo = "bar") %>%
  select_if(Negate(is.numeric)) %>%
  head()

# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar

Or use purrr::negate() (lower case), which behaves slightly differently; see the respective help pages:

library(purrr)
library(dplyr)

mtcars %>%
  mutate(foo = "bar") %>%
  select_if(negate(is.numeric)) %>%
  head()

# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
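The difference between negating a function and negating a value can be seen directly. A small sketch: applying ! to the function object itself is an error, while Negate() and purrr::negate() both return a new predicate function. One of the differences mentioned above is that purrr::negate() also accepts a purrr-style formula.

```r
library(purrr)

# ! expects a logical or numeric value, not a function, so this raises an error
res <- tryCatch(!is.numeric, error = function(e) conditionMessage(e))

not_numeric_base   <- Negate(is.numeric)         # base R
not_numeric_purrr  <- negate(is.numeric)         # purrr
not_numeric_lambda <- negate(~ is.numeric(.x))   # purrr also accepts a formula

not_numeric_base(1)     # FALSE
not_numeric_purrr("a")  # TRUE
not_numeric_lambda(1L)  # FALSE
```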

Selecting only numeric columns from a data frame

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list, we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)  

Then use standard subsetting (drop = FALSE keeps the result a data frame even when only one column matches):

x[ , nums, drop = FALSE]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
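One concrete reason to avoid sapply() here is that its return type depends on the input: on a data frame with no columns it returns an empty list rather than a logical vector. vapply(), which the original answer does not use, fixes the output type up front. A small check:

```r
empty <- data.frame()

s <- sapply(empty, is.numeric)              # an empty list, not a logical vector
v <- vapply(empty, is.numeric, logical(1))  # always a logical vector

class(s)
class(v)

# On an ordinary data frame the logical index works as expected
nums <- vapply(iris, is.numeric, logical(1))
ncol(iris[ , nums, drop = FALSE])
```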

For more idiomatic modern R, I'd now recommend:

x[ , purrr::map_lgl(x, is.numeric)]

Even less code, less dependent on R's particular quirks, more straightforward, and robust when used on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr (1.0.0 and later) also support the following syntax, which supersedes select_if():

x %>% dplyr::select(where(is.numeric))
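As a sanity check, the approaches in this answer should pick out the same columns. A sketch on iris (four numeric columns and one factor), assuming dplyr 1.0.0 or later for where() and purrr installed for map_lgl():

```r
library(dplyr)

x <- iris

nums    <- unlist(lapply(x, is.numeric), use.names = FALSE)
base_r  <- x[ , nums]                              # base R logical index
via_map <- x[ , purrr::map_lgl(x, is.numeric)]     # purrr logical index
sel_if  <- select_if(x, is.numeric)                # superseded dplyr verb
sel_wh  <- x %>% select(where(is.numeric))         # current dplyr idiom

names(sel_wh)
```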

How to exclude non-numeric columns in dplyr statement

You can use dplyr's select_if function:

df %>% select_if(is.numeric)

Or, as Mislav suggested in the comments, go straight to a summary with summarise_if():

df %>%
  group_by(Pop_Size_Group) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)
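summarise_if() is likewise superseded in dplyr 1.0.0 and later, in favour of across() inside summarise(). The asker's df with Pop_Size_Group is not shown, so this sketch uses iris, with Species standing in for the grouping column:

```r
library(dplyr)

means <- iris %>%
  group_by(Species) %>%
  # apply mean() to every numeric column; the lambda passes na.rm = TRUE
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

means
```

The grouping column is left out of across() automatically, so the result has one row per species and one mean per numeric column.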

How do I remove all integer columns from a dataframe in R with dplyr?

Use select_if():

out <- mydata %>%
  select_if(Negate(is.integer))
str(out)
#'data.frame': 50 obs. of 2 variables:
# $ Murder: num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
# $ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...

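If we want to drop more than one type, wrap the whole combined condition in !(...). Writing ~ !(is.integer(.x)) | is.numeric(.x) negates only is.integer() and ends up keeping every column. (Because is.numeric() is also TRUE for integer columns, this particular combination drops all numeric columns.)

mydata %>%
  select_if(~ !(is.integer(.x) | is.numeric(.x)))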
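The str() output above matches the built-in USArrests data (Murder and Rape are double, Assault and UrbanPop are integer), so this sketch assumes mydata is USArrests and shows the where() form. The second part uses a small hypothetical data frame to drop two distinct types, integer and character, at once:

```r
library(dplyr)

# Assumption: mydata is the built-in USArrests data set
mydata <- USArrests
no_int <- mydata %>% select(where(~ !is.integer(.x)))
names(no_int)

# Hypothetical data frame with four column types
df <- data.frame(
  i = 1:3,
  d = c(1.5, 2.5, 3.5),
  s = c("a", "b", "c"),
  l = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Negate the whole combined condition to drop integer AND character columns
kept <- df %>% select(where(~ !(is.integer(.x) || is.character(.x))))
names(kept)
```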

Filtering in dplyr based on two non-numeric values

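Replace Variable == "System" & "System (U.S.)" with Variable == "System" | Variable == "System (U.S.)". Each value needs its own comparison, and since a single row cannot match both values, the two comparisons are combined with | (or) rather than & (and).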
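A sketch with a hypothetical df containing a Variable column, also showing %in% as a shorter way to test against several values:

```r
library(dplyr)

# Hypothetical data with a Variable column
df <- data.frame(
  Variable = c("System", "System (U.S.)", "Other", "System"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# A row is kept if either comparison is TRUE
both_or <- df %>% filter(Variable == "System" | Variable == "System (U.S.)")

# Equivalent: keep rows whose Variable is one of the listed values
both_in <- df %>% filter(Variable %in% c("System", "System (U.S.)"))

nrow(both_or)
```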

How to use select() only on columns of a certain type without losing columns of other types?

One option is to write your own predicate function and use it in select_if(). Something like this:

# TRUE for character columns, and for numeric columns whose sum exceeds 12
check_cond <- function(x) is.character(x) || (is.numeric(x) && sum(x) > 12)

tibbly %>%
  select_if(check_cond)

#   y     z
#   <chr> <dbl>
# 1 a         9
# 2 b         8
# 3 c         7
# 4 d         6
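A self-contained version of the above. The original tibbly is not shown, so this sketch builds a hypothetical one: y and z match the output above, and x is an assumed extra integer column whose sum (10) fails the test. It also uses the where() form available in dplyr 1.0.0 and later:

```r
library(dplyr)

# Hypothetical data: y and z match the output above, x is an assumed extra column
tibbly <- tibble(
  x = 1:4,                    # numeric, but sum(x) = 10, so it fails the test
  y = c("a", "b", "c", "d"),  # character, always kept
  z = c(9, 8, 7, 6)           # numeric with sum(z) = 30, so it is kept
)

# Keep character columns, plus numeric columns whose sum exceeds 12
check_cond <- function(x) is.character(x) || (is.numeric(x) && sum(x) > 12)

picked <- tibbly %>% select(where(check_cond))
names(picked)
```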

