How to Apply dplyr's select(,starts_with()) on Rows, Not Columns

How can I apply dplyr's select(,starts_with()) on rows, not columns?

I believe that the combination of dplyr's filter and the substr function is the most efficient:

library(dplyr)
filtered_df <- school %>% dplyr::filter(substr(Name,1,1) == "J")
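For prefixes longer than one character, base R's startsWith() or an anchored regular expression may read more clearly than substr(). The sketch below assumes a toy `school` data frame, since the original one is not shown; the column names and values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the `school` data frame, which the original does not show
school <- data.frame(
  Name  = c("John", "Mary", "Jane", "Bob"),
  Grade = c(9, 10, 11, 12),
  stringsAsFactors = FALSE
)

# Same idea as substr(Name, 1, 1) == "J", but works for a prefix of any length
by_prefix <- school %>% filter(startsWith(Name, "J"))

# Regular-expression variant, anchored at the start of the string
by_regex <- school %>% filter(grepl("^J", Name))

by_prefix$Name
# [1] "John" "Jane"
```

Both variants keep whole rows, which is the row-wise counterpart of what starts_with() does for column names.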

select columns that do NOT start with a string using dplyr in R

We can use - to exclude columns, since the starts_with output is not a logical vector:

library(dplyr)
data %>%
  select(ends_with("r"), -starts_with("hc"))
#   lw_1r lw_3r
# 1     1     2
# 2     2     3

data

data <- structure(list(name = c("Joe", "Barb"), hc_1 = c(1L, 5L), hc_2 = c(2L, 
4L), hc_3r = c(3L, 3L), hc_4r = 2:3, lw_1r = 1:2, lw_2 = c(5L,
3L), lw_3r = 2:3, lw_4 = 2:1), class = "data.frame", row.names = c(NA,
-2L))
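As a hedged aside, tidyselect 1.0.0 and later also accept the Boolean operators ! and & inside select(), so the same selection can be written as a single condition. The sketch below uses a trimmed copy of the data above (fewer columns, same naming scheme) to stay short.

```r
library(dplyr)

# Trimmed copy of the example data: same naming scheme, fewer columns
data <- data.frame(
  name  = c("Joe", "Barb"),
  hc_1  = c(1L, 5L),
  hc_3r = c(3L, 3L),
  lw_1r = 1:2,
  lw_2  = c(5L, 3L),
  lw_3r = 2:3,
  stringsAsFactors = FALSE
)

# Original form: add the "r" columns, then subtract the "hc" ones
res_minus <- data %>% select(ends_with("r"), -starts_with("hc"))

# Boolean form (assumes tidyselect >= 1.0.0): ends with "r" AND does not start with "hc"
res_bool <- data %>% select(ends_with("r") & !starts_with("hc"))

names(res_bool)
# [1] "lw_1r" "lw_3r"
```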

How NOT to select columns using select() dplyr when you have character vector of colnames?

Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that (newer tidyselect releases supersede it with all_of() and any_of(), but it still works):

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

cols <- c("Petal.Length", "Sepal.Length")

select(iris, one_of(cols)) %>% colnames

# [1] "Petal.Length" "Sepal.Length"

select(iris, -one_of(cols)) %>% colnames

# [1] "Sepal.Width" "Petal.Width" "Species"
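For readers on tidyselect 1.0.0 or later, a minimal sketch of the same selection with all_of() and any_of() follows. The name "not_a_column" is a deliberately missing, hypothetical column used only to show the difference between the two helpers.

```r
library(dplyr)

cols <- c("Petal.Length", "Sepal.Length")

# all_of() is strict: it errors if any name in `cols` is missing from the data
kept    <- iris %>% select(all_of(cols)) %>% colnames()
dropped <- iris %>% select(-all_of(cols)) %>% colnames()

# any_of() is lenient: names that are not present are silently ignored
lenient <- iris %>% select(any_of(c(cols, "not_a_column"))) %>% colnames()

kept
# [1] "Petal.Length" "Sepal.Length"
dropped
# [1] "Sepal.Width" "Petal.Width" "Species"
```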

You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:

starts_with(): starts with a prefix

ends_with(): ends with a suffix

contains(): contains a literal string

matches(): matches a regular expression

num_range(): a numerical range like x01, x02, x03.

one_of(): variables in character vector.

everything(): all variables.
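To make the list above concrete, here is a small sketch that tries most of the helpers on iris. The `numbered` data frame is a hypothetical example added only because iris has no numbered columns for num_range().

```r
library(dplyr)

# Literal prefix / suffix / substring matches
names(select(iris, starts_with("Sepal")))   # "Sepal.Length" "Sepal.Width"
names(select(iris, ends_with("Width")))     # "Sepal.Width"  "Petal.Width"
names(select(iris, contains("etal")))       # "Petal.Length" "Petal.Width"

# Regular expression: names ending in ".Length"
names(select(iris, matches("\\.Length$")))  # "Sepal.Length" "Petal.Length"

# num_range() needs numbered columns, so use a small hypothetical data frame
numbered <- data.frame(x1 = 1, x2 = 2, x3 = 3, y = 4)
names(select(numbered, num_range("x", 1:2)))  # "x1" "x2"

# everything() is handy for moving one column to the front
names(select(iris, Species, everything()))
```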


Given a dataframe with column names a:z, use select like this:

select(-a, -b, -c, -d, -e)

# OR

select(-c(a, b, c, d, e))

# OR

select(-(a:e))

# OR if you want to keep b

select(-a, -(c:e))

# OR a different way to keep b, by just putting it back in

select(-(a:e), b)
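The fragments above omit the data argument. A runnable sketch, assuming a hypothetical one-row data frame `df` with columns a through g, shows what three of them return:

```r
library(dplyr)

# Hypothetical data frame with columns a..g (one row of zeros)
df <- as.data.frame(
  matrix(0, nrow = 1, ncol = 7, dimnames = list(NULL, letters[1:7]))
)

drop_range <- names(df %>% select(-(a:e)))      # drop a through e
keep_b     <- names(df %>% select(-a, -(c:e)))  # drop a and c through e, keeping b
b_back     <- names(df %>% select(-(a:e), b))   # drop a through e, then add b back

drop_range
# [1] "f" "g"
keep_b
# [1] "b" "f" "g"
b_back
# [1] "f" "g" "b"
```

Note that adding b back places it last, because select() appends it after the columns that survived the exclusion.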

So if I wanted to omit two of the columns from the iris dataset, I could say:

colnames(iris)

# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species"

But of course, the best and most concise way to achieve that is using one of select's helper functions:

select(iris, -ends_with(".Length")) %>% colnames()

# [1] "Sepal.Width" "Petal.Width" "Species"

P.S. It's weird that you are passing quoted values to dplyr; one of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare values work fine with dplyr and ggplot2.

filter_at() not working with -starts_with()

That means your condition in all_vars is not met in columns that do not start with "A". That filter is searching all columns that don't start with A and only selecting rows that contain all 0's.

For example, mtcars dataset will not return anything with this condition:

mtcars %>%
  filter_at(vars(-starts_with("q")), all_vars(. == 0))

[1] mpg cyl disp hp drat wt qsec vs am gear carb
<0 rows> (or 0-length row.names)

Unless we add a row with only 0's (the qsec column could hold a non-zero value, since it is excluded from the check):

mtcars %>%
  bind_rows(setNames(rep(0, ncol(.)), names(.))) %>%
  filter_at(vars(-starts_with("q")), all_vars(. == 0))

mpg cyl disp hp drat wt qsec vs am gear carb
1 0 0 0 0 0 0 0 0 0 0 0

EDIT: for your specific problem, it is because the Description column is never equal to 0. There are probably a couple of solutions, but here are two that should work for you!

df1 %>%
  filter_at(vars(-starts_with("B"), -one_of("Description")), all_vars(. == 0))

df1 %>%
  filter_if(sapply(., is.numeric) & !startsWith(names(.), "B"), all_vars(. == 0))
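In dplyr 1.0.4 and later, the scoped verbs filter_at() and filter_if() are superseded, and the same filter can be sketched with if_all(). The `df1` below is a hypothetical stand-in (the original data is not shown), with a character Description column, two A columns and one B column.

```r
library(dplyr)

# Hypothetical stand-in for df1, which the original does not show
df1 <- data.frame(
  Description = c("x", "y", "z"),
  A1 = c(0, 0, 1),
  A2 = c(0, 2, 0),
  B1 = c(5, 0, 0),
  stringsAsFactors = FALSE
)

# Keep rows where every numeric column that does not start with "B" equals 0
res <- df1 %>%
  filter(if_all(where(is.numeric) & !starts_with("B"), ~ .x == 0))

res$Description
# [1] "x"
```

Restricting the selection with where(is.numeric) plays the same role as excluding Description by name in the first solution above.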

dplyr::starts_with and ends_with not subsetting based on arguments

A better option would be matches, which matches a regex pattern in the column name. Here, it matches the pattern 'inq' at the beginning (^) of the column name and one of the listed numbers at the end ($) of the column name.

sf_df %>%
  select(matches('^inq.*(7|8|10|13|14|15)$'))
# A tibble: 10 x 12
#    inq1_7 inq1_8 inq1_10 inq1_13 inq1_14 inq1_15 inqfinal_7 inqfinal_8 inqfinal_10 inqfinal_13 inqfinal_14 inqfinal_15
#     <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>      <dbl>      <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#  1     NA     NA      NA      NA      NA      NA          4          2           4           4           2           2
#  2      2      1       3       3       3       2         NA         NA          NA          NA          NA          NA
#  3      4     NA       5       4       2       2          3          3           3           3           2           2
#  4      6      2       7       6       4       3         NA         NA          NA          NA          NA          NA
#  5      7      7       4       4       4       5         NA         NA          NA          NA          NA          NA
#  6      3      2       4       3       2       2         NA         NA          NA          NA          NA          NA
#  7      1      1       2       4       1       4         NA         NA          NA          NA          NA          NA
#  8      7      4       7       4       4       4         NA         NA          NA          NA          NA          NA
#  9     NA     NA      NA      NA      NA      NA         NA         NA          NA          NA          NA          NA
# 10     NA     NA      NA      NA      NA      NA         NA         NA          NA          NA          NA          NA
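One caveat with the pattern above: because .* can absorb digits, a name such as inq1_17 would also match through the trailing 7. A hedged sketch of a stricter variant, anchoring the number to the underscore, is shown below on a hypothetical `demo_df` whose column names are invented to expose the difference.

```r
library(dplyr)

# Hypothetical columns chosen to expose the difference between the two patterns
demo_df <- data.frame(
  inq1_7 = 1, inq1_17 = 2, inq1_15 = 3, inqfinal_8 = 4, other_7 = 5
)

# Original pattern: inq1_17 slips through because it ends in 7
loose <- names(select(demo_df, matches("^inq.*(7|8|10|13|14|15)$")))

# Stricter pattern: the whole number after the underscore must be one of the listed values
strict <- names(select(demo_df, matches("^inq.*_(7|8|10|13|14|15)$")))

loose
# [1] "inq1_7"     "inq1_17"    "inq1_15"    "inqfinal_8"
strict
# [1] "inq1_7"     "inq1_15"    "inqfinal_8"
```

Whether the stricter form is needed depends on whether such look-alike column names actually exist in the data.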

Note that combining starts_with and ends_with may not give the result you expect. The OP's dataset has 30 columns, and all of the column names start with 'inq'. So starts_with alone returns every column, and adding ends_with as a second argument performs an OR match (a union), e.g.

sf_df %>%
  select(starts_with("inq"), ends_with("5")) %>%
  ncol
#[1] 30 # returns 30 columns

It does not remove the columns that lack a 5 at the end of the name.

Nor does this depend on the order of the arguments:

sf_df %>%
  select(ends_with("5"), starts_with("inq")) %>%
  ncol
#[1] 30

Now, if we use only ends_with:

sf_df %>%
  select(ends_with("5")) %>%
  ncol
#[1] 4

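Based on the example, all columns start with 'inq', so ends_with alone is sufficient. At the time of this answer, the documentation for ?ends_with described the match argument as a single string:

match - A string.

and not multiple strings, with the Usage given as

starts_with(match, ignore.case = TRUE, vars = peek_vars())

Newer tidyselect releases document match as a character vector and take the union of the matches when it has more than one element.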
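If the goal is columns that satisfy both conditions, tidyselect 1.0.0 and later allow the helpers to be combined with &. The sketch below assumes a small hypothetical `demo_df`, since the OP's `sf_df` is not reproduced here.

```r
library(dplyr)

# Hypothetical columns; `other_5` ends in 5 but does not start with "inq"
demo_df <- data.frame(inq1_5 = 1, inq1_6 = 2, inq2_15 = 3, other_5 = 4)

# Comma-separated helpers are combined as a union (OR)
union_cols <- names(select(demo_df, starts_with("inq"), ends_with("5")))

# `&` takes the intersection (AND); assumes tidyselect >= 1.0.0
both_cols <- names(select(demo_df, starts_with("inq") & ends_with("5")))

# A character vector of suffixes is also accepted in tidyselect >= 1.0.0
multi_cols <- names(select(demo_df, ends_with(c("5", "6"))))

union_cols
# [1] "inq1_5"  "inq1_6"  "inq2_15" "other_5"
both_cols
# [1] "inq1_5"  "inq2_15"
```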

Using negated ends_with together with starts_with when selecting in dplyr

Use setdiff:

df %>%
  select(setdiff(starts_with("b"), ends_with("oo")))

# bar
# 1 0.5248344
# 2 0.8835366
# 3 0.3486265
# 4 0.6382468
# 5 0.7378287
# 6 0.2878244
# 7 0.1927559
# 8 0.9787019
# 9 0.5393251
# 10 0.9229542

The negative notation is special syntax that select understands; intersect does not understand it, which is why setdiff is used here.
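A minimal sketch comparing the setdiff() approach with the Boolean operators available in tidyselect 1.0.0 and later. The data frame `demo_df` and its column names foo, bar and boo are assumptions made for illustration; only bar appears in the original output.

```r
library(dplyr)

set.seed(1)
# Hypothetical data: `bar` and `boo` start with "b"; `foo` and `boo` end with "oo"
demo_df <- data.frame(foo = runif(3), bar = runif(3), boo = runif(3))

# The helpers return column positions, so base set operations work on them
via_setdiff <- names(select(demo_df, setdiff(starts_with("b"), ends_with("oo"))))

# Equivalent Boolean form; assumes tidyselect >= 1.0.0
via_bool <- names(select(demo_df, starts_with("b") & !ends_with("oo")))

via_setdiff
# [1] "bar"
```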

row-wise operations, select helpers and the mutate function in dplyr

Fortunately, since dplyr 1.0.0 there is a dplyr way to do exactly what you were looking for: c_across. It is helpful because the same pattern extends to other summary functions, not only those that already have a row-wise implementation such as rowMeans().

Try this:

my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2
  ) %>%
  rowwise() %>%
  mutate(mean = mean(c_across(ends_with("2")))) %>%
  ungroup()  # drop the row-wise grouping afterwards
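For the specific case of a mean, the vectorised rowMeans() gives the same numbers without rowwise(). The sketch below assumes a hypothetical three-row `my_df` with numeric columns a and b and compares the two routes.

```r
library(dplyr)

# Hypothetical input; the original my_df is not shown
my_df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

squared <- my_df %>% mutate(a_2 = a^2, b_2 = b^2)

# Row-wise route with c_across(), as in the answer above
rowwise_res <- squared %>%
  rowwise() %>%
  mutate(mean = mean(c_across(ends_with("2")))) %>%
  ungroup()

# Vectorised route: select the same columns, then let rowMeans() do the work
vectorised_mean <- rowMeans(select(squared, ends_with("2")))

rowwise_res$mean
# [1]  8.5 14.5 22.5
```

The c_across() route remains the more general one, since it works with any function that takes a vector.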

Is there a way to filter out rows if the first value in the rows meets a certain criteria. R

We can use a regular-expression match with grepl to return a logical vector for subsetting the rows:

df2 <- subset(df1, grepl('^W', Final)) 

Or using filter

library(dplyr)
library(stringr)
df2 <- df1 %>%
  filter(str_detect(Final, '^W'))

Select columns based on string match - dplyr::select

Within the dplyr world, try:

select(iris, contains("Sepal"))

See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.
