How can I apply dplyr's select(,starts_with()) on rows, not columns?
I believe the combination of dplyr's filter() and substr() is the most straightforward approach:
library(dplyr)
filtered_df <- school %>% dplyr::filter(substr(Name,1,1) == "J")
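As a runnable sketch (the school data frame below is a made-up stand-in for the OP's data):

```r
library(dplyr)

# Hypothetical stand-in for the OP's `school` data frame
school <- data.frame(Name = c("John", "Barb", "Jane", "Mike"))

# Keep only the rows whose Name starts with "J"
filtered_df <- school %>% filter(substr(Name, 1, 1) == "J")
filtered_df$Name
# [1] "John" "Jane"
```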
select columns that do NOT start with a string using dplyr in R
We can use -, as the starts_with() output is not a logical vector:
library(dplyr)
data %>%
select(ends_with("r"), -starts_with("hc"))
#  lw_1r lw_3r
#1     1     2
#2     2     3
data
data <- structure(list(name = c("Joe", "Barb"), hc_1 = c(1L, 5L), hc_2 = c(2L,
4L), hc_3r = c(3L, 3L), hc_4r = 2:3, lw_1r = 1:2, lw_2 = c(5L,
3L), lw_3r = 2:3, lw_4 = 2:1), class = "data.frame", row.names = c(NA,
-2L))
How NOT to select columns using select() dplyr when you have character vector of colnames?
Edit: OP's actual question was about how to use a character vector to select or deselect columns from a dataframe. Use the one_of() helper function for that:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
cols <- c("Petal.Length", "Sepal.Length")
select(iris, one_of(cols)) %>% colnames
# [1] "Petal.Length" "Sepal.Length"
select(iris, -one_of(cols)) %>% colnames
# [1] "Sepal.Width" "Petal.Width" "Species"
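Note that one_of() has since been superseded; in current dplyr the same calls are usually spelled with all_of() (which errors on missing names) or any_of() (which silently ignores them):

```r
library(dplyr)

cols <- c("Petal.Length", "Sepal.Length")

# all_of(): strict selection by character vector
select(iris, all_of(cols)) %>% colnames()
# [1] "Petal.Length" "Sepal.Length"

# any_of(): tolerant deselection by character vector
select(iris, -any_of(cols)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
```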
You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:
starts_with(): starts with a prefix
ends_with(): ends with a suffix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03
one_of(): variables in a character vector
everything(): all variables
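For instance, the less common helpers compose naturally on a small made-up data frame (the column names x01/x02/x03/note are assumptions for the demo):

```r
library(dplyr)

df <- data.frame(x01 = 1, x02 = 2, x03 = 3, note = "a")

# num_range() builds names from a prefix, a range and a fixed width
df %>% select(num_range("x", 1:2, width = 2)) %>% colnames()
# [1] "x01" "x02"

# contains() matches a literal substring anywhere in the name
df %>% select(contains("ot")) %>% colnames()
# [1] "note"
```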
Given a dataframe with column names a:z, use select like this:
select(-a, -b, -c, -d, -e)
# OR
select(-c(a, b, c, d, e))
# OR
select(-(a:e))
# OR if you want to keep b
select(-a, -(c:e))
# OR a different way to keep b, by just putting it back in
select(-(a:e), b)
So if I wanted to omit two of the columns from the iris dataset, I could say:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
But of course, the best and most concise way to achieve that is using one of select's helper functions:
select(iris, -ends_with(".Length")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
P.S. It's weird that you are passing quoted values to dplyr; one of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare names work fine with dplyr and ggplot2.
Filter_at() not working with -starts_with()
That means your condition in all_vars is not met in the columns that do not start with "A". That filter searches all columns that don't start with "A" and keeps only the rows in which all of them are 0.
For example, the mtcars dataset will not return anything with this condition:
mtcars %>%
filter_at(vars(-starts_with("q")), all_vars(. == 0))
[1] mpg cyl disp hp drat wt qsec vs am gear carb
<0 rows> (or 0-length row.names)
Unless we add a row with only 0's (although we could have a non-zero value in the qsec column):
mtcars %>%
bind_rows(setNames(rep(0, ncol(.)), names(.))) %>%
filter_at(vars(-starts_with("q")), all_vars(. == 0))
mpg cyl disp hp drat wt qsec vs am gear carb
1 0 0 0 0 0 0 0 0 0 0 0
EDIT: for your specific problem, it is because the column Description does not equal 0. There are probably a couple of solutions, but here are two below that should work for you!
df1 %>%
filter_at(vars(-starts_with("B"), -one_of("Description")), all_vars(. == 0))
df1 %>%
filter_if(sapply(., is.numeric) & !startsWith(names(.), "B"), all_vars(. == 0))
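Also worth noting: filter_at() has since been superseded. On dplyr >= 1.0.0 the same logic can be sketched with if_all() (the df1 below is a made-up stand-in for the OP's data):

```r
library(dplyr)

# Hypothetical stand-in for the OP's df1
df1 <- data.frame(Description = c("a", "b"),
                  B1 = c(1, 5), X1 = c(0, 2), X2 = c(0, 3))

# Keep rows where every column that neither starts with "B"
# nor is named Description equals 0
res <- df1 %>%
  filter(if_all(!starts_with("B") & !any_of("Description"), ~ .x == 0))
# keeps only the first row (X1 == 0 and X2 == 0)
```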
dplyr::starts_with and ends_with not subsetting based on arguments
A better option would be matches to match a regex pattern in the column names. Here, it matches the pattern 'inq' at the beginning (^) of the column name and one of the listed numbers at the end ($) of the column name:
sf_df %>%
select(matches('^inq.*(7|8|10|13|14|15)$'))
# A tibble: 10 x 12
# inq1_7 inq1_8 inq1_10 inq1_13 inq1_14 inq1_15 inqfinal_7 inqfinal_8 inqfinal_10 inqfinal_13 inqfinal_14 inqfinal_15
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 NA NA NA NA NA NA 4 2 4 4 2 2
# 2 2 1 3 3 3 2 NA NA NA NA NA NA
# 3 4 NA 5 4 2 2 3 3 3 3 2 2
# 4 6 2 7 6 4 3 NA NA NA NA NA NA
# 5 7 7 4 4 4 5 NA NA NA NA NA NA
# 6 3 2 4 3 2 2 NA NA NA NA NA NA
# 7 1 1 2 4 1 4 NA NA NA NA NA NA
# 8 7 4 7 4 4 4 NA NA NA NA NA NA
# 9 NA NA NA NA NA NA NA NA NA NA NA NA
#10 NA NA NA NA NA NA NA NA NA NA NA NA
Note that by using both starts_with and ends_with, the result may not be the expected one. The OP's dataset has 30 columns, all of whose names start with 'inq'. So starts_with returns all columns, and adding ends_with checks an OR match, e.g.
sf_df %>%
select(starts_with("inq"), ends_with("5")) %>%
ncol
#[1] 30 # returns 30 columns
It is not removing the columns that have no match for 5 at the end of the string. Nor is this a matter of argument order:
sf_df %>%
select(ends_with("5"), starts_with("inq")) %>%
ncol
#[1] 30
Now, if we use only ends_with:
sf_df %>%
select(ends_with("5")) %>%
ncol
#[1] 4
Based on the example, all the column names start with 'inq', so ends_with alone would be sufficient for a single string match, as the documentation for ?ends_with specifies the match argument as "A string." (not multiple strings), where the Usage is
starts_with(match, ignore.case = TRUE, vars = peek_vars())
Using negated ends_with together with starts_with when selecting in dplyr
Use setdiff:
df %>%
select(setdiff(starts_with("b"), ends_with("oo")))
# bar
# 1 0.5248344
# 2 0.8835366
# 3 0.3486265
# 4 0.6382468
# 5 0.7378287
# 6 0.2878244
# 7 0.1927559
# 8 0.9787019
# 9 0.5393251
# 10 0.9229542
The negative notation is magic understood by select; it is not understood by intersect.
row-wise operations, select helpers and the mutate function in dplyr
Fortunately, since dplyr 1.0.0 there is a dplyr way to do exactly what you were looking for, using c_across(). This is helpful because it extends the approach to functions that, unlike rowMeans(), have no dedicated row-wise implementation.
Try this:
my_df %>%
  mutate(a_2 = a^2,
         b_2 = b^2) %>%
  rowwise() %>%
  mutate(mean = mean(c_across(ends_with("2"))))
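For a plain mean, a vectorised alternative that avoids rowwise() entirely is rowMeans() over across() (the my_df below is a made-up stand-in for the OP's data):

```r
library(dplyr)

# Hypothetical stand-in for the OP's my_df
my_df <- data.frame(a = 1:3, b = 4:6)

out <- my_df %>%
  mutate(a_2 = a^2, b_2 = b^2) %>%
  mutate(mean = rowMeans(across(ends_with("2"))))
out$mean
# [1]  8.5 14.5 22.5
```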
Is there a way to filter out rows if the first value in the rows meets a certain criteria. R
We can use a substring pattern match with grepl to return a logical vector for subsetting the rows:
df2 <- subset(df1, grepl('^W', Final))
Or using filter
library(dplyr)
library(stringr)
df2 <- df1 %>%
filter(str_detect(Final, '^W'))
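A runnable sketch (the df1 and its Final column are made up for the demo):

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the OP's df1
df1 <- data.frame(Final = c("Win", "Loss", "Walkover"))

# Base R
subset(df1, grepl('^W', Final))$Final
# [1] "Win"      "Walkover"

# dplyr + stringr
df1 %>% filter(str_detect(Final, '^W')) %>% pull(Final)
# [1] "Win"      "Walkover"
```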
Select columns based on string match - dplyr::select
Within the dplyr world, try:
select(iris,contains("Sepal"))
See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.