Pass a vector of variable names to arrange() in dplyr
Hadley hasn't made this obvious in the help file, only in his NSE vignette. The versions of the functions followed by underscores use standard evaluation, so you pass them vectors of strings and the like.
If I understand your problem correctly, you can just replace arrange() with arrange_() and it will work.
Specifically, pass the vector of strings as the .dots argument when you do it.
> df %>% arrange_(.dots=c("var1","var3"))
var1 var2 var3 var4
1 1 i 5 i
2 1 x 7 w
3 1 h 8 e
4 2 b 5 f
5 2 t 5 b
6 2 w 7 h
7 3 s 6 d
8 3 f 8 e
9 4 c 5 y
10 4 o 8 c
========== Update March 2018 ==============
Using the standard evaluation versions in dplyr as I have shown here is now deprecated. You can read Hadley's programming vignette for the new way. Basically you will use !! to unquote one variable or !!! to unquote a vector of variables inside of arrange().
When you pass those columns, if they are bare, quote them using quo() for one variable or quos() for a vector. Don't use quotation marks. See the answer by Akrun.
If your columns are already strings, then make them names using rlang::sym() for a single column or rlang::syms() for a vector. See the answer by Christos. You can also use as.name() for a single column. Unfortunately as of this writing, the information on how to use rlang::sym() has not yet made it into the vignette I link to above (eventually it will be in the section on "variadic quasiquotation" according to his draft).
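A minimal sketch of the string case described in the update, assuming dplyr and rlang are installed. The toy data frame and the name sort_cols are made up for illustration; only the column names echo the example above.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data, not the df from the question.
df <- tibble(
  var1 = c(2, 1, 1),
  var2 = c("a", "b", "c"),
  var3 = c(5, 9, 7)
)

sort_cols <- c("var1", "var3")

# syms() turns the strings into a list of symbols; !!! splices them
# into arrange() as if var1, var3 had been typed directly.
by_both <- df %>% arrange(!!!syms(sort_cols))

# For a single string, sym() plus !! does the same job.
by_one <- df %>% arrange(!!sym("var3"))
```

Here by_both is ordered by var1 and then var3, while by_one is ordered by var3 alone.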
dplyr arrange() works with single variable inside c(), but not multiple variables inside of c() when evaluated inside of a function
Update 2022/03/17
The tidyverse has evolved and so should this answer.
There is no need for enquo anymore! Instead we enclose tidy-select expressions in double braces {{ }}.
library("tidyverse")
df <- tribble(
~var1, ~var2, ~var3,
1, 2, 3,
4, 5, 6,
7, 8, 9
)
fun <- function(data, select_vars, ...) {
  data %>%
    select(
      {{ select_vars }}
    ) %>%
    arrange(
      ...
    )
}
fun(df, c(var1, var2), desc(var2))
#> # A tibble: 3 × 2
#> var1 var2
#> <dbl> <dbl>
#> 1 7 8
#> 2 4 5
#> 3 1 2
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 × 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
We still can't use c() with the arrange and filter verbs because that's not allowed with data-masking.
df %>%
arrange(
c(var1, var2)
)
#> Error in `arrange()`:
#> ! Problem with the implicit `transmute()` step.
#> x Problem while computing `..1 = c(var1, var2)`.
#> x `..1` must be size 3 or 1, not 6.
Created on 2022-03-17 by the reprex package (v2.0.1)
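If you do want to hand arrange() a c(...) selection, one possible workaround is to wrap it in across(), which bridges tidy-select into a data-masking verb. This is a sketch assuming dplyr 1.0.0 or later; arrange_by is a hypothetical helper name, and newer dplyr releases also offer pick() for the same purpose.

```r
library(dplyr)

df <- tribble(
  ~var1, ~var2, ~var3,
  4, 5, 6,
  1, 2, 3,
  7, 8, 9
)

# Hypothetical helper: `cols` is a tidy-select expression such as c(var1, var2).
# across() evaluates the selection and returns the chosen columns as a
# data frame, which arrange() then sorts by, column by column.
arrange_by <- function(data, cols) {
  data %>% arrange(across({{ cols }}))
}

sorted <- arrange_by(df, c(var1, var2))

# The same idea works for a character vector via all_of().
sorted_chr <- df %>% arrange(across(all_of(c("var1", "var2"))))
```

Both calls sort the rows by var1 and then var2 without triggering the size error shown above.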
Old answer
Replacing arrange_var with ... and specifying the variables without enclosing them in c() makes it work.
library("dplyr")
df <- tribble(
~var1, ~var2, ~var3,
1, 2, 3,
4, 5, 6,
7, 8, 9
)
fun <- function(data, select_var, ...) {
  select_var <- enquo(select_var)
  data %>%
    select(!!select_var) %>%
    # You can pass the dots to `arrange` directly
    arrange(...)
}
fun(df, c(var1, var2), var2)
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
Created on 2019-03-08 by the reprex package (v0.2.1)
It turns out that only select supports strings and character vectors. As the documentation says, "This is unlike other verbs where strings would be ambiguous." See the last example for dplyr::select.
# Two alternatives; both work with `select`.
df %>%
select(var1, var2)
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
df %>%
select(c(var1, var2))
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
# `arrange` only works with comma-separated lists of unquoted variable names.
df %>%
arrange(var1, var2)
#> # A tibble: 3 x 3
#> var1 var2 var3
#> <dbl> <dbl> <dbl>
#> 1 1 2 3
#> 2 4 5 6
#> 3 7 8 9
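df %>%
arrange(c(var, var2))
#> Error: incorrect size (4) at position 1, expecting : 3
# Note: `var` is presumably a typo for `var1`. R resolves it to the function
# stats::var, so c() builds a length-4 list. With `var1` the vector would have
# length 6, which still does not match the 3 rows.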
Created on 2019-03-08 by the reprex package (v0.2.1)
Why doesn't R dplyr arrange sort properly using a vector element within a for loop
This is "programming with dplyr"; use .data for referencing columns by a string:
toy %>%
select(a, tf, get_it[j]) %>%
group_by(a) %>%
arrange(desc(.data[[ get_it[j] ]]), .by_group=TRUE)
# # A tibble: 100 x 3
# # Groups: a [3]
# a tf n1
# <chr> <chr> <int>
# 1 a F 99
# 2 a F 98
# 3 a F 96
# 4 a F 95
# 5 a T 93
# 6 a T 92
# 7 a T 92
# 8 a T 90
# 9 a F 87
# 10 a F 86
# # ... with 90 more rows
Arrange by a dynamically specified column
Try get():
dt %>% arrange(get(sort_by))
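An alternative to get() is the .data pronoun, which only looks inside the data frame and so cannot accidentally pick up a same-named object from the calling environment. The sketch below assumes a made-up dt and sort_by, since the question's data is not shown.

```r
library(dplyr)

# Hypothetical data; the real `dt` is not part of this answer.
dt <- tibble(id = c("a", "b", "c"), score = c(2, 3, 1))
sort_by <- "score"

# .data[[sort_by]] looks the column up by its string name.
ascending <- dt %>% arrange(.data[[sort_by]])

# Wrapping it in desc() reverses the order.
descending <- dt %>% arrange(desc(.data[[sort_by]]))
```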
Pass a string as variable name in dplyr::filter
!! or UQ evaluates the variable, so mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to false. You can prove this by printing !!var in the filter function:
mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)
To evaluate var to the cyl column, you need to convert var to a symbol of cyl first, then evaluate the symbol cyl to a column:
Using rlang:
library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# ...
Or use as.symbol/as.name from base R:
mtcars %>% filter((!!as.symbol(var)) == 4)
mtcars %>% filter((!!as.name(var)) == 4)
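For comparison, here is a sketch of the same filter written with the .data pronoun, which avoids the string-to-symbol conversion altogether. It assumes a dplyr version that supports .data in filter(); the object names four_cyl and via_sym are illustrative only.

```r
library(dplyr)

var <- "cyl"

# .data[[var]] pulls the column named by the string out of the data frame.
four_cyl <- mtcars %>% filter(.data[[var]] == 4)

# Equivalent result using the symbol approach from the answer above.
via_sym <- mtcars %>% filter((!!rlang::sym(var)) == 4)
```

Both objects should contain the same rows, namely the cars in mtcars whose cyl value is 4.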
Dplyr standard evaluation using a vector of multiple strings with mutate function
There are several keys to solving this question:
- Accessing the strings within a character vector and using these with dplyr
- The formatting of arguments provided to the function used with mutate, here anyNA
The goal here is to replicate this call, but using the named variable two_names instead of manually typing out c(jack,jill).
stackdf %>% rowwise %>% mutate(test = anyNA(c(jack,jill)))
# A tibble: 10 x 4
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
1. Using dynamic variables with dplyr
Using quo/quos: Does not accept strings as input. The solution using this method would be:
two_names2 <- quos(c(jack, jill))
stackdf %>% rowwise %>% mutate(test = anyNA(!!! two_names2))
Note that quo takes a single argument, and thus is unquoted using !!, and for multiple arguments you can use quos and !!! respectively. This is not desirable because I do not use two_names and instead have to type out the columns I wish to use.
Using as.name or rlang::sym/rlang::syms: as.name and sym take only a single input, however syms will take multiple and return a list of symbolic objects as output.
> two_names
[1] "jack" "jill"
> as.name(two_names)
jack
> syms(two_names)
[[1]]
jack
[[2]]
jill
Note that as.name ignores everything after the first element. However, syms appears to work appropriately here, so now we need to use this within the mutate call.
2. Using dynamic variables within mutate using anyNA or other variables
Using syms and anyNA directly does not actually produce the correct result.
> stackdf %>% rowwise %>% mutate(test = anyNA(!!! syms(two_names)))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 FALSE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA FALSE
10 NA 7 NA TRUE
Inspection of test shows that this is only taking into account the first element, and ignoring the second element. However, if I use a different function, e.g. sum or paste0, it is clear that both elements are being used:
> stackdf %>% rowwise %>% mutate(test = sum(!!! syms(two_names),
na.rm = TRUE))
jack jill jane test
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 2
2 NA 2 2 2
3 2 NA 3 2
4 NA 3 4 3
5 3 4 5 7
6 NA NA 6 0
7 4 5 NA 9
8 NA 6 NA 6
9 5 NA NA 5
10 NA 7 NA 7
The reason for this becomes clear when you look at the arguments for anyNA vs sum.
function (x, recursive = FALSE) .Primitive("anyNA")
function (..., na.rm = FALSE) .Primitive("sum")
anyNA expects a single object x, whereas sum can take a variable list of objects (...). With !!!, jill is therefore matched to the recursive argument instead of being checked for NA.
Simply supplying c() fixes this problem (see answer from alistaire).
> stackdf %>% rowwise %>% mutate(test = anyNA(c(!!! syms(two_names))))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
Alternately... for educational purposes, one could use a combination of sapply, any, and anyNA to produce the correct result. Here we use list so that the results are provided as a single list object.
# this produces an error because the elements of !!!
# are being passed to the arguments of sapply (X =, FUN = )
> stackdf %>% rowwise %>%
mutate(test = any(sapply(!!! syms(two_names), anyNA)))
Error in mutate_impl(.data, dots) :
Evaluation error: object 'jill' of mode 'function' was not found.
Supplying list fixes this problem because it binds all the results into a single object.
# the table below is the familiar incorrect result that uses only `jack`
> stackdf %>% rowwise %>%
mutate(test = any(sapply(X=as.list(!!! syms(two_names)),
FUN=anyNA)))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 FALSE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA FALSE
10 NA 7 NA TRUE
# this produces the correct answer
> stackdf %>% rowwise %>%
mutate(test = any(X = sapply(list(!!! syms(two_names)),
FUN = anyNA)))
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE
Why these two perform differently makes sense when their behavior is compared:
> as.list(two_names)
[[1]]
[1] "jack"
[[2]]
[1] "jill"
> list(two_names)
[[1]]
[1] "jack" "jill"
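As a possible alternative to splicing symbols, more recent dplyr releases let you select the columns by name with all_of(). The sketch below assumes dplyr 1.0.4 or later (for if_any()); stackdf is reconstructed from the printed output above because its original definition is not shown, and the names row_version and vec_version are illustrative.

```r
library(dplyr)

# Reconstructed from the printed tables above; an assumption, not the
# author's original definition.
stackdf <- tibble(
  jack = c(1, NA, 2, NA, 3, NA, 4, NA, 5, NA),
  jill = c(1, 2, NA, 3, 4, NA, 5, 6, NA, 7),
  jane = c(1, 2, 3, 4, 5, 6, NA, NA, NA, NA)
)
two_names <- c("jack", "jill")

# Row-wise: c_across() gathers the selected columns of the current row into
# one vector, which is the single object that anyNA() expects.
row_version <- stackdf %>%
  rowwise() %>%
  mutate(test = anyNA(c_across(all_of(two_names)))) %>%
  ungroup()

# Vectorised: if_any() applies is.na() to each selected column and
# combines the results with a logical OR, with no rowwise() needed.
vec_version <- stackdf %>%
  mutate(test = if_any(all_of(two_names), is.na))
```

Both versions should reproduce the test column from the target call at the top of this answer. The vectorised form avoids the per-row overhead of rowwise().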
Dynamically sorting columns in dplyr via passing ordered vector with column names to select
You're definitely on the right path.
mt_sum <- mtcars %>%
group_by(am) %>%
summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
mutate(am = as.character(am)) %>%
left_join(y = as.data.frame(table(mtcars$am),
stringsAsFactors = FALSE),
by = c("am" = "Var1")) %>%
.[, names(.)[order(names(.))]]
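For readers on a current dplyr, where summarise_each() and funs() are deprecated, the sorting step could be sketched as follows. This assumes dplyr 1.0.0 or later and leaves out the left_join() with the frequency table for brevity; mt_sorted is an illustrative name.

```r
library(dplyr)

# across() with a named list of functions replaces summarise_each(funs(...)).
# By default the result columns are named like mpg_min, mpg_mean, and so on.
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise(across(
    c(mpg, cyl),
    list(min = min, mean = mean, median = median, max = max)
  ))

# Pass the alphabetically sorted column names to select() via all_of().
mt_sorted <- mt_sum %>%
  select(all_of(sort(names(mt_sum))))
```

The design choice is the same as in the answer above: compute the ordering from names() and hand it to the selection step, only with select() instead of bracket indexing.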
Parsing string as column name in dplyr
I would use a named vector instead of trying to mess around with the dplyr programming nuances. A benefit is that this method is already vectorized.
rename_cols <- function(col) {
name = paste0(col, "_new") #I want to be able to parse this into the rename function below
mtcars %>%
rename(setNames(col, name))
}
rename_cols(colnames(mtcars))
# mpg_new cyl_new disp_new hp_new drat_new wt_new qsec_new vs_new am_new gear_new carb_new
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# ...
Edit
In this case, you might also find rename_with() to be what you need.
library(dplyr)
colnames(mtcars) -> cols
mtcars %>%
rename_with(~ paste0(., "_new"), any_of(cols))
# which is the same as the more concise but maybe less clear...
mtcars %>%
rename_with(paste0, any_of(cols), "_new")
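If you prefer the named-vector approach on a current dplyr, wrapping the vector in all_of() makes the intent explicit. This is a sketch assuming dplyr 1.0.0 or later; lookup and renamed are illustrative names.

```r
library(dplyr)

cols <- c("mpg", "cyl")

# Names are the new column names, values are the existing ones.
lookup <- setNames(cols, paste0(cols, "_new"))

# all_of() errors if a listed column is missing; any_of() would skip it.
renamed <- mtcars %>% rename(all_of(lookup))
```

Only the columns listed in lookup are renamed; the rest of mtcars is left untouched.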