Pass a Vector of Variable Names to Arrange() in Dplyr

Pass a vector of variable names to arrange() in dplyr

Hadley hasn't made this obvious in the help file--only in his NSE vignette. The versions of the functions followed by underscores use standard evaluation, so you pass them vectors of strings and the like.

If I understand your problem correctly, you can just replace arrange() with arrange_() and it will work.

Specifically, pass the vector of strings as the .dots argument when you do it.

> df %>% arrange_(.dots=c("var1","var3"))
   var1 var2 var3 var4
1     1    i    5    i
2     1    x    7    w
3     1    h    8    e
4     2    b    5    f
5     2    t    5    b
6     2    w    7    h
7     3    s    6    d
8     3    f    8    e
9     4    c    5    y
10    4    o    8    c

========== Update March 2018 ==============

The standard evaluation versions in dplyr that I have shown here are now deprecated. You can read Hadley's programming vignette for the new way. Basically, you use !! to unquote one variable or !!! to unquote a vector of variables inside of arrange().

When you pass those columns, if they are bare, quote them using quo() for one variable or quos() for a vector. Don't use quotation marks. See the answer by Akrun.

If your columns are already strings, then make them names using rlang::sym() for a single column or rlang::syms() for a vector. See the answer by Christos. You can also use as.name() for a single column. Unfortunately as of this writing, the information on how to use rlang::sym() has not yet made it into the vignette I link to above (eventually it will be in the section on "variadic quasiquotation" according to his draft).
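
As a minimal sketch of the string case, assuming reasonably recent versions of dplyr and rlang (the data frame and the name sort_cols are made up for illustration):

```r
library(dplyr)
library(rlang)

# Toy data, invented for this example.
df <- tibble(
  var1 = c(2, 1, 1),
  var2 = c("b", "a", "c"),
  var3 = c(5, 8, 7)
)

# Column names arrive as strings, e.g. from user input.
sort_cols <- c("var1", "var3")

# syms() turns the strings into a list of symbols; !!! splices them
# into arrange() as if they had been typed as arrange(var1, var3).
sorted <- df %>% arrange(!!!syms(sort_cols))
```

For bare column names, the same call should work with quos(var1, var3) in place of syms(sort_cols).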

dplyr arrange() works with single variable inside c(), but not multiple variables inside of c() when evaluated inside of a function

Update 2022/03/17

The tidyverse has evolved and so should this answer.

There is no need for enquo anymore! Instead we enclose tidy-select expressions in double braces {{ }}.

library("tidyverse")

df <- tribble(
  ~var1, ~var2, ~var3,
  1, 2, 3,
  4, 5, 6,
  7, 8, 9
)

fun <- function(data, select_vars, ...) {
  data %>%
    select(
      {{ select_vars }}
    ) %>%
    arrange(
      ...
    )
}


fun(df, c(var1, var2), desc(var2))
#> # A tibble: 3 × 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     7     8
#> 2     4     5
#> 3     1     2
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 × 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8

We still can't use c() with the arrange and filter verbs because that's not allowed with data-masking.

df %>%
  arrange(
    c(var1, var2)
  )
#> Error in `arrange()`:
#> ! Problem with the implicit `transmute()` step.
#> x Problem while computing `..1 = c(var1, var2)`.
#> x `..1` must be size 3 or 1, not 6.

Created on 2022-03-17 by the reprex package (v2.0.1)
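
If you do want to hand arrange() a c() selection, one workaround is to wrap the selection in across(), which accepts tidy-select syntax inside data-masking verbs. This is only a sketch and not part of the reprex above: the function name arrange_by and its descending argument are hypothetical, and it assumes dplyr >= 1.0.0 (newer releases also offer pick() for the same purpose).

```r
library(dplyr)

df <- tribble(
  ~var1, ~var2, ~var3,
  1, 2, 3,
  4, 5, 6,
  7, 8, 9
)

# Hypothetical helper: sort `data` by a tidy-select expression.
arrange_by <- function(data, sort_vars, descending = FALSE) {
  # Apply desc() to every selected column, or leave the columns as they are.
  direction <- if (descending) desc else identity
  data %>%
    arrange(across({{ sort_vars }}, direction))
}

shuffled <- df[c(2, 3, 1), ]
ascending <- arrange_by(shuffled, c(var1, var2))
descending <- arrange_by(shuffled, c(var1, var2), descending = TRUE)
```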

Old answer

Replacing arrange_var with ... and specifying the variables without enclosing them in c() makes it work.

library("dplyr")

df <- tribble(
  ~var1, ~var2, ~var3,
  1, 2, 3,
  4, 5, 6,
  7, 8, 9
)

fun <- function(data, select_var, ...) {
  select_var <- enquo(select_var)
  data %>%
    select(!!select_var) %>%
    # You can pass the dots to `arrange` directly
    arrange(...)
}

fun(df, c(var1, var2), var2)
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8

Created on 2019-03-08 by the reprex package (v0.2.1)

It turns out that only select supports strings and character vectors. As the documentation says, "This is unlike other verbs where strings would be ambiguous." See the last example for dplyr::select.

# Two alternatives; both work with `select`.
df %>%
  select(var1, var2)
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8
df %>%
  select(c(var1, var2))
#> # A tibble: 3 x 2
#>    var1  var2
#>   <dbl> <dbl>
#> 1     1     2
#> 2     4     5
#> 3     7     8

# `arrange` only works with a comma-separated list of unquoted variable names.
df %>%
  arrange(var1, var2)
#> # A tibble: 3 x 3
#>    var1  var2  var3
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
#> 2     4     5     6
#> 3     7     8     9
df %>%
  arrange(c(var1, var2))
#> Error: incorrect size (6) at position 1, expecting : 3

Created on 2019-03-08 by the reprex package (v0.2.1)

Why doesn't R dplyr arrange sort properly using a vector element within a for loop

This is a "programming with dplyr" problem. Use the .data pronoun to reference a column by a string:

toy %>% 
  select(a, tf, get_it[j]) %>% 
  group_by(a) %>% 
  arrange(desc(.data[[ get_it[j] ]]), .by_group=TRUE)
# # A tibble: 100 x 3
# # Groups:   a [3]
#    a     tf       n1
#    <chr> <chr> <int>
#  1 a     F        99
#  2 a     F        98
#  3 a     F        96
#  4 a     F        95
#  5 a     T        93
#  6 a     T        92
#  7 a     T        92
#  8 a     T        90
#  9 a     F        87
# 10 a     F        86
# # ... with 90 more rows
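
Since the question was about a for loop, here is a small self-contained sketch of the same idea. The toy data and the top_rows list are invented for illustration; only the .data[[ ]] usage comes from the answer above.

```r
library(dplyr)

# Invented stand-in for the question's `toy` data.
toy <- tibble(
  a  = c("x", "x", "y", "y"),
  n1 = c(1L, 3L, 2L, 4L),
  n2 = c(8L, 6L, 7L, 5L)
)
get_it <- c("n1", "n2")

# For each column name, keep the row with the largest value per group.
top_rows <- list()
for (col in get_it) {
  top_rows[[col]] <- toy %>%
    group_by(a) %>%
    # .data[[col]] looks up the column whose name is stored in `col`
    arrange(desc(.data[[col]]), .by_group = TRUE) %>%
    slice(1) %>%
    ungroup()
}
```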

Arrange by a dynamically specified column

Try get():

dt %>% arrange(get(sort_by))
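
Note that get() looks the name up wherever it can find it, so it may pick up a variable from the calling environment if the column does not exist. A sketch of the same sort with the .data pronoun, which only looks at columns (dt and sort_by here are stand-ins for the question's objects):

```r
library(dplyr)

# Invented example data.
dt <- tibble(id = c("a", "b", "c"), score = c(2, 3, 1))
sort_by <- "score"

# Both calls sort by the column named in `sort_by`.
by_get <- dt %>% arrange(get(sort_by))
by_pronoun <- dt %>% arrange(.data[[sort_by]])
```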

Pass a string as variable name in dplyr::filter

!! or UQ evaluates the variable, so mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to false. You can prove this by printing !!var in the filter function:

mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
#  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
# <0 rows> (or 0-length row.names)

To evaluate var to the cyl column, you need to convert var to a symbol of cyl first, then evaluate the symbol cyl to a column:

Using rlang:

library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)

#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# ...

Or use as.symbol/as.name from base R:

mtcars %>% filter((!!as.symbol(var)) == 4)

mtcars %>% filter((!!as.name(var)) == 4)
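
In current dplyr the .data pronoun should give the same result without converting the string to a symbol first. A short sketch, assuming a reasonably recent dplyr:

```r
library(dplyr)

var <- "cyl"

# .data[[var]] refers to the column named by the string in `var`.
four_cyl <- mtcars %>% filter(.data[[var]] == 4)
```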

Dplyr standard evaluation using a vector of multiple strings with mutate function

There are several keys to solving this question:

1. Accessing the strings within a character vector and using these with dplyr
2. The formatting of arguments provided to the function used with mutate, here anyNA

The goal here is to replicate this call, but using the named variable two_names instead of manually typing out c(jack,jill).

stackdf %>% rowwise %>% mutate(test = anyNA(c(jack,jill)))

# A tibble: 10 x 4
    jack  jill  jane  test
   <dbl> <dbl> <dbl> <lgl>
 1     1     1     1 FALSE
 2    NA     2     2  TRUE
 3     2    NA     3  TRUE
 4    NA     3     4  TRUE
 5     3     4     5 FALSE
 6    NA    NA     6  TRUE
 7     4     5    NA FALSE
 8    NA     6    NA  TRUE
 9     5    NA    NA  TRUE
10    NA     7    NA  TRUE

1. Using dynamic variables with dplyr

Using quo/quos: Does not accept strings as input. The solution using this method would be:
```
two_names2 <- quos(c(jack, jill))
stackdf %>% rowwise %>% mutate(test = anyNA(!!! two_names2))
```
Note that quo takes a single argument and is unquoted using !!, while quos takes multiple arguments and is unquoted using !!!. This is not desirable here because I do not use two_names and instead have to type out the columns I wish to use.

Using as.name or rlang::sym/rlang::syms: as.name and sym take only a single input, whereas syms takes multiple inputs and returns a list of symbols as output.
```
> two_names
[1] "jack" "jill"
> as.name(two_names)
jack
> syms(two_names)
[[1]]
jack

[[2]]
jill
```
Note that as.name ignores everything after the first element. However, syms appears to work appropriately here, so now we need to use this within the mutate call.

2. Using dynamic variables within mutate using anyNA or other variables

Using syms and anyNA directly does not actually produce the correct result.

> stackdf %>% rowwise %>% mutate(test = anyNA(!!! syms(two_names)))
    jack  jill  jane  test
   <dbl> <dbl> <dbl> <lgl>
 1     1     1     1 FALSE
 2    NA     2     2  TRUE
 3     2    NA     3 FALSE
 4    NA     3     4  TRUE
 5     3     4     5 FALSE
 6    NA    NA     6  TRUE
 7     4     5    NA FALSE
 8    NA     6    NA  TRUE
 9     5    NA    NA FALSE
10    NA     7    NA  TRUE

Inspection of the test column shows that this only takes the first element into account and ignores the second. However, if I use a different function, e.g. sum or paste0, it is clear that both elements are being used:

> stackdf %>% rowwise %>% mutate(test = sum(!!! syms(two_names), 
                                            na.rm = TRUE))
    jack  jill  jane  test
   <dbl> <dbl> <dbl> <dbl>
 1     1     1     1     2
 2    NA     2     2     2
 3     2    NA     3     2
 4    NA     3     4     3
 5     3     4     5     7
 6    NA    NA     6     0
 7     4     5    NA     9
 8    NA     6    NA     6
 9     5    NA    NA     5
10    NA     7    NA     7

The reason for this becomes clear when you look at the arguments for anyNA vs sum.

function (x, recursive = FALSE) .Primitive("anyNA")

function (..., na.rm = FALSE) .Primitive("sum")

anyNA expects a single object x, whereas sum can take a variable list of objects (...).

Simply supplying c() fixes this problem (see answer from alistaire).

> stackdf %>% rowwise %>% mutate(test = anyNA(c(!!! syms(two_names))))
    jack  jill  jane  test
   <dbl> <dbl> <dbl> <lgl>
 1     1     1     1 FALSE
 2    NA     2     2  TRUE
 3     2    NA     3  TRUE
 4    NA     3     4  TRUE
 5     3     4     5 FALSE
 6    NA    NA     6  TRUE
 7     4     5    NA FALSE
 8    NA     6    NA  TRUE
 9     5    NA    NA  TRUE
10    NA     7    NA  TRUE
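
With dplyr >= 1.0.0 there is also a way that avoids syms() entirely: c_across() collects the selected columns of the current row into one vector, which is the single object anyNA expects. The sketch below uses a cut-down stackdf invented for the example; if_any() is a vectorized alternative that, as far as I know, needs dplyr >= 1.0.4.

```r
library(dplyr)

# Cut-down, invented version of the question's data.
stackdf <- tibble(
  jack = c(1, NA, 2, NA),
  jill = c(1, 2, NA, 3),
  jane = c(1, 2, 3, 4)
)
two_names <- c("jack", "jill")

# Row by row: combine the selected columns into one vector, then test it.
rowwise_res <- stackdf %>%
  rowwise() %>%
  mutate(test = anyNA(c_across(all_of(two_names)))) %>%
  ungroup()

# Vectorized: TRUE where is.na() holds for any of the selected columns.
vectorized_res <- stackdf %>%
  mutate(test = if_any(all_of(two_names), is.na))
```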

Alternatively, for educational purposes, one could use a combination of sapply, any, and anyNA to produce the correct result. Here we use list so that the spliced values are provided as a single list object.

# this produces an error because the elements of !!!
# are being passed to the arguments of sapply (X =, FUN = )
> stackdf %>% rowwise %>% 
    mutate(test = any(sapply(!!! syms(two_names), anyNA)))
Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'jill' of mode 'function' was not found.

Supplying list fixes this problem because it binds all the results into a single object.

# the table below is the familiar incorrect result that uses only `jack`
> stackdf %>% rowwise %>% 
    mutate(test = any(sapply(X=as.list(!!! syms(two_names)), 
                             FUN=anyNA)))

    jack  jill  jane  test
   <dbl> <dbl> <dbl> <lgl>
 1     1     1     1 FALSE
 2    NA     2     2  TRUE
 3     2    NA     3 FALSE
 4    NA     3     4  TRUE
 5     3     4     5 FALSE
 6    NA    NA     6  TRUE
 7     4     5    NA FALSE
 8    NA     6    NA  TRUE
 9     5    NA    NA FALSE
10    NA     7    NA  TRUE

# this produces the correct answer
> stackdf %>% rowwise %>% 
    mutate(test = any(sapply(X = list(!!! syms(two_names)), 
                             FUN = anyNA)))

    jack  jill  jane  test
   <dbl> <dbl> <dbl> <lgl>
 1     1     1     1 FALSE
 2    NA     2     2  TRUE
 3     2    NA     3  TRUE
 4    NA     3     4  TRUE
 5     3     4     5 FALSE
 6    NA    NA     6  TRUE
 7     4     5    NA FALSE
 8    NA     6    NA  TRUE
 9     5    NA    NA  TRUE
10    NA     7    NA  TRUE

Why these two perform differently makes sense when their behavior is compared:

> as.list(two_names)
[[1]]
[1] "jack"

[[2]]
[1] "jill"

> list(two_names)
[[1]]
[1] "jack" "jill"

Dynamically sorting columns in dplyr via passing ordered vector with column names to select

You're definitely on the right path.

mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise_each(funs(min, mean, median, max), mpg, cyl) %>%
  mutate(am = as.character(am)) %>%
  left_join(y = as.data.frame(table(mtcars$am),
                              stringsAsFactors = FALSE),
            by = c("am" = "Var1")) %>%
  .[, names(.)[order(names(.))]]
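
summarise_each() and funs() have since been deprecated. Here is a rough equivalent with across(), assuming dplyr >= 1.0.0, that keeps the idea of sorting the column names and passing the ordered vector to select(). The join from the code above is left out to keep the sketch short.

```r
library(dplyr)

# One row per `am` group, one column per variable/statistic pair,
# named like mpg_min, mpg_mean, ...
mt_sum <- mtcars %>%
  group_by(am) %>%
  summarise(
    across(c(mpg, cyl),
           list(min = min, mean = mean, median = median, max = max)),
    .groups = "drop"
  )

# Sort the column names and hand the ordered character vector to select().
ordered <- mt_sum %>%
  select(all_of(sort(names(mt_sum))))
```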

Parsing string as column name in dplyr

I would use a named vector instead of trying to mess around with the dplyr programming nuances. A benefit is that this method is already vectorized.

rename_cols <- function(col) {
  
  name = paste0(col, "_new") #I want to be able to parse this into the rename function below
  
  mtcars %>% 
    rename(setNames(col, name))
}

rename_cols(colnames(mtcars))
#                     mpg_new cyl_new disp_new hp_new drat_new wt_new qsec_new vs_new am_new gear_new carb_new
# Mazda RX4              21.0       6    160.0    110     3.90  2.620    16.46      0      1        4        4
# Mazda RX4 Wag          21.0       6    160.0    110     3.90  2.875    17.02      0      1        4        4
# Datsun 710             22.8       4    108.0     93     3.85  2.320    18.61      1      1        4        1
# Hornet 4 Drive         21.4       6    258.0    110     3.08  3.215    19.44      1      0        3        1
# Hornet Sportabout      18.7       8    360.0    175     3.15  3.440    17.02      0      0        3        2
# Valiant                18.1       6    225.0    105     2.76  3.460    20.22      1      0        3        1
# ...
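
The same named-vector idea also covers the case where only some columns are renamed. A small sketch, assuming dplyr >= 1.0.0; the lookup vector and the new names are made up:

```r
library(dplyr)

# Names are the new column names, values are the existing ones.
lookup <- c(miles_per_gallon = "mpg", cylinders = "cyl")

# all_of() makes it explicit that `lookup` is an external character vector.
renamed <- mtcars %>%
  rename(all_of(lookup))
```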

Edit

In this case, you might also find rename_with() to be what you need.

library(dplyr)

colnames(mtcars) -> cols

mtcars %>% 
  rename_with(~ paste0(., "_new"), any_of(cols))

# which is the same as the more concise but maybe less clear...
mtcars %>% 
  rename_with(paste0, any_of(cols), "_new")
