Pass a Vector of Variable Names to Arrange() in Dplyr

Pass a vector of variable names to arrange() in dplyr

Hadley hasn't made this obvious in the help file, only in his NSE vignette. The versions of the functions that end in an underscore use standard evaluation, so you can pass them vectors of strings and the like.

If I understand your problem correctly, you can just replace arrange() with arrange_() and it will work.

Specifically, pass the vector of strings as the .dots argument:

> df %>% arrange_(.dots=c("var1","var3"))
var1 var2 var3 var4
1 1 i 5 i
2 1 x 7 w
3 1 h 8 e
4 2 b 5 f
5 2 t 5 b
6 2 w 7 h
7 3 s 6 d
8 3 f 8 e
9 4 c 5 y
10 4 o 8 c

========== Update March 2018 ==============

Using the standard-evaluation versions in dplyr, as I have shown here, is now deprecated. See Hadley's programming vignette for the new way. Basically, you use !! to unquote a single variable or !!! to unquote-splice a list of variables inside arrange().

When you pass those columns as bare names, quote them with quo() for a single variable or quos() for several. Don't use quotation marks. See the answer by Akrun.

If your columns are already strings, convert them to names using rlang::sym() for a single column or rlang::syms() for a vector. See the answer by Christos. You can also use as.name() for a single column. Unfortunately, as of this writing, the information on how to use rlang::sym() has not yet made it into the vignette linked above (according to his draft, it will eventually be in the section on "variadic quasiquotation").
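A minimal sketch of the string-based approach, assuming dplyr 0.7 or later and a small hypothetical data frame in place of the question's df: rlang::syms() turns the character vector into a list of symbols, and !!! splices them into arrange() as if they had been typed as separate arguments.

```r
library(dplyr)

# Hypothetical data standing in for the question's df
df <- tibble(var1 = c(2, 1, 2, 1), var3 = c(5, 8, 1, 7))

# Column names held as strings
sort_cols <- c("var1", "var3")

# syms() converts the strings to symbols; !!! splices them into arrange(),
# which is equivalent to writing arrange(var1, var3)
sorted <- df %>% arrange(!!!rlang::syms(sort_cols))
sorted
```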

dplyr arrange() works with single variable inside c(), but not multiple variables inside of c() when evaluated inside of a function

Update 2022/03/17

The tidyverse has evolved and so should this answer.

There is no need for enquo() anymore! Instead, we embrace the function argument with double braces, {{ }}.

library("tidyverse")

df <- tribble(
~var1, ~var2, ~var3,
1, 2, 3,
4, 5, 6,
7, 8, 9
)

fun <- function(data, select_vars, ...) {
data %>%
select(
{{ select_vars }}
) %>%
arrange(
...
)
}


fun(df, c(var1, var2), desc(var2))
#> # A tibble: 3 × 2
#> var1 var2
#> <dbl> <dbl>
#> 1 7 8
#> 2 4 5
#> 3 1 2
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 × 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8

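We still can't use c() with the arrange() and filter() verbs, because they use data masking rather than tidy selection: c(var1, var2) is evaluated as an ordinary expression that concatenates the two columns into one vector of length 6, which doesn't match the 3 rows of df.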

df %>%
arrange(
c(var1, var2)
)
#> Error in `arrange()`:
#> ! Problem with the implicit `transmute()` step.
#> x Problem while computing `..1 = c(var1, var2)`.
#> x `..1` must be size 3 or 1, not 6.

Created on 2022-03-17 by the reprex package (v2.0.1)
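If you do want to pass a c() of columns for sorting, one option is to embrace the argument inside pick(), which applies tidy selection and hands arrange() a data frame whose columns it sorts by, left to right. This is a sketch assuming dplyr 1.1.0 or later (where pick() was added); the unsorted data and the helper name fun2 are hypothetical.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Hypothetical unsorted data
df <- tribble(
  ~var1, ~var2, ~var3,
  2, 1, 3,
  1, 9, 6,
  2, 0, 9
)

# Hypothetical helper: both arguments take tidy-select expressions such as c(var1, var2)
fun2 <- function(data, select_vars, arrange_vars) {
  data %>%
    select({{ select_vars }}) %>%
    # pick() selects the columns; arrange() sorts by them in order
    arrange(pick({{ arrange_vars }}))
}

out <- fun2(df, c(var1, var2), c(var1, var2))
out
```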

Old answer

Replacing the arrange_var argument with ... and listing the sort variables without enclosing them in c() makes it work.

library("dplyr")

df <- tribble(
~var1, ~var2, ~var3,
1, 2, 3,
4, 5, 6,
7, 8, 9
)

fun <- function(data, select_var, ...) {
select_var <- enquo(select_var)
data %>%
select(!!select_var) %>%
# You can pass the dots to `arrange` directly
arrange(...)
}

fun(df, c(var1, var2), var2)
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
fun(df, c(var1, var2), var1, var2)
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8

Created on 2019-03-08 by the reprex package (v0.2.1)

It turns out that select() supports c() of column names as well as strings and character vectors, while arrange() does not. As the documentation says, "This is unlike other verbs where strings would be ambiguous." See the last example for dplyr::select.

# Two alternatives; both work with `select`.
df %>%
select(var1, var2)
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8
df %>%
select(c(var1, var2))
#> # A tibble: 3 x 2
#> var1 var2
#> <dbl> <dbl>
#> 1 1 2
#> 2 4 5
#> 3 7 8

# `arrange` only works with comma-separated, unquoted variable names.
df %>%
arrange(var1, var2)
#> # A tibble: 3 x 3
#> var1 var2 var3
#> <dbl> <dbl> <dbl>
#> 1 1 2 3
#> 2 4 5 6
#> 3 7 8 9
df %>%
arrange(c(var1, var2))
#> Error: incorrect size (6) at position 1, expecting : 3

Created on 2019-03-08 by the reprex package (v0.2.1)
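The size errors in both reprexes have the same cause. A base-R sketch, using a hypothetical three-row data frame, of what data masking does with c(var1, var2): the two columns are concatenated into one vector twice as long as the data, so there is nothing row-aligned for arrange() to sort by.

```r
# Hypothetical three-row data frame
df <- data.frame(var1 = c(1, 4, 7), var2 = c(2, 5, 8))

# with() evaluates the expression with the columns in scope, much like data masking
combined <- with(df, c(var1, var2))

length(combined)  # 6, not 3
nrow(df)          # 3
```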

Why doesn't R dplyr arrange sort properly using a vector element within a for loop

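This is a "programming with dplyr" problem. A string such as get_it[j] used directly inside arrange() is a constant, the same for every row, so it doesn't reorder anything. Use the .data pronoun to reference a column by a string: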

toy %>% 
select(a, tf, get_it[j]) %>%
group_by(a) %>%
arrange(desc(.data[[ get_it[j] ]]), .by_group=TRUE)
# # A tibble: 100 x 3
# # Groups: a [3]
# a tf n1
# <chr> <chr> <int>
# 1 a F 99
# 2 a F 98
# 3 a F 96
# 4 a F 95
# 5 a T 93
# 6 a T 92
# 7 a T 92
# 8 a T 90
# 9 a F 87
# 10 a F 86
# # ... with 90 more rows
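A self-contained sketch of the same pattern inside a for loop. Here toy is a small hypothetical stand-in for the question's data, and get_it is assumed to hold column names as strings; .data[[ ]] looks each name up as a column, so every iteration sorts by a different column within each group.

```r
library(dplyr)

# Hypothetical stand-in for the question's `toy` data
toy <- tibble(
  a  = c("b", "a", "a", "b"),
  n1 = c(3L, 1L, 2L, 4L),
  n2 = c(10L, 40L, 30L, 20L)
)
get_it <- c("n1", "n2")  # column names stored as strings

results <- list()
for (j in seq_along(get_it)) {
  results[[get_it[j]]] <- toy %>%
    group_by(a) %>%
    # .data[[ ]] turns the string into a column reference
    arrange(desc(.data[[get_it[j]]]), .by_group = TRUE) %>%
    ungroup()
}

results$n1
results$n2
```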

Arrange by a dynamically specified column

Try base R's get():

dt %>% arrange(get(sort_by))
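A small check, with a hypothetical dt and sort_by, that get() finds the column named by the string; the .data pronoun is a more explicit alternative that should give the same order.

```r
library(dplyr)

# Hypothetical data and sort column
dt <- tibble(x = c(3, 1, 2), y = c("c", "a", "b"))
sort_by <- "x"

by_get  <- dt %>% arrange(get(sort_by))      # get() looks up the column by name
by_data <- dt %>% arrange(.data[[sort_by]])  # .data pronoun, same result

by_get
```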

Pass a string as variable name in dplyr::filter

!! (or UQ()) evaluates var right away, so mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to FALSE. You can confirm this by printing !!var inside the filter() call:

mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)

To make var refer to the cyl column, first convert the string to the symbol cyl, then unquote that symbol so it is evaluated as a column:

Using rlang:

library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)

# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# ...

Or use as.symbol()/as.name() from base R:

mtcars %>% filter((!!as.symbol(var)) == 4)

mtcars %>% filter((!!as.name(var)) == 4)
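For completeness, a sketch of the .data pronoun alternative, assuming a reasonably recent dplyr; it avoids building a symbol at all, and the check confirms it keeps the same rows as the !!sym() version.

```r
library(dplyr)

var <- "cyl"

# Convert the string to a symbol and unquote it, as in the rlang example
via_sym <- mtcars %>% filter((!!rlang::sym(var)) == 4)

# .data[[ ]] looks up the column named by the string directly
via_data <- mtcars %>% filter(.data[[var]] == 4)

nrow(via_data)  # mtcars has 11 four-cylinder cars
```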

Dplyr standard evaluation using a vector of multiple strings with mutate function

There are several keys to solving this question:

  • Accessing the strings within a character vector and using these with dplyr
  • The formatting of arguments provided to the function used with mutate, here anyNA()

The goal here is to replicate this call, but using the character vector two_names instead of manually typing out c(jack, jill).

stackdf %>% rowwise %>% mutate(test = anyNA(c(jack,jill)))

# A tibble: 10 x 4
jack jill jane test
<dbl> <dbl> <dbl> <lgl>
1 1 1 1 FALSE
2 NA 2 2 TRUE
3 2 NA 3 TRUE
4 NA 3 4 TRUE
5 3 4 5 FALSE
6 NA NA 6 TRUE
7 4 5 NA FALSE
8 NA 6 NA TRUE
9 5 NA NA TRUE
10 NA 7 NA TRUE

1. Using dynamic variables with dplyr

  1. Using quo/quos: these do not accept strings as input. The solution using this method would be:

    two_names2 <- quos(c(jack, jill))
    stackdf %>% rowwise %>% mutate(test = anyNA(!!! two_names2))

    Note that quo() captures a single expression, which is unquoted with !!; for multiple expressions you can use quos() and !!!, respectively. This is not desirable because it doesn't use two_names; I still have to type out the columns I wish to use.

  2. Using as.name or rlang::sym/rlang::syms: as.name() and sym() take only a single input, whereas syms() takes multiple and returns a list of symbols.

    > two_names
    [1] "jack" "jill"
    > as.name(two_names)
    jack
    > syms(two_names)
    [[1]]
    jack

    [[2]]
    jill

    Note that as.name() ignores everything after the first element. syms(), however, appears to work as intended here, so now we need to use it within the mutate() call.

2. Using dynamic variables within mutate with anyNA or other functions

  1. Using syms and anyNA directly does not actually produce the correct result.

    > stackdf %>% rowwise %>% mutate(test = anyNA(!!! syms(two_names)))
    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 FALSE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA FALSE
    10 NA 7 NA TRUE

    Inspecting test shows that only the first element is taken into account and the second is ignored. However, if I use a different function, e.g. sum() or paste0(), it is clear that both elements are being used:

    > stackdf %>% rowwise %>% mutate(test = sum(!!! syms(two_names), 
    na.rm = TRUE))
    jack jill jane test
    <dbl> <dbl> <dbl> <dbl>
    1 1 1 1 2
    2 NA 2 2 2
    3 2 NA 3 2
    4 NA 3 4 3
    5 3 4 5 7
    6 NA NA 6 0
    7 4 5 NA 9
    8 NA 6 NA 6
    9 5 NA NA 5
    10 NA 7 NA 7

    The reason for this becomes clear when you compare the arguments of anyNA and sum.

    function (x, recursive = FALSE) .Primitive("anyNA")

    function (..., na.rm = FALSE) .Primitive("sum")

    anyNA expects a single object x, so the second spliced symbol (jill) is matched to the recursive argument instead of being checked for NAs, whereas sum can take a variable list of objects (...).

  2. Wrapping the spliced symbols in c() fixes this problem (see the answer from alistaire).

    > stackdf %>% rowwise %>% mutate(test = anyNA(c(!!! syms(two_names))))
    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 TRUE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA TRUE
    10 NA 7 NA TRUE
  3. Alternatively, for educational purposes, one could combine sapply(), any(), and anyNA() to produce the correct result. Here we use list() so that the columns are collected into a single list object.

    # this produces an error because the spliced symbols are matched
    # to the arguments of sapply (jack to X =, jill to FUN =)
    > stackdf %>% rowwise %>%
    mutate(test = any(sapply(!!! syms(two_names), anyNA)))
    Error in mutate_impl(.data, dots) :
    Evaluation error: object 'jill' of mode 'function' was not found.

    Wrapping the symbols in list() fixes this problem because it binds all the columns into a single object.

    # the table below is the familiar incorrect result that uses only the `jack` column (as.list() ignores its second argument)
    > stackdf %>% rowwise %>%
    mutate(test = any(sapply(X=as.list(!!! syms(two_names)),
    FUN=anyNA)))

    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 FALSE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA FALSE
    10 NA 7 NA TRUE

    # this produces the correct answer
    > stackdf %>% rowwise %>%
    mutate(test = any(sapply(list(!!! syms(two_names)),
    FUN = anyNA)))

    jack jill jane test
    <dbl> <dbl> <dbl> <lgl>
    1 1 1 1 FALSE
    2 NA 2 2 TRUE
    3 2 NA 3 TRUE
    4 NA 3 4 TRUE
    5 3 4 5 FALSE
    6 NA NA 6 TRUE
    7 4 5 NA FALSE
    8 NA 6 NA TRUE
    9 5 NA NA TRUE
    10 NA 7 NA TRUE

    The difference makes sense when you compare what as.list() and list() do with the same input:

    > as.list(two_names)
    [[1]]
    [1] "jack"

    [[2]]
    [1] "jill"

    > list(two_names)
    [[1]]
    [1] "jack" "jill"

Dynamically sorting columns in dplyr via passing ordered vector with column names to select

You're definitely on the right path.

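mt_sum <- mtcars %>%
group_by(am) %>%
# across() replaces the deprecated summarise_each()/funs() combination
summarise(across(c(mpg, cyl), list(min = min, mean = mean, median = median, max = max))) %>%
mutate(am = as.character(am)) %>%
# add the per-group counts from table(); Var1 holds the am values as strings
left_join(y = as.data.frame(table(mtcars$am),
stringsAsFactors = FALSE),
by = c("am" = "Var1")) %>%
# reorder the columns alphabetically by name
.[, names(.)[order(names(.))]]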
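Since the question asks about passing an ordered vector of names, here is a sketch of the same reordering done with select() and all_of(); mt_small is a hypothetical, smaller summary than mt_sum so the example stays self-contained.

```r
library(dplyr)

# Hypothetical summary table with unsorted column names
mt_small <- mtcars %>%
  group_by(am) %>%
  summarise(mpg_max = max(mpg), cyl_min = min(cyl), n = n())

# Build the ordered vector of names, then pass it to select()
col_order <- sort(names(mt_small))
sorted_cols <- mt_small %>% select(all_of(col_order))

names(sorted_cols)
```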

Parsing string as column name in dplyr

I would use a named vector instead of wrestling with the nuances of programming with dplyr. A benefit is that this method is already vectorized.

rename_cols <- function(col) {

name <- paste0(col, "_new")  # new names, e.g. "mpg_new"

# setNames() builds c(mpg_new = "mpg", ...), i.e. new = old pairs for rename()
mtcars %>%
rename(setNames(col, name))
}

rename_cols(colnames(mtcars))
# mpg_new cyl_new disp_new hp_new drat_new wt_new qsec_new vs_new am_new gear_new carb_new
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# ...


Edit

In this case, rename_with() might also be what you need.

library(dplyr)

colnames(mtcars) -> cols

mtcars %>%
rename_with(~ paste0(., "_new"), any_of(cols))

# ...which is equivalent to the more concise, but perhaps less clear:
mtcars %>%
rename_with(paste0, any_of(cols), "_new")
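Recent dplyr versions (1.1.0 and later, as far as I know) also document renaming with a named lookup vector wrapped in all_of() or any_of(); any_of() silently skips names that don't exist. A sketch, where not_a_column is a hypothetical missing column included on purpose:

```r
library(dplyr)

cols <- c("mpg", "cyl", "not_a_column")         # last entry deliberately absent from mtcars
lookup <- setNames(cols, paste0(cols, "_new"))  # new = old pairs

# any_of() renames the columns that exist and ignores the missing one
renamed <- mtcars %>% rename(any_of(lookup))

head(names(renamed), 3)
```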

