Use Dynamic Name For New Column/Variable in 'Dplyr'

Use dynamic name for new column/variable in `dplyr`

Since you are dynamically building a variable name as a character value, it makes more sense to do the assignment using standard data.frame indexing, which allows character values for column names. For example:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df[[varname]] <- with(df, Petal.Width * n)
    df
}
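As a quick check of the base R approach, here is a minimal sketch that applies the function to the built-in iris data in a loop. The object name out and the values 2:3 are arbitrary choices for illustration, and the function is restated with df$Petal.Width so the block runs on its own.

```r
# Restated from above so this block is self-contained.
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  df[[varname]] <- df$Petal.Width * n
  df
}

# Add one new column per value of n (2:3 is an arbitrary choice).
out <- iris
for (i in 2:3) {
  out <- multipetal(out, i)
}

tail(names(out), 2)
# "petal.2" "petal.3"
```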

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.

dplyr version >= 1.0

With dplyr 1.0 or later, you can use the syntax from the glue package in the name on the left-hand side of :=. Here the {} in the name is replaced by the value of the expression inside it.

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}

If you are passing a column name to your function, you can use {{}} in the string as well as for the column name:

meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
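To see which names these two functions actually produce, the following sketch runs both on iris and inspects the new column. It assumes dplyr 1.0 or later is installed; res1 and res2 are names made up for this illustration.

```r
library(dplyr)

# Both functions restated from above so the block is self-contained.
multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{ col }}))
}

res1 <- multipetal(iris, 2)
res2 <- meanofcol(iris, Petal.Width)

# The new column is appended as the sixth column in each case.
names(res1)[6]
# "petal.2"
names(res2)[6]
# "Mean of Petal.Width"
```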

dplyr version >= 0.7

dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    mutate(df, !!varname := Petal.Width * n)
}

For more information, see the documentation available from vignette("programming", "dplyr").
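Because the !! form returns an ordinary data frame, the function composes with itself. The sketch below, which assumes dplyr 0.7 or later, uses base R's Reduce to add several columns in one go; the range 2:4 and the name res are arbitrary choices for illustration.

```r
library(dplyr)

# Restated from above so the block is self-contained.
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  mutate(df, !!varname := Petal.Width * n)
}

# Reduce() calls multipetal(iris, 2), then feeds the result to
# multipetal(., 3), and so on.
res <- Reduce(multipetal, 2:4, iris)

tail(names(res), 3)
# "petal.2" "petal.3" "petal.4"
```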

dplyr (>=0.3 & <0.7)

Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    varval <- lazyeval::interp(~Petal.Width * n, n=n)
    mutate_(df, .dots= setNames(list(varval), varname))
}

dplyr < 0.3

Note that this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
    do.call("mutate", pp)
}

Dynamically name a new variable / column within a custom function dplyr mutate and paste

We can pass the arguments unquoted and use {{}} to evaluate them, both in the new column name and in the expression

my_fun <- function(dataf, V1, V2){
   dataf %>%
   dplyr::mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark   = ",") ,
      '\n(' , format({{V2}}, big.mark   = ",") , ')'))
}

-testing

my_fun(df, speed1, n1)
string   speed1   speed2 n1 n2       speed1_n1
1    car 7886.962 3218.585 37 83 7,886.962\n(37)
2  train 9534.978 5524.649 98 34 9,534.978\n(98)
3   bike 6984.790 9476.838 60 55 6,984.790\n(60)
4  plain 6543.198 2638.609  9 53 6,543.198\n( 9)

Dynamic variables names in dplyr function across multiple columns

We could use .names in across to rename

mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{group_cols}})) %>%
     summarise(across({{ summary_cols }},
         ~ mean(., na.rm = TRUE), .names = "mean_{.col}"), .groups = "drop")
}

-testing

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
# A tibble: 8 × 4
    cyl  gear mean_mpg mean_wt
  <dbl> <dbl>    <dbl>   <dbl>
1     4     3     21.5    2.46
2     4     4     26.9    2.38
3     4     5     28.2    1.83
4     6     3     19.8    3.34
5     6     4     19.8    3.09
6     6     5     19.7    2.77
7     8     3     15.0    4.10
8     8     5     15.4    3.37
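The .names template can also refer to the function name through {.fn} when a named list of functions is supplied. The following is a minimal sketch on mtcars, assuming dplyr 1.0 or later; the choice of mean and sd and the name res are illustrative only.

```r
library(dplyr)

# Summarise two columns with two functions each; the template puts the
# function name first and the column name second.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    across(c(mpg, wt), list(mean = mean, sd = sd), .names = "{.fn}_{.col}"),
    .groups = "drop"
  )

names(res)
```

This should give one row per value of cyl, with the columns mean_mpg, sd_mpg, mean_wt and sd_wt alongside cyl.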

NOTE: The := is mainly used when there is a single column in tidyverse

If we use the OP's function, we are assigning multiple columns to a single column and this returns a tibble instead of a normal column. We may need to unpack

library(tidyr)
> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% str
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
grouped_df [8 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ cyl            : num [1:8] 4 4 4 6 6 6 8 8
 $ gear           : num [1:8] 3 4 5 3 4 5 3 5
 $ mean_c(mpg, wt): tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ mpg: num [1:8] 21.5 26.9 28.2 19.8 19.8 ...
  ..$ wt : num [1:8] 2.46 2.38 1.83 3.34 3.09 ...
 - attr(*, "groups")= tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
  ..$ cyl  : num [1:3] 4 6 8
  ..$ .rows: list<int> [1:3] 
  .. ..$ : int [1:3] 1 2 3
  .. ..$ : int [1:3] 4 5 6
  .. ..$ : int [1:2] 7 8
  .. ..@ ptype: int(0) 
  ..- attr(*, ".drop")= logi TRUE

> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% 
        unpack(where(is_tibble))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 8 × 4
# Groups:   cyl [3]
    cyl  gear   mpg    wt
  <dbl> <dbl> <dbl> <dbl>
1     4     3  21.5  2.46
2     4     4  26.9  2.38
3     4     5  28.2  1.83
4     6     3  19.8  3.34
5     6     4  19.8  3.09
6     6     5  19.7  2.77
7     8     3  15.0  4.10
8     8     5  15.4  3.37

Use dynamically generated column names in dplyr

In the OP's code, we need select

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(consensus = max(table(unlist(select(cur_data(), all_of(select_vars))))))

-output

# A tibble: 10 x 4
# Rowwise: 
    var1  var2  var3 consensus
   <int> <int> <int>     <int>
 1     1     1     1         2
 2     1     1     3         1
 3     1     2     1         2
 4     1     2     1         2
 5     2     2     2         2
 6     2     3     3         1
 7     2     3     2         2
 8     1     1     1         2
 9     3     1     2         1
10     3     3     2         1

Or just subset cur_data() directly, which returns only the data for the current group (here, a single row)

df %>%
     rowwise %>% 
     mutate(consensus = max(table(unlist(cur_data()[select_vars]))))
# A tibble: 10 x 4
# Rowwise: 
    var1  var2  var3 consensus
   <int> <int> <int>     <int>
 1     1     1     1         2
 2     1     1     3         1
 3     1     2     1         2
 4     1     2     1         2
 5     2     2     2         2
 6     2     3     3         1
 7     2     3     2         2
 8     1     1     1         2
 9     3     1     2         1
10     3     3     2         1

Or using pmap

library(purrr)
df %>%
     mutate(consensus = pmap_dbl(cur_data()[select_vars], ~ max(table(c(...)))))
# A tibble: 10 x 4
    var1  var2  var3 consensus
   <int> <int> <int>     <dbl>
 1     1     1     1         2
 2     1     1     3         1
 3     1     2     1         2
 4     1     2     1         2
 5     2     2     2         2
 6     2     3     3         1
 7     2     3     2         2
 8     1     1     1         2
 9     3     1     2         1
10     3     3     2         1

As these are rowwise operations, we can gain some efficiency by using functions from the collapse package

library(collapse)
tfm(df, consensus = dapply(slt(df, select_vars), MARGIN = 1,
       FUN = function(x) fmax(tabulate(x))))
# A tibble: 10 x 4
    var1  var2  var3 consensus
 * <int> <int> <int>     <int>
 1     1     1     1         2
 2     1     1     3         1
 3     1     2     1         2
 4     1     2     1         2
 5     2     2     2         2
 6     2     3     3         1
 7     2     3     2         2
 8     1     1     1         2
 9     3     1     2         1
10     3     3     2         1

Benchmarks

As noted above, collapse is faster (run on a slightly bigger dataset)

df1 <- df[rep(seq_len(nrow(df)), 1e5), ]

system.time({
tfm(df1, consensus = dapply(slt(df1, select_vars), MARGIN = 1,
       FUN = function(x) fmax(tabulate(x))))

})
#user  system elapsed 
#  5.257   0.123   5.323 

system.time({
df1 %>%
     mutate(consensus = pmap_dbl(cur_data()[select_vars], ~ max(table(c(...)))))

})
#user  system elapsed 
# 54.813   0.517  55.246

The rowwise operation took too long, so I stopped the execution

system.time({
df1 %>% 
   rowwise() %>% 
   mutate(consensus=max(table(unlist(select(cur_data(), select_vars))) ))
 })
Timing stopped at: 575.5 3.342 581.3

Mutate a dynamic column name with conditions using other dynamic column names

Use get() to retrieve the column value instead

library(tidyverse)

d <- mtcars %>% tibble
fld_name <-  "mpg"
other_fld_name <- "cyl"

d %>% mutate(!!fld_name := ifelse(get(other_fld_name) < 5 ,NA, get(fld_name)))

#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  NA       4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  NA       4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  NA       4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows

Created on 2021-06-22 by the reprex package (v2.0.0)
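An alternative to get() is the .data pronoun, which looks a string up among the data frame's columns only and so cannot accidentally pick up an object of the same name from the calling environment. The following is a sketch of the same operation, assuming dplyr 0.7 or later; res is a name made up for this illustration.

```r
library(dplyr)

d <- as_tibble(mtcars)
fld_name <- "mpg"
other_fld_name <- "cyl"

# .data[[name]] retrieves the column whose name is stored in the string.
res <- d %>%
  mutate(!!fld_name := ifelse(.data[[other_fld_name]] < 5, NA, .data[[fld_name]]))

# mpg is now NA wherever cyl is below 5 and unchanged elsewhere.
head(res$mpg, 3)
# 21 21 NA
```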

creating and accessing dynamic column names within dplyr functions

Use across with the .names argument. If names with an underscore (foo_cnt, etc.) are acceptable, just omit the .names argument, since "{.col}_{.fn}" is the default when a list of functions is supplied.

library(dplyr)
library(tibble)

do.some.stuff.2 <- function(data, col) {
  cnt <- function(x) cumsum(!is.na(x))
  mx <- function(x) cummax(cumsum(x))      
  mu <- function(x) cumsum(x) / cnt(x)
  data %>%
    select(date, {{col}}) %>%
    filter(!is.na(date) & !is.na({{col}})) %>%
    mutate(across({{col}}, lst(cnt, sum=cumsum, max=mx, mu), .names = "{.col}.{.fn}" ))
}
# test
do.some.stuff.2(example, foo)

giving:

# A tibble: 6 x 6
  date             foo foo.cnt   foo.sum   foo.max    foo.mu
  <date>         <dbl>   <int>     <dbl>     <dbl>     <dbl>
1 2021-02-11 -0.000202       1 -0.000202 -0.000202 -0.000202
2 2021-02-12  0.363          2  0.363     0.363     0.181   
3 2021-02-13  1.27           3  1.63      1.63      0.543   
4 2021-02-14  1.50           4  3.13      3.13      0.781   
5 2021-02-15  1.00           5  4.13      4.13      0.826   
6 2021-02-16 -0.458          6  3.67      4.13      0.612

dplyr mutate using dynamic variable name while respecting group_by

I actually did not know much about pluck, so I don't know what went wrong, but I would go for this and this works:

iris %>% 
  group_by(Species) %>% 
  mutate(
    !! varname :=
      mean(!!as.name(varname), na.rm = T) / 
      max(sd(!!as.name(varname)),
          minsd[varname])
  )

Let me know if this isn't what you were looking for.

How to pass dynamic column names in dplyr into custom function?

Using the latest version of dplyr (>=0.7), you can use the rlang !! (bang-bang) operator.

library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate(diff=(!!as.name(from))-(!!as.name(to)))

You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately I seem to need a few more parentheses than I would like, because the !! operator seems to fall at an odd place in the order of operations.

Original answer, dplyr (0.3-<0.7):

From that vignette (vignette("nse","dplyr")), use lazyeval's interp() function

library(lazyeval)

from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate_(diff=interp(~from - to, from=as.name(from), to=as.name(to)))

Dplyr - Mutate dynamically named variables using other dynamically named variables

Here, we don't need enquo/quo_name for 'year', as we are passing a numeric value. The output of paste is a character string; using sym from rlang (as @joran mentioned), it can be converted to a symbol and evaluated with !!. Make sure to add parentheses around '!! calc1_header' and '!! calc2_header' so that each object is unquoted on its own

my_fun <- function(df, year) {

  total_header <- paste("total", year, sep = "_")
  calc1_header <- rlang::sym(paste("value1", year, sep = "_"))
  calc2_header <- rlang::sym(paste("value2", year, sep = "_"))

  df %>%
    mutate(!!total_header := multiplier * (!!calc1_header) + (!!calc2_header))
}

my_fun(df1, 2016)
#   ID multiplier value1_2015 value2_2015 value1_2016 value2_2016 total_2016
#1  1        0.5           2           3           1           4        4.5
#2  2        1.0           2           4           4           5        9.0
