Use Dynamic Name For New Column/Variable in 'Dplyr'

Use dynamic name for new column/variable in `dplyr`

Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:

multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df[[varname]] <- with(df, Petal.Width * n)
df
}

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.



dplyr version >= 1.0

With the latest dplyr version you can use the syntax from the glue package when naming parameters when using :=. So here the {} in the name grab the value by evaluating the expression inside.

multipetal <- function(df, n) {
mutate(df, "petal.{n}" := Petal.Width * n)
}

If you are passing a column name to your function, you can use {{}} in the string as well as for the column name

meanofcol <- function(df, col) {
mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)



dplyr version >= 0.7

dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
mutate(df, !!varname := Petal.Width * n)
}

For more information, see the documentation available form vignette("programming", "dplyr").



dplyr (>=0.3 & <0.7)

Slightly earlier version of dplyr (>=0.3 <0.7), encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
varval <- lazyeval::interp(~Petal.Width * n, n=n)
mutate_(df, .dots= setNames(list(varval), varname))
}


dplyr < 0.3

Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setName:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
do.call("mutate", pp)
}

dynamicaly name a new variable / column within a custom function dplyr mutate and paste

We may use the arguments as unquoted and use {{}} for evaluation

my_fun <- function(dataf, V1, V2){
dataf %>%
dplyr::mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark = ",") ,
'\n(' , format({{V2}}, big.mark = ",") , ')'))
}

-testing

my_fun(df, speed1, n1)
string speed1 speed2 n1 n2 speed1_n1
1 car 7886.962 3218.585 37 83 7,886.962\n(37)
2 train 9534.978 5524.649 98 34 9,534.978\n(98)
3 bike 6984.790 9476.838 60 55 6,984.790\n(60)
4 plain 6543.198 2638.609 9 53 6,543.198\n( 9)

Dynamic variables names in dplyr function across multiple columns

We could use .names in across to rename

mean_fun_multicols <- function(data, group_cols, summary_cols) {
data %>%
group_by(across({{group_cols}})) %>%
summarise(across({{ summary_cols }},
~ mean(., na.rm = TRUE), .names = "mean_{.col}"), .groups = "drop")
}

-testing

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
# A tibble: 8 × 4
cyl gear mean_mpg mean_wt
<dbl> <dbl> <dbl> <dbl>
1 4 3 21.5 2.46
2 4 4 26.9 2.38
3 4 5 28.2 1.83
4 6 3 19.8 3.34
5 6 4 19.8 3.09
6 6 5 19.7 2.77
7 8 3 15.0 4.10
8 8 5 15.4 3.37

NOTE: The := is mainly used when there is a single column in tidyverse


If we use the OP's function, we are assigning multiple columns to a single column and this returns a tibble instead of a normal column. We may need to unpack

library(tidyr)
> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>% str
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
grouped_df [8 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
$ cyl : num [1:8] 4 4 4 6 6 6 8 8
$ gear : num [1:8] 3 4 5 3 4 5 3 5
$ mean_c(mpg, wt): tibble [8 × 2] (S3: tbl_df/tbl/data.frame)
..$ mpg: num [1:8] 21.5 26.9 28.2 19.8 19.8 ...
..$ wt : num [1:8] 2.46 2.38 1.83 3.34 3.09 ...
- attr(*, "groups")= tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
..$ cyl : num [1:3] 4 6 8
..$ .rows: list<int> [1:3]
.. ..$ : int [1:3] 1 2 3
.. ..$ : int [1:3] 4 5 6
.. ..$ : int [1:2] 7 8
.. ..@ ptype: int(0)
..- attr(*, ".drop")= logi TRUE

> mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt)) %>%
unpack(where(is_tibble))
`summarise()` has grouped output by 'cyl'. You can override using the `.groups` argument.
# A tibble: 8 × 4
# Groups: cyl [3]
cyl gear mpg wt
<dbl> <dbl> <dbl> <dbl>
1 4 3 21.5 2.46
2 4 4 26.9 2.38
3 4 5 28.2 1.83
4 6 3 19.8 3.34
5 6 4 19.8 3.09
6 6 5 19.7 2.77
7 8 3 15.0 4.10
8 8 5 15.4 3.37

Use dynamically generated column names in dplyr

In the OP's code, we need select

library(dplyr)
df %>%
rowwise() %>%
mutate(consensus=max(table(unlist(select(cur_data(), select_vars))) ))

-output

# A tibble: 10 x 4
# Rowwise:
var1 var2 var3 consensus
<int> <int> <int> <int>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1

Or just subset from cur_data() which would only return the data keeping the group attributes

df %>%
rowwise %>%
mutate(consensus = max(table(unlist(cur_data()[select_vars]))))
# A tibble: 10 x 4
# Rowwise:
var1 var2 var3 consensus
<int> <int> <int> <int>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1

Or using pmap

library(purrr)
df %>%
mutate(consensus = pmap_dbl(cur_data()[select_vars], ~ max(table(c(...)))))
# A tibble: 10 x 4
var1 var2 var3 consensus
<int> <int> <int> <dbl>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1

As these are rowwise operations, can get some efficiency if we use collapse functions

library(collapse)
tfm(df, consensus = dapply(slt(df, select_vars), MARGIN = 1,
FUN = function(x) fmax(tabulate(x))))
# A tibble: 10 x 4
var1 var2 var3 consensus
* <int> <int> <int> <int>
1 1 1 1 2
2 1 1 3 1
3 1 2 1 2
4 1 2 1 2
5 2 2 2 2
6 2 3 3 1
7 2 3 2 2
8 1 1 1 2
9 3 1 2 1
10 3 3 2 1

Benchmarks

As noted above, collapse is faster (run on a slightly bigger dataset)

df1 <- df[rep(seq_len(nrow(df)), 1e5), ]

system.time({
tfm(df1, consensus = dapply(slt(df1, select_vars), MARGIN = 1,
FUN = function(x) fmax(tabulate(x))))

})
#user system elapsed
# 5.257 0.123 5.323

system.time({
df1 %>%
mutate(consensus = pmap_dbl(cur_data()[select_vars], ~ max(table(c(...)))))

})
#user system elapsed
# 54.813 0.517 55.246

The rowwise operation is taking too much time, so stopped the execution

df1 %>% 
rowwise() %>%
mutate(consensus=max(table(unlist(select(cur_data(), select_vars))) ))
})
Timing stopped at: 575.5 3.342 581.3

Mutate a dynamic column name with conditions using other dynamic column names

use get to retreive column value instead

library(tidyverse)

d <- mtcars %>% tibble
fld_name <- "mpg"
other_fld_name <- "cyl"

d %>% mutate(!!fld_name := ifelse(get(other_fld_name) < 5 ,NA, get(fld_name)))

#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 NA 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 NA 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 NA 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows

Created on 2021-06-22 by the reprex package (v2.0.0)

creating and accessing dynamic column names within dplyr functions

Use across with the .names argument or if foo_cnt, etc. with an underscore is ok then just omit the .names argument since that is the default.

library(dplyr)
library(tibble)

do.some.stuff.2 <- function(data, col) {
cnt <- function(x) cumsum(!is.na(x))
mx <- function(x) cummax(cumsum(x))
mu <- function(x) cumsum(x) / cnt(x)
data %>%
select(date, {{col}}) %>%
filter(!is.na(date) & !is.na({{col}})) %>%
mutate(across({{col}}, lst(cnt, sum=cumsum, max=mx, mu), .names = "{.col}.{.fn}" ))
}
# test
do.some.stuff.2(example, foo)

giving:

# A tibble: 6 x 6
date foo foo.cnt foo.sum foo.max foo.mu
<date> <dbl> <int> <dbl> <dbl> <dbl>
1 2021-02-11 -0.000202 1 -0.000202 -0.000202 -0.000202
2 2021-02-12 0.363 2 0.363 0.363 0.181
3 2021-02-13 1.27 3 1.63 1.63 0.543
4 2021-02-14 1.50 4 3.13 3.13 0.781
5 2021-02-15 1.00 5 4.13 4.13 0.826
6 2021-02-16 -0.458 6 3.67 4.13 0.612

dplyr mutate using dynamic variable name while respecting group_by

I actually did not know much about pluck, so I don't know what went wrong, but I would go for this and this works:

iris %>% 
group_by(Species) %>%
mutate(
!! varname :=
mean(!!as.name(varname), na.rm = T) /
max(sd(!!as.name(varname)),
minsd[varname])
)

Let me know if this isn't what you were looking for.

How to pass dynamic column names in dplyr into custom function?

Using the latest version of dplyr (>=0.7), you can use the rlang !! (bang-bang) operator.

library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"

data %>%
mutate(diff=(!!as.name(from))-(!!as.name(to)))

You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately I seem to have to use a few more parenthesis than I would like, but the !! operator seems to fall in a weird order-of-operations order.

Original answer, dplyr (0.3-<0.7):

From that vignette (vignette("nse","dplyr")), use lazyeval's interp() function

library(lazyeval)

from <- "Stand1971"
to <- "Stand1987"

data %>%
mutate_(diff=interp(~from - to, from=as.name(from), to=as.name(to)))

Dplyr - Mutate dynamically named variables using other dynamically named variables

Here, we don't need the enquo/quo_name for 'year' as we are passing a numeric value. The output of paste will be character class, using sym from rlang (as @joran mentioned) this can be converted to symbol and evaluated with !!. Make sure to add braces around the '!! calc1_header' and '!! calc2_header' to evaluate the specific object

my_fun <- function(df, year) {

total_header <- paste("total", year, sep = "_")
calc1_header <- rlang::sym(paste("value1", year, sep = "_"))
calc2_header <- rlang::sym(paste("value2", year, sep = "_"))

df %>%
mutate(!!total_header := multiplier * (!!calc1_header) + (!!calc2_header))



}

my_fun(df1, 2016)
# ID multiplier value1_2015 value2_2015 value1_2016 value2_2016 total_2016
#1 1 0.5 2 3 1 4 4.5
#2 2 1.0 2 4 4 5 9.0


Related Topics



Leave a reply



Submit