Use dynamic name for new column/variable in `dplyr`
Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  df[[varname]] <- with(df, Petal.Width * n)
  df
}
The `mutate` function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to specify the column name dynamically, then you also need to build the named argument.
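To see why the name has to be built, here is a minimal sketch using the built-in `iris` data (the value `n <- 4` is just an illustrative choice). A plain `name = value` argument takes the name literally, so the new column ends up called `varname` rather than the string stored in `varname`:

```r
library(dplyr)

n <- 4
varname <- paste("petal", n, sep = ".")  # "petal.4"

# mutate() does not evaluate the argument name: the new column is
# literally called "varname", not "petal.4"
literal <- mutate(iris, varname = Petal.Width * n)
names(literal)[ncol(literal)]  # "varname"
```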
dplyr version >= 1.0
With dplyr 1.0 and later you can use the syntax from the `glue` package when naming parameters with `:=`. So here, the `{}` in the name grabs the value by evaluating the expression inside.
multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
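As a usage sketch (assuming dplyr 1.0 or later), the glue-style name is interpolated on every call, so the function can be chained to add several dynamically named columns to `iris`:

```r
library(dplyr)

multipetal <- function(df, n) {
  # "petal.{n}" is interpolated with glue syntax before the column is created
  mutate(df, "petal.{n}" := Petal.Width * n)
}

# Each call adds one new column whose name depends on n
out <- iris %>% multipetal(2) %>% multipetal(3)
tail(names(out), 2)  # "petal.2" "petal.3"
```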
If you are passing a column name to your function, you can use `{{}}` in the string as well as for the column name:
meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
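A small sketch of what the `{{col}}` name produces, using `iris` again. Because `mean()` runs inside `mutate()`, the same function also respects grouping, so on grouped data each species gets its own mean:

```r
library(dplyr)

meanofcol <- function(df, col) {
  # {{col}} in the string is replaced by the label of the column passed in
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}

out <- meanofcol(iris, Petal.Width)
names(out)[ncol(out)]  # "Mean of Petal.Width"

# Grouped data: one mean per Species, repeated within each group
grouped <- iris %>% group_by(Species) %>% meanofcol(Sepal.Length)
length(unique(grouped[["Mean of Sepal.Length"]]))  # 3
```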
dplyr version >= 0.7
dplyr starting with version 0.7 allows you to use `:=` to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  mutate(df, !!varname := Petal.Width * n)
}
For more information, see the documentation available from `vignette("programming", "dplyr")`.
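The `!!varname :=` form only covers the name on the left. When the *input* column is also held as a string, it has to be turned into a symbol (`rlang::sym()`) or looked up through the `.data` pronoun. A minimal sketch, assuming dplyr 0.7 or later; `src` and `dest` are hypothetical names chosen for illustration:

```r
library(dplyr)

src  <- "Petal.Width"   # hypothetical input column name, held as a string
dest <- "petal.double"  # hypothetical output column name

# Left of :=, !! accepts a plain string.
# Right of :=, the string becomes a symbol via rlang::sym(), or is looked up
# with .data[[...]]; the parentheses keep !! scoped to just the symbol.
via_sym  <- mutate(iris, !!dest := 2 * (!!rlang::sym(src)))
via_data <- mutate(iris, !!dest := 2 * .data[[src]])

all.equal(via_sym, via_data)  # TRUE
```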
dplyr (>=0.3 & <0.7)
Slightly earlier versions of dplyr (>= 0.3 and < 0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (`vignette("nse")`).
So here, the answer is to use `mutate_()` rather than `mutate()` and do:
# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  varval <- lazyeval::interp(~Petal.Width * n, n = n)
  mutate_(df, .dots = setNames(list(varval), varname))
}
dplyr < 0.3
Note that this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of `quote` and `setNames`:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
  do.call("mutate", pp)
}
Dplyr - Mutate dynamically named variables using other dynamically named variables
Here, we don't need `enquo`/`quo_name` for `year`, as we are passing a numeric value. The output of `paste` is of class `character`; using `sym` from `rlang` (as @joran mentioned), it can be converted to a symbol and evaluated with `!!`. Make sure to wrap `!!calc1_header` and `!!calc2_header` in parentheses so that `!!` is applied only to that specific object:
my_fun <- function(df, year) {
  total_header <- paste("total", year, sep = "_")
  calc1_header <- rlang::sym(paste("value1", year, sep = "_"))
  calc2_header <- rlang::sym(paste("value2", year, sep = "_"))
  df %>%
    mutate(!!total_header := multiplier * (!!calc1_header) + (!!calc2_header))
}
my_fun(df1, 2016)
# ID multiplier value1_2015 value2_2015 value1_2016 value2_2016 total_2016
#1 1 0.5 2 3 1 4 4.5
#2 2 1.0 2 4 4 5 9.0
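For comparison, here is a sketch of the same calculation using the `.data` pronoun instead of `rlang::sym()` plus `!!` on the right-hand side. The `df1` below is rebuilt from the printed output above; `my_fun2` is a hypothetical name for this variant:

```r
library(dplyr)

# df1 rebuilt from the printed output above
df1 <- data.frame(
  ID = 1:2, multiplier = c(0.5, 1.0),
  value1_2015 = c(2, 2), value2_2015 = c(3, 4),
  value1_2016 = c(1, 4), value2_2016 = c(4, 5)
)

# Hypothetical variant: read the input columns by string with .data[[...]]
my_fun2 <- function(df, year) {
  total_header <- paste("total", year, sep = "_")
  v1 <- paste("value1", year, sep = "_")
  v2 <- paste("value2", year, sep = "_")
  df %>%
    mutate(!!total_header := multiplier * .data[[v1]] + .data[[v2]])
}

my_fun2(df1, 2016)$total_2016  # 4.5 9.0
my_fun2(df1, 2015)$total_2015  # 4 6
```

Since `.data[[...]]` takes the string directly, no operator-precedence parentheses are needed.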
Mutate a dynamic column name with conditions using other dynamic column names
Use `get` to retrieve the column value instead:
library(tidyverse)
d <- mtcars %>% tibble
fld_name <- "mpg"
other_fld_name <- "cyl"
d %>% mutate(!!fld_name := ifelse(get(other_fld_name) < 5, NA, get(fld_name)))
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 NA 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 NA 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 NA 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Created on 2021-06-22 by the reprex package (v2.0.0)
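An alternative to `get()` is the `.data` pronoun, which looks the column up by its string name inside the data and errors if the column does not exist, rather than potentially finding a same-named object outside the data. A sketch on the same `mtcars` setup; `dplyr::if_else` with `NA_real_` is used here to keep the column type explicit:

```r
library(dplyr)

d <- as_tibble(mtcars)
fld_name <- "mpg"
other_fld_name <- "cyl"

# .data[[...]] reads a column from the data by its string name
out <- d %>%
  mutate(!!fld_name := if_else(.data[[other_fld_name]] < 5, NA_real_, .data[[fld_name]]))

sum(is.na(out$mpg))  # 11, one per four-cylinder car
```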
Dynamic variable names to mutate variables in for-loop
As we are passing a string, convert it to a symbol and evaluate it with `!!`:
func <- function(i) {
  mutate(df1, !!i := case_when(!is.na(!!rlang::ensym(i)) ~ as.character(!!rlang::ensym(i)),
                               is.na(!!rlang::ensym(i)) & var0 != '1' ~ '4444',
                               TRUE ~ '0'))
}
Testing:
for (i in vars) {
  df1 <- func(i)
}
df1
var0 var1 var2 var3
1 1 0 1 NA
2 2 1 4444 1
3 2 0 0 0
4 1 1 0 4444
5 1 0 1 1
6 2 4444 4444 NA
7 2 4444 4444 1
We can do this with `across` as well:
df1 %>%
  mutate(across(all_of(vars),
                ~ case_when(!is.na(.) ~ as.character(.),
                            is.na(.) & var0 != '1' ~ '4444',
                            TRUE ~ '0')))
var0 var1 var2 var3
1 1 0 1 NA
2 2 1 4444 1
3 2 0 0 0
4 1 1 0 4444
5 1 0 1 1
6 2 4444 4444 NA
7 2 4444 4444 1
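Both versions above read `df1` and `vars` from the global environment. A sketch of a more self-contained loop passes the data frame in and reads the column with `.data[[i]]` instead of `!!rlang::ensym(i)`. The toy data and the helper name `recode_missing` are hypothetical, chosen only to exercise each branch:

```r
library(dplyr)

# Hypothetical toy data; var1/var2 contain NAs to be recoded
toy <- data.frame(
  var0 = c("1", "2", "1"),
  var1 = c(0, NA, NA),
  var2 = c(NA, 1, 0)
)
vars <- c("var1", "var2")

# Hypothetical helper: same rules as func(), but the data frame is an
# argument and the column is read by its string name via .data[[i]]
recode_missing <- function(df, i) {
  mutate(df, !!i := case_when(
    !is.na(.data[[i]]) ~ as.character(.data[[i]]),
    is.na(.data[[i]]) & var0 != "1" ~ "4444",
    TRUE ~ "0"
  ))
}

for (i in vars) {
  toy <- recode_missing(toy, i)
}
toy
```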
R mutate across and using two dynamically named columns to calculate result
- You're missing the `~` that marks the `ifelse(..)` as a function of sorts. `cur_col()` was not found (for me); it should likely be `.` or `.x`.
- You are `str_detect`ing in the name of the `_Kenn`-equivalent column, not the values in that column; we need to add `cur_data()[[..]]` as well.
I tend not to use `stringr` for straightforward replacements like this, preferring base R:
library(dplyr)
Test %>%
  mutate(
    across(
      paste0(Param, "_Konz"),
      ~ if_else(grepl("[XF]", cur_data()[[ gsub("_Konz", "_Kenn", cur_column()) ]]),
                .[NA], .)
    )
  )
# # A tibble: 6 x 5
# Date HCl_Konz HCl_Kenn CO_Konz CO_Kenn
# <dbl> <dbl> <chr> <dbl> <chr>
# 1 1 4 "" 4 ""
# 2 2 5 "" 1 ""
# 3 3 NA "X" NA "BX"
# 4 4 5 "" 4 ""
# 5 5 NA "F" 4 ""
# 6 6 5 "" NA "EXr"
I recommend `dplyr::if_else` in place of `ifelse` for several reasons, but it comes with the strict (and safe!) requirement that the `true=` and `false=` arguments be precisely the same type. You already account for most of this by using `NA_real_`; my use of `.[NA]` is another way of ensuring that we get the correct `NA` variant based on the actual data, which lets this method work if, for example, some of your `Param` columns are `integer` and some are `numeric`.
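The `.[NA]` trick works because indexing with a logical `NA` recycles it over the whole vector, returning an all-`NA` vector with the same length and type as the original. A small sketch with made-up vectors:

```r
library(dplyr)

int_col <- c(4L, 5L, 6L)
dbl_col <- c(4.5, 1, 4)
flag    <- c(FALSE, TRUE, FALSE)

# x[NA]: the logical NA index is recycled, giving an all-NA vector
# of the same length and type as x
typeof(int_col[NA])  # "integer"
length(int_col[NA])  # 3

# The NA branch therefore always matches the column's own type
res_int <- if_else(flag, int_col[NA], int_col)  # 4 NA 6, integer
res_dbl <- if_else(flag, dbl_col[NA], dbl_col)  # 4.5 NA 4, double
```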
An alternative approach (which may help later) is to pivot the data and work with just two columns at a time.
library(tidyr) # pivot_longer
Test %>%
  pivot_longer(
    matches("_(Konz|Kenn)$"),
    names_pattern = "(.*)_(.*)", names_to = c("elem", ".value")
  ) %>%
  mutate(
    Konz = if_else(grepl("[XF]", Kenn), Konz[NA], Konz)
  )
# # A tibble: 12 x 4
# Date elem Konz Kenn
# <dbl> <chr> <dbl> <chr>
# 1 1 HCl 4 ""
# 2 1 CO 4 ""
# 3 2 HCl 5 ""
# 4 2 CO 1 ""
# 5 3 HCl NA "X"
# 6 3 CO NA "BX"
# 7 4 HCl 5 ""
# 8 4 CO 4 ""
# 9 5 HCl NA "F"
# 10 5 CO 4 ""
# 11 6 HCl 5 ""
# 12 6 CO NA "EXr"
This pivoted format has the advantage of allowing simpler calls to `mutate` and, if you plan on plotting this, it plays much better with `ggplot2`'s preference for long data.
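If the wide layout is needed afterwards, the long result can be pivoted back. A sketch on a small hypothetical stand-in for `Test` (the values are invented for illustration); `names_glue = "{elem}_{.value}"` rebuilds the original `<elem>_Konz` / `<elem>_Kenn` names:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data shaped like the question's Test
Test <- tibble(
  Date = 1:3,
  HCl_Konz = c(4, 5, 6), HCl_Kenn = c("", "X", ""),
  CO_Konz = c(4, 1, 2), CO_Kenn = c("", "", "EXr")
)

long <- Test %>%
  pivot_longer(matches("_(Konz|Kenn)$"),
               names_pattern = "(.*)_(.*)", names_to = c("elem", ".value")) %>%
  mutate(Konz = if_else(grepl("[XF]", Kenn), Konz[NA], Konz))

# Back to one row per Date, restoring the <elem>_<value> column names
wide <- long %>%
  pivot_wider(names_from = elem, values_from = c(Konz, Kenn),
              names_glue = "{elem}_{.value}")
wide
```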
Dynamically name a new variable/column within a custom function (dplyr mutate and paste)
We can pass the arguments unquoted and use `{{}}` for evaluation:
my_fun <- function(dataf, V1, V2) {
  dataf %>%
    dplyr::mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark = ","),
                                            '\n(', format({{V2}}, big.mark = ","), ')'))
}
Testing:
my_fun(df, speed1, n1)
string speed1 speed2 n1 n2 speed1_n1
1 car 7886.962 3218.585 37 83 7,886.962\n(37)
2 train 9534.978 5524.649 98 34 9,534.978\n(98)
3 bike 6984.790 9476.838 60 55 6,984.790\n(60)
4 plain 6543.198 2638.609 9 53 6,543.198\n( 9)
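The `( 9)` in the last row comes from `format()` padding numbers to a common width by default. A sketch of the same function with `trim = TRUE`, which suppresses that padding; the two-row data frame and the name `my_fun_trim` are hypothetical:

```r
library(dplyr)

# Hypothetical toy data using the same column names as above
df <- data.frame(string = c("car", "plain"),
                 speed1 = c(1234.5, 20),
                 n1 = c(37L, 9L))

# Same as my_fun(), but trim = TRUE drops format()'s padding
my_fun_trim <- function(dataf, V1, V2) {
  dataf %>%
    mutate("{{V1}}_{{V2}}" := paste0(format({{V1}}, big.mark = ",", trim = TRUE),
                                     "\n(",
                                     format({{V2}}, big.mark = ",", trim = TRUE),
                                     ")"))
}

out <- my_fun_trim(df, speed1, n1)
out$speed1_n1
```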
R/dplyr: Mutate based on multiple dynamic variable names
Great question. Below is a base R solution. I am sure it can be adapted to a tidyverse solution (e.g., with `purrr::map2()`). Here I built a function that does a basic check (that every area column has a matching unit column) and then applied it with `lapply()`. Note: the answer is tailored to your example, so you'll need to adapt it if your value/unit columns are named differently. Hope this helps!
val_by_unit <- function(data) {
  df <- data[order(names(data))]
  # Selecting columns for values and units
  val <- df[endsWith(names(df), "area")]
  unit <- df[endsWith(names(df), "unit")]
  # Check names are multiplying correctly
  if (!all(names(val) == sub("_unit", "", names(unit)))) {
    stop("Not all areas have a corresponding unit")
  }
  # Multiplying corresponding columns
  output <- Map(`*`, val, unit)
  # Renaming output and adding columns
  data[paste0(names(output), "_ha")] <- output
  data
}
Results:
lapply(ab_list, val_by_unit)
$a
# A tibble: 3 x 7
a1_area a2_area_unit a2_area a1_area_unit abc a1_area_ha a2_area_ha
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 1 1
2 1 1 1 0.5 2 0.5 1
3 1 0.5 1 0.5 3 0.5 0.5
$b
# A tibble: 3 x 7
b1_area b1_area_unit b2_area b2_area_unit abc b1_area_ha b2_area_ha
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 1 1
2 1 1 1 0.5 2 1 0.5
3 1 0.5 1 0.5 3 0.5 0.5
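One possible dplyr adaptation, sketched under the assumption that every value column ends in `_area` and its unit column is that name plus `_unit`: loop over the value columns and build each `<col>_ha` name with the glue syntax. The tibble `a` is rebuilt from the printed `$a` result above; `val_by_unit_dplyr` is a hypothetical name:

```r
library(dplyr)

# ab_list$a rebuilt from the printed result above (original columns only)
a <- tibble(
  a1_area = c(1, 1, 1), a2_area_unit = c(1, 1, 0.5),
  a2_area = c(1, 1, 1), a1_area_unit = c(1, 0.5, 0.5), abc = c(1, 2, 3)
)

# Hypothetical dplyr variant of val_by_unit()
val_by_unit_dplyr <- function(data) {
  area_cols <- grep("_area$", names(data), value = TRUE)
  for (v in area_cols) {
    unit_col <- paste0(v, "_unit")
    if (!unit_col %in% names(data)) stop("No unit column for ", v)
    # "{v}_ha" is interpolated with glue syntax, e.g. "a1_area_ha"
    data <- mutate(data, "{v}_ha" := .data[[v]] * .data[[unit_col]])
  }
  data
}

out <- val_by_unit_dplyr(a)
out
```

Applied to the whole list, this would be `lapply(ab_list, val_by_unit_dplyr)`.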