Dplyr String as Column Reference

dplyr string as column reference

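Here's an option that uses interp() from the lazyeval package, which older dplyr versions installed as a dependency. Inside your function(s), you need the standard-evaluation version of the dplyr verb, which in this case is mutate_(). Note that the underscored verbs such as mutate_() have been deprecated since dplyr 0.7.0, so this approach applies mainly to older code.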

Note that the new column position will be identical to the Cost column here because of how you've set up the grouping in machines. The second call to my_fun() shows it working on a different set of grouping variables.

library(dplyr)
library(lazyeval)

my_fun <- function(data, col) {
  mutate_(data, position = interp(~ cumsum(x), x = as.name(col)))
}

my_fun(machines, "Cost")
# Date Model.Num Cost position
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 250
# 4 2/28/2014 456 350 350
# 5 3/31/2014 123 300 300
# 6 3/31/2014 456 400 400

## second example - different grouping
my_fun(group_by(machines, Model.Num), "Cost")
# Date Model.Num Cost position
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 450
# 4 2/28/2014 456 350 650
# 5 3/31/2014 123 300 750
# 6 3/31/2014 456 400 1050
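
Because mutate_() is deprecated in current dplyr, the same idea can be sketched with the .data pronoun instead. The machines data frame below is reconstructed from the printed output above, and my_fun2 is a hypothetical name, so treat this as an illustration under those assumptions rather than the original answer's method.

```r
library(dplyr)

# Assumption: `machines` rebuilt from the printed output; the original object is not shown.
machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014",
           "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c(123, 456, 123, 456, 123, 456),
  Cost = c(200, 300, 250, 350, 300, 400)
)

# Hypothetical helper: `col` is a string, looked up through the .data pronoun.
my_fun2 <- function(data, col) {
  mutate(data, position = cumsum(.data[[col]]))
}

# Cumulative cost within each model, matching the second example above.
res <- my_fun2(group_by(machines, Model.Num), "Cost")
```

With this version no lazyeval import is needed, and the grouping still controls how cumsum() is applied.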

In R, dplyr mutate referencing column names by string

We can convert the strings to symbols with rlang::sym() and unquote them with !!:

library(dplyr)
mydf %>%
  mutate(newCol = !! rlang::sym(var1) + !! rlang::sym(var2))

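Another option is to subset the columns with the .data pronoun:

mydf %>%
  mutate(newCol = .data[[var1]] + .data[[var2]])

Or we may use rowSums():

# pick() replaces cur_data(), which is deprecated as of dplyr 1.1.0;
# on older dplyr versions use select(cur_data(), all_of(c(var1, var2))) instead
mydf %>%
  mutate(newCol = rowSums(pick(all_of(c(var1, var2)))))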
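
To make the comparison concrete, here is a small self-contained sketch showing that the symbol approach and the .data approach give the same column. The data frame mydf and the values of var1 and var2 are made up for illustration; the original question's data is not shown.

```r
library(dplyr)

# Hypothetical data and column names, for illustration only.
mydf <- data.frame(a = 1:3, b = 4:6)
var1 <- "a"
var2 <- "b"

# Strings converted to symbols, then unquoted.
by_sym <- mydf %>%
  mutate(newCol = (!!rlang::sym(var1)) + (!!rlang::sym(var2)))

# Strings used to subset the .data pronoun.
by_data <- mydf %>%
  mutate(newCol = .data[[var1]] + .data[[var2]])
```

The .data form is usually the easier one to read when the column name arrives as a string.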

refer to column name from variable in across in dplyr

Making use of the .data pronoun from rlang you could do:

library(dplyr)

m <- data.frame(x = 1:5, y = 11:15, z = 21:25)
denom <- "z"

m %>% mutate(across(
  x:z,
  list(~ log(.) - log(.data[[denom]]))
))
#> x y z x_1 y_1 z_1
#> 1 1 11 21 -3.044522 -0.6466272 0
#> 2 2 12 22 -2.397895 -0.6061358 0
#> 3 3 13 23 -2.036882 -0.5705449 0
#> 4 4 14 24 -1.791759 -0.5389965 0
#> 5 5 15 25 -1.609438 -0.5108256 0
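
If the columns to transform are also held as strings, a variation on the example above might look like the following sketch. The cols vector and the "_logratio" suffix are assumptions added for illustration; .names is used here only to get more descriptive output names than x_1, y_1, z_1.

```r
library(dplyr)

m <- data.frame(x = 1:5, y = 11:15, z = 21:25)
denom <- "z"
cols <- c("x", "y", "z")  # hypothetical: columns selected by name

res <- m %>%
  mutate(across(
    all_of(cols),                      # select columns from a character vector
    ~ log(.x) - log(.data[[denom]]),   # denominator looked up by string
    .names = "{.col}_logratio"         # name outputs after the source column
  ))
```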

Parsing string as column name in dplyr

I would use a named vector instead of trying to mess around with the dplyr programming nuances. A benefit is that this method is already vectorized.

rename_cols <- function(col) {
  # Build the new names; setNames() below maps each new name to its old column
  name <- paste0(col, "_new")

  mtcars %>%
    rename(setNames(col, name))
}

rename_cols(colnames(mtcars))
# mpg_new cyl_new disp_new hp_new drat_new wt_new qsec_new vs_new am_new gear_new carb_new
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# ...


Edit

In this case, you might also find rename_with() to be what you need.

library(dplyr)

cols <- colnames(mtcars)

mtcars %>%
  rename_with(~ paste0(., "_new"), any_of(cols))

# which is the same as the more concise but maybe less clear...
mtcars %>%
  rename_with(paste0, any_of(cols), "_new")
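
As a quick check that the named-vector approach and rename_with() agree, here is a minimal sketch. The lookup name is hypothetical, and wrapping the vector in all_of() is a choice made here to make the selection explicit, not something the answer above requires.

```r
library(dplyr)

cols <- colnames(mtcars)

# Named vector: names are the new column names, values are the old ones.
lookup <- setNames(cols, paste0(cols, "_new"))

renamed_lookup <- mtcars %>% rename(all_of(lookup))
renamed_with   <- mtcars %>% rename_with(~ paste0(.x, "_new"), all_of(cols))
```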

Pass a string as variable name in dplyr::filter

!! or UQ() evaluates the variable, so mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to FALSE. You can verify this by printing !!var inside filter():

mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)

To make var refer to the cyl column, first convert the string to the symbol cyl, then unquote that symbol so it is evaluated as a column:

Using rlang:

library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)

# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# ...

Or use as.symbol()/as.name() from base R:

mtcars %>% filter((!!as.symbol(var)) == 4)

mtcars %>% filter((!!as.name(var)) == 4)
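
For comparison, the same filter can also be sketched with the .data pronoun, which avoids building a symbol at all. This is an alternative to the answer above rather than part of it; the object names by_sym and by_data are hypothetical.

```r
library(dplyr)

var <- "cyl"

# Symbol-based version, as in the answer above.
by_sym <- mtcars %>% filter((!!rlang::sym(var)) == 4)

# Pronoun-based version: look the column up by its string name.
by_data <- mtcars %>% filter(.data[[var]] == 4)
```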

R dplyr operate on a column known only by its string name

If you have a column name in a string (that is, a length-one character vector) and you want to use it with tidy evaluation, you can convert it with rlang::sym(). Just change the filter step to

dplyr::filter( mpg > !!rlang::sym(probColName) )

and it should work. This is taken from the recommendation at this github issue: https://github.com/tidyverse/rlang/issues/116

It's still fine to use

dplyr::summarize( !!probColName := quantile(mpg, pctCutoff) )

because when dynamically setting a parameter name, you just need the string and not an unquoted symbol.
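
Putting the two pieces together, a minimal sketch might look like this. The values of probColName and pctCutoff are made up, and mutate() is used instead of summarize() so that the cutoff column sits next to mpg for the filter step; the original pipeline is not shown in full, so this is only one plausible arrangement.

```r
library(dplyr)

probColName <- "cutoff"  # hypothetical column name held in a string
pctCutoff <- 0.5         # hypothetical quantile level

out <- mtcars %>%
  # LHS: the string alone is enough to name the new column
  mutate(!!probColName := quantile(mpg, pctCutoff, names = FALSE)) %>%
  # RHS: the string must become a symbol before it can refer to the column
  filter(mpg > !!rlang::sym(probColName))
```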

Pass character string of column names (e.g. c(speed, dist ) to `across` function in R

You can't use substitute() or eval() on character vectors. You need to parse those character vectors into language objects; otherwise, when you eval() a string, you just get that string back. It's not like eval in other languages. One way to do the parsing is str2lang(). You can then inject the resulting expression into across() using tidy evaluation's !!. For example:

mtcars_2 %>%
  mutate(across(.cols = !!str2lang(.$cols_to_modify), .fns = round))
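
The difference between evaluating a string and evaluating a parsed expression can be seen in a small standalone sketch. It uses the built-in cars data set and a doubling function purely for illustration; the names s, expr, and res are hypothetical.

```r
library(dplyr)

s <- "c(speed, dist)"

# Evaluating a string just returns the string.
same_string <- identical(eval(s), s)

# str2lang() turns the string into a call object that can be injected with !!.
expr <- str2lang(s)

res <- cars %>%
  mutate(across(.cols = !!expr, .fns = ~ .x * 2))
```

If the column names are already available as a character vector such as c("speed", "dist"), all_of() inside across() avoids the parsing step entirely.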

Is it possible to name a column of a tibble using a variable containing a character vector (string)?

You can use the following solution:

  • To use column names that are stored as strings, we apply the bang-bang operator !!, which forces evaluation of the name that follows it.
  • We also need the walrus operator := instead of =. The two are equivalent, but := lets you supply an unquoted name (such as our variable) on its left-hand side (LHS).

CLADE_FIELD = "Clade"
LINEAGE_FIELD = "Lineage"

metaDF = tibble(!!CLADE_FIELD := c("G"),
                !!LINEAGE_FIELD := c("B.666"),
                "Submission date" = c("2020-03"))

# A tibble: 1 x 3
#   Clade Lineage `Submission date`
#   <chr> <chr>   <chr>
# 1 G     B.666   2020-03

Or we can use double braces {{}} as follows:

metaDF = tibble({{CLADE_FIELD}} := c("G"),
                {{LINEAGE_FIELD}} := c("B.666"),
                "Submission date" = c("2020-03"))

# A tibble: 1 x 3
#   Clade Lineage `Submission date`
#   <chr> <chr>   <chr>
# 1 G     B.666   2020-03

Or we can use glue syntax: put the variable name inside a pair of braces {} within a string and use that string as the name. Since glue syntax became available on the LHS of :=, whatever you put inside the braces (here, your variable names) is evaluated as R code:

metaDF = tibble("{CLADE_FIELD}" := c("G"),
                "{LINEAGE_FIELD}" := c("B.666"),
                "Submission date" = c("2020-03"))

# A tibble: 1 x 3
#   Clade Lineage `Submission date`
#   <chr> <chr>   <chr>
# 1 G     B.666   2020-03
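
The glue form is handy inside a function, where the names arrive as arguments and can be combined with other text. The helper make_meta() and the "_name" suffix below are hypothetical, added only to sketch the pattern.

```r
library(tibble)

# Hypothetical helper: column names are passed in as strings.
make_meta <- function(clade_field, lineage_field) {
  tibble(
    "{clade_field}" := "G",                # name taken directly from the argument
    "{lineage_field}_name" := "B.666",     # argument combined with a fixed suffix
    "Submission date" = "2020-03"
  )
}

metaDF <- make_meta("Clade", "Lineage")
```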

