How to Pass Dynamic Column Names in Dplyr into Custom Function

How to pass dynamic column names in dplyr into custom function?

With dplyr >= 0.7, you can use rlang's !! (bang-bang) operator.

library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate(diff = (!!as.name(from)) - (!!as.name(to)))

You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately I had to use a few more parentheses than I would like, because !! sits at an unexpected place in R's operator precedence. (Recent rlang releases adjust the precedence of !!, so the extra parentheses are usually no longer required, but they are harmless.)
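To make the same idea reusable, the unquoting can be wrapped in a function. The sketch below is illustrative only: add_diff() and the toy stands data frame are assumptions, not part of the original question. rlang::sym() is used in place of as.name(); both turn a string into a symbol.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: subtract the column named by `to` from the column named by `from`.
add_diff <- function(df, from, to) {
  mutate(df, diff = (!!sym(from)) - (!!sym(to)))
}

# Toy data standing in for the asker's `data` (values are made up).
stands <- data.frame(Stand1971 = c(10, 20), Stand1987 = c(4, 25))

result_diff <- add_diff(stands, "Stand1971", "Stand1987")
result_diff$diff
```

Because the column names arrive as ordinary strings, the caller can build them with paste() or read them from a config file before calling the helper.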

Original answer, dplyr (0.3-<0.7):

Following the NSE vignette (vignette("nse", "dplyr")), use lazyeval's interp() function:

library(lazyeval)

from <- "Stand1971"
to <- "Stand1987"

data %>%
  mutate_(diff = interp(~from - to, from = as.name(from), to = as.name(to)))

Use dynamic name for new column/variable in `dplyr`

Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:

multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  df[[varname]] <- with(df, Petal.Width * n)
  df
}

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.



dplyr version >= 1.0

With dplyr 1.0 and later you can use glue-style syntax in the name on the left-hand side of :=. The {} in the name is replaced by the value of the expression inside it.

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
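As a quick check of how the glue-style name behaves, the sketch below applies the function for several values of n. The use of Reduce() and the three-row slice of iris are assumptions chosen to keep the example short.

```r
library(dplyr)

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}

# Apply the function for n = 2 and then n = 3, feeding each result into the next call.
small_iris <- head(iris, 3)
petals <- Reduce(multipetal, 2:3, small_iris)

names(petals)
```

Each call adds one column whose name is built from the current value of n, so the result should carry both petal.2 and petal.3.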

If you are passing a column name to your function, you can use {{}} in the string as well as for the column name:

meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
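To see what name the {{}} inside the string produces, the result can be inspected directly. This is only a small sketch; the variable name with_mean is an assumption.

```r
library(dplyr)

meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}

with_mean <- meanofcol(iris, Petal.Width)

# The new column's name is built from the expression supplied as `col`.
tail(names(with_mean), 1)
```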



dplyr version >= 0.7

dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+ ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  mutate(df, !!varname := Petal.Width * n)
}

For more information, see the documentation available from vignette("programming", "dplyr").
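The reason := is needed becomes clearer when it is compared with a plain =. In this sketch the names literal and dynamic are assumptions used only for the comparison.

```r
library(dplyr)

n <- 2
varname <- paste("petal", n, sep = ".")

# With `=`, the left-hand side is taken literally, so the column is called "varname".
literal <- mutate(head(iris, 3), varname = Petal.Width * n)

# With `:=`, the left-hand side can be unquoted with !!, so the column is called "petal.2".
dynamic <- mutate(head(iris, 3), !!varname := Petal.Width * n)

tail(names(literal), 1)
tail(names(dynamic), 1)
```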



dplyr (>=0.3 & <0.7)

Slightly earlier versions of dplyr (>=0.3, <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  varval <- lazyeval::interp(~Petal.Width * n, n = n)
  mutate_(df, .dots = setNames(list(varval), varname))
}


dplyr < 0.3

Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
  do.call("mutate", pp)
}

Pass column names to dplyr::coalesce() when writing a custom function

This is a simple implementation. It first returned only the selected columns; the edited version below binds the result back onto the original data so that all columns are kept.

It's simple because we rely on select to do the work for us, as suggested at the start of the Implementing tidyselect vignette. Note that invoke() comes from purrr.

# edited to keep all columns
coalesce_df <- function(data, ...) {
  data %>%
    select(...) %>%
    transmute(result = invoke(coalesce, .)) %>%
    bind_cols(data, .)
}



df %>%
coalesce_df(everything())
# col_a col_b col_c result
# 1 bob <NA> paul bob
# 2 <NA> danny <NA> danny
# 3 bob <NA> <NA> bob
# 4 <NA> <NA> paul paul
# 5 bob <NA> <NA> bob

df %>% coalesce_df(col_a, col_b)
# col_a col_b col_c result
# 1 bob <NA> paul bob
# 2 <NA> danny <NA> danny
# 3 bob <NA> <NA> bob
# 4 <NA> <NA> paul <NA>
# 5 bob <NA> <NA> bob
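For readers who would rather not depend on purrr::invoke(), the same result can be sketched with base R's do.call(). This is a variant under stated assumptions, not the answer's own code: coalesce_df2() is a hypothetical name, and the df below is reconstructed from the printed output above.

```r
library(dplyr)

# Hypothetical variant of coalesce_df() that uses do.call() instead of purrr::invoke().
coalesce_df2 <- function(data, ...) {
  picked <- select(data, ...)  # tidyselect resolves whatever is passed in `...`
  data$result <- do.call(coalesce, unname(as.list(picked)))
  data
}

# Toy data reconstructed from the printed output above.
df <- data.frame(
  col_a = c("bob", NA, "bob", NA, "bob"),
  col_b = c(NA, "danny", NA, NA, NA),
  col_c = c("paul", NA, NA, "paul", NA),
  stringsAsFactors = FALSE
)

all_cols <- coalesce_df2(df, everything())
two_cols <- coalesce_df2(df, col_a, col_b)
two_cols
```

As in the original, selecting only col_a and col_b leaves row 4 as NA, because the only non-missing value in that row lives in col_c.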

Pass character string of column names (e.g. c(speed, dist)) to `across` function in R

You can't use substitute() or eval() on character vectors. You need to parse those character vectors into language objects. Otherwise, when you eval a string, you just get that string back. It's not like eval in other languages. One way to do the parsing is str2lang. Then you can inject that expression into across using tidy evaluation's !!. For example:

mtcars_2 %>%
  mutate(across(.cols = !!str2lang(.$cols_to_modify), .fns = round))
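The difference between evaluating a string and evaluating a parsed expression can be shown on the built-in mtcars data. In this sketch the string "c(mpg, wt)" and the variable names are assumptions; the original mtcars_2 object is not available here.

```r
library(dplyr)

cols_string <- "c(mpg, wt)"  # assumed example string

# Evaluating a string just returns the string.
eval(cols_string)

# str2lang() parses it into a call that tidy evaluation can inject with !!.
cols_expr <- str2lang(cols_string)

rounded <- mtcars %>%
  mutate(across(.cols = !!cols_expr, .fns = round))

head(rounded[, c("mpg", "wt", "hp")], 3)
```

If the column names are already available as a character vector rather than a single string, across(all_of(cols), round) avoids parsing altogether.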

How to pass column names into a function dplyr

We can use quosures, which arrived with dplyr 0.7.0 (the release that was under development as 0.6.0 when this answer was written).

summarise_data_categorical <- function(var1, t_var, dt) {

  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

}
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]

# quartertype lets count
# <int> <fctr> <int>
#1 1 A 1
#2 1 F 2
#3 1 G 2
#4 1 H 1
#5 1 I 1
#6 1 J 4
#7 1 M 3
#8 1 N 1
#9 1 P 1
#10 1 S 5
# ... with 55 more rows

enquo works much like substitute from base R: it takes the input argument and converts it to a quosure. one_of takes string arguments, so the quosures are converted to strings with quo_name. Inside group_by, summarise, mutate, etc., we evaluate the quosure by unquoting it (UQ or !!).
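The roles of enquo(), quo_name() and !! can be seen in isolation on a tiny data frame. Everything named here (describe_arg(), count_pairs(), toy) is a hypothetical stand-in, since the asker's fr data is not shown.

```r
library(dplyr)
library(rlang)

# Hypothetical helper showing what enquo() and quo_name() produce for an argument.
describe_arg <- function(x) {
  x_quo <- enquo(x)  # captures the expression without evaluating it
  list(is_quosure = is_quosure(x_quo), name = quo_name(x_quo))
}
described <- describe_arg(lets)

# Cut-down version of the grouping step: unquote the quosures inside group_by().
count_pairs <- function(dt, var1, t_var) {
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  dt %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n(), .groups = "drop")
}

# Toy data (made up) with the same column names as in the answer.
toy <- data.frame(
  lets = c("A", "A", "B", "A"),
  quartertype = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)
pair_counts <- count_pairs(toy, lets, quartertype)
pair_counts
```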


Quosures work fine with dplyr, though they are harder to use with tidyr functions. The following should work for the full code:

summarise_data_categorical <- function(var1, t_var, dt) {

  var1 <- enquo(var1)
  t_var <- enquo(t_var)

  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

  count_table <- Summ_func %>%
    spread_(v2, "count")

  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count), 3) * 100) %>%
    select(-count)

  freq_table <- freq %>%
    spread_(v2, "freq")

  freq_chart <- freq %>%
    ggplot() +
    geom_line(mapping = aes_string(x = v2, y = "freq", colour = v1))

  results <- list(count_table, freq_table, freq_chart)
  results

}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <int> <int> <int> <int>
#1 A NA NA 1 2
#2 B 2 NA NA 1
#3 C 1 5 1 2
#4 E 1 1 NA NA
#5 G NA 1 2 2
#6 H 1 NA 1 1
#7 I NA 1 1 2
#8 J 2 1 1 1
#9 K 1 1 2 1
#10 L NA 2 NA NA
# ... with 14 more rows

#[[2]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 3.1 9.5
#2 B 8.7 NA NA 4.8
#3 C 4.3 20.8 3.1 9.5
#4 E 4.3 4.2 NA NA
#5 G NA 4.2 6.2 9.5
#6 H 4.3 NA 3.1 4.8
#7 I NA 4.2 3.1 9.5
#8 J 8.7 4.2 3.1 4.8
#9 K 4.3 4.2 6.2 4.8
#10 L NA 8.3 NA NA
## ... with 14 more rows

#[[3]]

[Image: line chart of freq against quartertype, coloured by lets]
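In current tidyr, pivot_wider() is the successor to spread() and accepts tidy evaluation directly, so the count table can be sketched without the underscore functions. This is an alternative under stated assumptions rather than the answer's own code: count_wide() and the toy data are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: count each (t_var, var1) pair, then spread t_var across columns.
count_wide <- function(dt, var1, t_var) {
  dt %>%
    count({{ t_var }}, {{ var1 }}, name = "count") %>%
    pivot_wider(names_from = {{ t_var }}, values_from = count)
}

# Toy data (made up) with the same column names as in the answer.
toy <- data.frame(
  lets = c("A", "A", "B", "A"),
  quartertype = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

wide <- count_wide(toy, lets, quartertype)
wide
```

Combinations that never occur in the data come out as NA, as they do in the tables printed above.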

In R, dplyr mutate referencing column names by string

We can convert the strings to symbols and evaluate them with !!:

library(dplyr)
mydf %>%
  mutate(newCol = !! rlang::sym(var1) + !! rlang::sym(var2))

Another option is to subset the columns with the .data pronoun:

mydf %>%
  mutate(newCol = .data[[var1]] + .data[[var2]])

Or use rowSums (note that cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()):

mydf %>%
  mutate(newCol = rowSums(select(cur_data(), all_of(c(var1, var2)))))
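The three approaches can be compared side by side on a small data frame. The mydf, var1 and var2 values below are made up for illustration, and the rowSums variant is written with across(all_of(...)) as an assumed substitute that does not rely on cur_data().

```r
library(dplyr)
library(rlang)

# Toy inputs (assumptions): two numeric columns and their names as strings.
mydf <- data.frame(a = 1:3, b = 4:6)
var1 <- "a"
var2 <- "b"

# 1. Convert the strings to symbols and unquote them (parentheses kept for safety).
by_sym <- mutate(mydf, newCol = (!!sym(var1)) + (!!sym(var2)))

# 2. Look the columns up through the .data pronoun.
by_data <- mutate(mydf, newCol = .data[[var1]] + .data[[var2]])

# 3. Select the columns by name and sum across each row.
by_rowsums <- mutate(mydf, newCol = rowSums(across(all_of(c(var1, var2)))))

by_data
```

The .data form is often the easiest to read when the names are already strings, because it needs no conversion step.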

Pass variable as column name to dplyr?

We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)); grouped by the row sequence, we get the value of the column named by the paste0 output and assign (:=) it to a new column ('my.p').

library(data.table)
setDT(df)[, my.p:= get(paste0(max1, '.p')), 1:nrow(df)]
df
# col1 col1.p col2 col2.p col3 col3.p max1 my.p
# 1: a 1 a 6 c 11 col3 11
# 2: b 2 c 7 d 12 col2 7
# 3: c 3 l 8 e 13 col1 3
# 4: d 4 c 9 f 14 col2 9
# 5: c 5 l 10 g 15 col1 5
# 6: a 1 a 6 c 16 col3 16
# 7: b 2 c 7 d 17 col2 7
# 8: c 3 l 8 e 18 col1 3
# 9: d 4 c 9 f 19 col2 9
#10: c 5 l 10 g 20 col1 5
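The row-by-row lookup can also be sketched in base R, which may help clarify what the grouped get() call is doing. This is not the data.table approach above but a plain equivalent under assumptions: the two-row df below is made up and only mirrors the column layout.

```r
# Toy data with the same naming pattern as the table above (values are made up).
df <- data.frame(
  col1.p = c(1, 2),
  col2.p = c(6, 7),
  max1   = c("col2", "col1"),
  stringsAsFactors = FALSE
)

# For each row i, look up the column named "<max1>.p" and take its i-th value.
df$my.p <- mapply(
  function(col, i) df[[col]][i],
  paste0(df$max1, ".p"),
  seq_len(nrow(df)),
  USE.NAMES = FALSE
)

df
```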

How to pass column name as argument to function for dplyr verbs?

Here is another way of making it work. You can use the .data[[var]] construct for a column name that is stored as a string:

foo <- function(data, colName) {

  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())

  return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows

If you decide not to pass colName as a string, wrap it in a pair of curly braces ({{ }}) inside your function to get a similar result:

foo <- function(data, colName) {

  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())

  return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
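Putting the two interfaces next to each other makes it easy to confirm that they group the data the same way. The function names count_by_string() and count_by_symbol() are assumptions introduced for this comparison; quakes is the built-in dataset used above.

```r
library(dplyr)

# String interface: the caller passes "stations".
count_by_string <- function(data, colName) {
  data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n(), .groups = "drop")
}

# Bare-name interface: the caller passes stations.
count_by_symbol <- function(data, colName) {
  data %>%
    group_by({{ colName }}) %>%
    summarise(count = n(), .groups = "drop")
}

from_string <- count_by_string(quakes, "stations")
from_symbol <- count_by_symbol(quakes, stations)

identical(from_string$count, from_symbol$count)
```

Which one to expose is mostly a design choice: strings are convenient when column names come from user input or configuration, while bare names read more like ordinary dplyr code.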

How to pass a column name into a custom function, which uses dplyr?

Thanks, @ycw, that was a nice read. Got it working now after working through the article. This was the solution at the end of the day:

substrColName <- function(df, colName, start) {
  colNameQuo <- enquo(colName)
  df %>%
    mutate(!!quo_name(colNameQuo) := substr(!!colNameQuo, start = start, stop = nchar(!!colNameQuo)))
}
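With dplyr 1.0 or later, the same function can be sketched more compactly with {{ }} on both sides of :=. This is an assumed modern equivalent, not the solution as originally posted; substr_col() and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical rewrite: overwrite the given column with its substring from `start` to the end.
substr_col <- function(df, col, start) {
  mutate(df, "{{ col }}" := substr({{ col }}, start, nchar({{ col }})))
}

# Toy data (made up).
toy <- data.frame(code = c("AB123", "CD4567"), stringsAsFactors = FALSE)

trimmed <- substr_col(toy, code, 3)
trimmed
```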

And this is ycw's comment.


