How to pass dynamic column names in dplyr into custom function?
Using the latest version of dplyr (>=0.7), you can use the rlang !! (bang-bang) operator.
library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"
data %>%
  mutate(diff = (!!as.name(from)) - (!!as.name(to)))
You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately, this takes a few more parentheses than I would like, because the !! operator seems to have an unusual operator precedence.
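As a quick check of this approach, here is a minimal sketch on a made-up data frame (the values are hypothetical; only the column names come from the example). It uses rlang::sym(), which does the same job as as.name() here, and compares the result with plain base R indexing.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the example
data <- data.frame(Stand1971 = c(10, 20, 30), Stand1987 = c(7, 15, 28))

from <- "Stand1971"
to <- "Stand1987"

# Turn each string into a symbol, then unquote it with !!
res <- data %>%
  mutate(diff = (!!rlang::sym(from)) - (!!rlang::sym(to)))

# The same computation with base R indexing, for comparison
expected <- data[[from]] - data[[to]]
all.equal(res$diff, expected)
```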
Original answer, dplyr (0.3-<0.7):
From that vignette (vignette("nse","dplyr")), use lazyeval's interp() function:
library(lazyeval)
from <- "Stand1971"
to <- "Stand1987"
data %>%
  mutate_(diff = interp(~from - to, from = as.name(from), to = as.name(to)))
Use dynamic name for new column/variable in `dplyr`
Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  df[[varname]] <- with(df, Petal.Width * n)
  df
}
The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
dplyr version >= 1.0
With the latest dplyr version you can use the syntax from the glue package to name parameters when using :=. Here the {} in the name grabs the value by evaluating the expression inside.
multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
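A short usage sketch, assuming dplyr >= 1.0 and the built-in iris data, may help confirm what the glue-style name produces. The choice of n = 2 is arbitrary.

```r
library(dplyr)

multipetal <- function(df, n) {
  # "{n}" is replaced by the value of n when the column name is built
  mutate(df, "petal.{n}" := Petal.Width * n)
}

out <- multipetal(iris, 2)

# Name of the newly added (last) column
tail(names(out), 1)
```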
If you are passing a column name to your function, you can use {{}} in the string as well as for the column name:
meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
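To see how the column passed as col ends up in the new name, here is a small sketch, again assuming dplyr >= 1.0 and the built-in iris data. Sepal.Length is just an arbitrary second column to try.

```r
library(dplyr)

meanofcol <- function(df, col) {
  # {{col}} inside the string becomes the label of the expression passed in
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}

out <- meanofcol(iris, Sepal.Length)

# The new column is named after the argument that was supplied
tail(names(out), 1)
```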
dplyr version >= 0.7
dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  mutate(df, !!varname := Petal.Width * n)
}
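One way this function might be used is in a loop that adds several columns in turn. The sketch below assumes the built-in iris data; the range 1:3 is an arbitrary choice for illustration.

```r
library(dplyr)

multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # !!varname unquotes the string so it becomes the new column's name
  mutate(df, !!varname := Petal.Width * n)
}

# Add petal.1, petal.2 and petal.3 one after another
result <- iris
for (i in 1:3) {
  result <- multipetal(result, i)
}

tail(names(result), 3)
```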
For more information, see the documentation available from vignette("programming", "dplyr").
dplyr (>=0.3 & <0.7)
Slightly earlier versions of dplyr (>=0.3 <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).
So here, the answer is to use mutate_() rather than mutate() and do:
# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  varval <- lazyeval::interp(~Petal.Width * n, n = n)
  mutate_(df, .dots = setNames(list(varval), varname))
}
dplyr < 0.3
Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
  do.call("mutate", pp)
}
Pass column names to dplyr::coalesce() when writing a custom function
This is a simple implementation that originally returned only the selected columns, but it was fairly easily extended to keep all columns (by binding them back on with bind_cols at the end).
It's simple because we rely on select to do the work for us, as suggested at the start of the Implementing tidyselect vignette. Note that invoke comes from purrr.
# edited to keep all columns
coalesce_df = function(data, ...) {
  data %>%
    select(...) %>%
    transmute(result = invoke(coalesce, .)) %>%
    bind_cols(data, .)
}
df %>%
coalesce_df(everything())
# col_a col_b col_c result
# 1 bob <NA> paul bob
# 2 <NA> danny <NA> danny
# 3 bob <NA> <NA> bob
# 4 <NA> <NA> paul paul
# 5 bob <NA> <NA> bob
df %>% coalesce_df(col_a, col_b)
# col_a col_b col_c result
# 1 bob <NA> paul bob
# 2 <NA> danny <NA> danny
# 3 bob <NA> <NA> bob
# 4 <NA> <NA> paul <NA>
# 5 bob <NA> <NA> bob
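For readers who want to try this without purrr, here is a hedged variant that swaps invoke for base R's do.call. The function name coalesce_cols is hypothetical, and the toy data frame is reconstructed from the printed output above rather than taken from the original question.

```r
library(dplyr)

# Toy data reconstructed from the printed output above
df <- data.frame(
  col_a = c("bob", NA, "bob", NA, "bob"),
  col_b = c(NA, "danny", NA, NA, NA),
  col_c = c("paul", NA, NA, "paul", NA)
)

# Hypothetical helper: same idea as coalesce_df, with do.call instead of invoke
coalesce_cols <- function(data, ...) {
  picked <- select(data, ...)                  # tidyselect does the column picking
  data$result <- do.call(coalesce, unname(as.list(picked)))
  data
}

all_cols <- coalesce_cols(df, everything())
two_cols <- coalesce_cols(df, col_a, col_b)

all_cols$result
two_cols$result
```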
Pass character string of column names (e.g. c(speed, dist)) to `across` function in R
You can't use substitute() or eval() on character vectors. You need to parse those character vectors into language objects. Otherwise when you eval a string, you just get that string back. It's not like eval in other languages. One way to do the parsing is str2lang. Then you can inject that expression into across using tidy evaluation's !!. For example:
mtcars_2 %>%
  mutate(across(.cols = !!str2lang(.$cols_to_modify), .fns = round))
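A self-contained sketch of the same idea, using the built-in mtcars data (the two variables cols_string and cols_vector are made up for illustration). It also shows all_of(), which may be simpler when the names are already available as a character vector rather than as a single string of code.

```r
library(dplyr)

cols_string <- "c(mpg, wt)"     # hypothetical selection stored as one string
cols_vector <- c("mpg", "wt")   # the same selection as a character vector

# Parse the string into a call, then inject it with !!
by_parse <- mtcars %>%
  mutate(across(.cols = !!str2lang(cols_string), .fns = round))

# No parsing needed when the names are plain strings
by_names <- mtcars %>%
  mutate(across(.cols = all_of(cols_vector), .fns = round))

all.equal(by_parse, by_names)
```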
How to pass column names into a function dplyr
We can use the new quosures from the devel version of dplyr (soon to be released in 0.6.0):
summarise_data_categorical <- function(var1, t_var, dt){
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)
  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
}
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]
# quartertype lets count
# <int> <fctr> <int>
#1 1 A 1
#2 1 F 2
#3 1 G 2
#4 1 H 1
#5 1 I 1
#6 1 J 4
#7 1 M 3
#8 1 N 1
#9 1 P 1
#10 1 S 5
# ... with 55 more rows
enquo works similarly to substitute from base R: it takes the input arguments and converts them to quosures. one_of takes string arguments, so the quosures can be converted to strings with quo_name. Inside group_by/summarise/mutate etc., we can evaluate a quosure by unquoting it (UQ or !!).
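The capture-then-unquote pattern described here can be tried on a tiny made-up data frame. In the sketch below the function name count_pairs and the toy values are hypothetical, and the .groups argument assumes dplyr >= 1.0.

```r
library(dplyr)

# Hypothetical helper: capture two bare column names, then unquote them
count_pairs <- function(dt, var1, t_var) {
  var1 <- enquo(var1)     # capture the expression the caller typed
  t_var <- enquo(t_var)
  dt %>%
    group_by(!!t_var, !!var1) %>%   # unquote inside the dplyr verb
    summarise(count = n(), .groups = "drop")
}

# Made-up data using the column names from the answer
toy <- data.frame(
  lets = c("A", "A", "B", "B", "B"),
  quartertype = c(1, 1, 1, 2, 2)
)

res <- count_pairs(toy, lets, quartertype)
res
```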
Quosures seem to work fine with dplyr, though we had some difficulty implementing the same with tidyr functions. The following should work for the full code:
summarise_data_categorical <- function(var1, t_var, dt){
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)
  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
  count_table <- Summ_func %>%
    spread_(v2, "count")
  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count), 3) * 100) %>%
    select(-count)
  freq_table <- freq %>%
    spread_(v2, "freq")
  freq_chart <- freq %>%
    ggplot() +
    geom_line(mapping = aes_string(x = v2, y = "freq", colour = v1))
  results <- list(count_table, freq_table, freq_chart)
  results
}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <int> <int> <int> <int>
#1 A NA NA 1 2
#2 B 2 NA NA 1
#3 C 1 5 1 2
#4 E 1 1 NA NA
#5 G NA 1 2 2
#6 H 1 NA 1 1
#7 I NA 1 1 2
#8 J 2 1 1 1
#9 K 1 1 2 1
#10 L NA 2 NA NA
# ... with 14 more rows
#[[2]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 3.1 9.5
#2 B 8.7 NA NA 4.8
#3 C 4.3 20.8 3.1 9.5
#4 E 4.3 4.2 NA NA
#5 G NA 4.2 6.2 9.5
#6 H 4.3 NA 3.1 4.8
#7 I NA 4.2 3.1 9.5
#8 J 8.7 4.2 3.1 4.8
#9 K 4.3 4.2 6.2 4.8
#10 L NA 8.3 NA NA
## ... with 14 more rows
#[[3]]
In R, dplyr mutate referencing column names by string
We can convert to symbol and evaluate with !!:
library(dplyr)
mydf %>%
mutate(newCol = !! rlang::sym(var1) + !! rlang::sym(var2))
Another option is to subset the column with .data:
mydf %>%
mutate(newCol = .data[[var1]] + .data[[var2]])
Or we may use rowSums (note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()):
mydf %>%
mutate(newCol = rowSums(select(cur_data(), all_of(c(var1, var2)))))
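The three approaches can be compared side by side on a small made-up data frame. The sketch below is only illustrative: the values of mydf, var1 and var2 are assumptions, and the rowSums variant uses across() instead of cur_data(), which should work in dplyr >= 1.0.

```r
library(dplyr)

# Hypothetical data and column names
mydf <- data.frame(a = 1:3, b = c(10, 20, 30), c = c(5, 5, 5))
var1 <- "a"
var2 <- "b"

# 1. Convert the strings to symbols and unquote them
with_sym <- mydf %>%
  mutate(newCol = !!rlang::sym(var1) + !!rlang::sym(var2))

# 2. Subset the .data pronoun by name
with_data <- mydf %>%
  mutate(newCol = .data[[var1]] + .data[[var2]])

# 3. Row sums over the selected columns
with_rowsums <- mydf %>%
  mutate(newCol = rowSums(across(all_of(c(var1, var2)))))

all.equal(with_sym$newCol, with_data$newCol)
```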
Pass variable as column name to dplyr?
We can do this with data.table. We convert the 'data.frame' to 'data.table' (setDT(df)); grouped by the row sequence, we get the value of the paste output and assign (:=) it to a new column ('my.p').
library(data.table)
setDT(df)[, my.p:= get(paste0(max1, '.p')), 1:nrow(df)]
df
# col1 col1.p col2 col2.p col3 col3.p max1 my.p
# 1: a 1 a 6 c 11 col3 11
# 2: b 2 c 7 d 12 col2 7
# 3: c 3 l 8 e 13 col1 3
# 4: d 4 c 9 f 14 col2 9
# 5: c 5 l 10 g 15 col1 5
# 6: a 1 a 6 c 16 col3 16
# 7: b 2 c 7 d 17 col2 7
# 8: c 3 l 8 e 18 col1 3
# 9: d 4 c 9 f 19 col2 9
#10: c 5 l 10 g 20 col1 5
How to pass column name as argument to function for dplyr verbs?
Here is another way of making it work. You can use the .data[[var]] construct for a column name which is stored as a string:
foo <- function(data, colName) {
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, "stations")
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
If you decide not to pass colName as a string, you can wrap it in a pair of curly braces inside your function to get a similar result:
foo <- function(data, colName) {
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, stations)
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
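To compare the two calling conventions directly, the sketch below defines both variants under different (hypothetical) names and checks that they produce the same counts on the built-in quakes data.

```r
library(dplyr)

# String version: the column name arrives as a character value
count_by_string <- function(data, colName) {
  data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
}

# Bare-name version: the column is passed unquoted and embraced with {{ }}
count_by_name <- function(data, colName) {
  data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
}

a <- count_by_string(quakes, "stations")
b <- count_by_name(quakes, stations)

identical(a$count, b$count)
```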
How to pass a column name into a custom function, which uses dplyr?
Thanks, @ycw, that was a nice read. Got it working now after working through the article. This was the solution at the end of the day:
substrColName <- function(df, colName, start) {
  colNameQuo <- enquo(colName)
  df %>%
    mutate(!!quo_name(colNameQuo) := substr(!!colNameQuo,
                                            start = start,
                                            stop = nchar(!!colNameQuo)))
}
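With dplyr >= 1.0 the same function can probably be written more compactly with the {{ }} syntax. The sketch below is an assumption-laden rewrite, not the poster's code: substr_col and the toy data frame people are hypothetical names.

```r
library(dplyr)

# Hypothetical rewrite of substrColName using {{ }} on both sides of :=
substr_col <- function(df, colName, start) {
  mutate(
    df,
    "{{ colName }}" := substr({{ colName }}, start, nchar({{ colName }}))
  )
}

# Made-up data to try it on
people <- data.frame(name = c("alpha", "beta"), stringsAsFactors = FALSE)

trimmed <- substr_col(people, name, 2)
trimmed$name
```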