Dplyr write a function with column names as inputs
Is this what you expected?
library(dplyr)
df <- tibble(group = rep(c("A", "B"), each = 3), var1 = sample(1:100, 6), var2 = sample(1:100, 6))
example <- function(colname) {
  df %>%
    group_by(group) %>%
    summarize(output = mean(sqrt(!!colname))) %>%  # unquote the quoted symbol (needs dplyr >= 0.7)
    select(output)
}
example(quote(var1))
#-----
Source: local data frame [2 x 1]
output
1 7.185935
2 8.090866
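With dplyr 1.0.0 or later, the same idea can be sketched with the embrace operator, so the caller passes a bare column name instead of a quoted one. The data values and the name example_embrace below are made up for illustration; they are not from the original question.

```r
library(dplyr)

# Hypothetical data with fixed values so the result is reproducible
df <- tibble(
  group = rep(c("A", "B"), each = 3),
  var1  = c(1, 4, 9, 16, 25, 36),
  var2  = c(2, 3, 5, 7, 11, 13)
)

# {{ colname }} forwards the bare column name into summarize()
example_embrace <- function(data, colname) {
  data %>%
    group_by(group) %>%
    summarize(output = mean(sqrt({{ colname }}))) %>%
    select(output)
}

res <- example_embrace(df, var1)
res
```

Passing the data frame as an argument, rather than relying on a global df, also makes the function easier to reuse.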
dplyr - using column names as function arguments
This works with recent dplyr syntax (as can be seen on GitHub):
library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
df %>%
group_by(a) %>%
summarize(tot = sum(!! sym(colName)))
}
sumByColumn(data, "b")
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
And an alternative way, passing b as a bare variable name:
library(dplyr)
sumByColumn <- function(df, colName) {
myenc <- enquo(colName)
df %>%
group_by(a) %>%
summarize(tot = sum(!!myenc))
}
sumByColumn(data, b)
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
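When the column name arrives as a string, the .data pronoun is another option that avoids sym() and !! entirely. This is a minimal sketch; the data frame toy and the function name sum_by_column_str are hypothetical.

```r
library(dplyr)

# Hypothetical data; the original `data` object is not shown in the answer
toy <- data.frame(a = c(1, 1, 2, 2), b = c(1, 2, 3, 4))

# .data[[col_name]] looks up the column whose name is stored in the string
sum_by_column_str <- function(df, col_name) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(.data[[col_name]]))
}

res <- sum_by_column_str(toy, "b")
res
```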
Write a function with default column name inputs in dplyr::mutate()
We can write a function called round_x() that wraps around mutate() and has age as a default argument:
library(dplyr)
round_x <- function(.data, x = age) {
x <- enquo(x)
var_name <- paste0("round_", quo_name(x))
mutate(.data, !!var_name := round(!!x))
}
If we call this function with no arguments:
data %>% round_x()
# age round_age
#1 50.1 50
#2 60.5 60
We could pass other arguments if we wanted to:
data.frame(data, weight = c(180.5, 200.6)) %>% round_x(weight)
# age weight round_weight
#1 50.1 180.5 180
#2 60.5 200.6 201
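With rlang 0.4.3 or later, the name-building step can probably be folded into a glue-style string on the left of :=. The sketch below assumes a data frame with the age values shown in the output above; round_x2 is a hypothetical name.

```r
library(dplyr)

# Assumed input, reconstructed from the printed output above
data <- data.frame(age = c(50.1, 60.5))

# "round_{{ x }}" builds the new column name from the captured argument
round_x2 <- function(df, x = age) {
  mutate(df, "round_{{ x }}" := round({{ x }}))
}

res_default <- round_x2(data)
res_weight  <- round_x2(data.frame(data, weight = c(180.5, 200.6)), weight)
res_default
res_weight
```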
Include column names as function input with dplyr
I've slightly updated your code to dplyr 1.0.0 and tidyr. Then you can make use of the new dplyr programming feature {{}} to specify variables that are arguments of a function.
# Example data frame
df <- data.frame("ID" = rep(1:5, each = 4), "score" = runif(20, 0, 100), "location" = rep(c("a", "b", "c", "d"), 5))
library(dplyr)
wide_fun <- function(.data, key_name, value_name) {
.data %>%
group_by(across(-{{value_name}})) %>% # group by everything other than the value column.
mutate(row_id = 1:n()) %>% ungroup() %>% # build group index
tidyr::pivot_wider(
names_from = {{key_name}},
values_from = {{value_name}}) %>% # spread
select(-row_id)
}
wide_fun(df, location, score)
#> # A tibble: 5 x 5
#> ID a b c d
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 90.8 38.9 28.7 39.0
#> 2 2 94.5 24.9 84.6 54.6
#> 3 3 61.1 97.2 12.2 57.7
#> 4 4 52.7 85.6 41.4 100.
#> 5 5 17.8 86.1 92.3 33.7
Created on 2020-09-11 by the reprex package (v0.3.0)
Edit
This function should also work with older versions of dplyr:
library(dplyr)
wide_fun_2 <- function(.data, key_name, value_name) {
.data %>%
group_by_at(vars(-!!ensym(value_name))) %>% # group by everything other than the value column.
mutate(row_id = 1:n()) %>% ungroup() %>% # build group index
tidyr::pivot_wider(
names_from = !!ensym(key_name),
values_from = !!ensym(value_name)) %>% # spread
select(-row_id)
}
df %>%
wide_fun_2(location, score)
#> # A tibble: 5 x 5
#> ID a b c d
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 72.2 81.4 52.5 48.8
#> 2 2 36.1 27.5 82.2 73.0
#> 3 3 83.9 68.2 80.9 15.7
#> 4 4 0.451 70.0 18.5 43.2
#> 5 5 82.6 68.2 22.8 63.0
Because each argument only names a column, you only need to deal with symbols rather than quosures, so ensym is enough here.
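The difference between capturing a symbol and capturing a quosure can be seen in isolation. The helper names capture_sym and capture_quo below are hypothetical and exist only for this sketch.

```r
library(rlang)

capture_sym <- function(col) ensym(col)  # accepts a bare name or a string
capture_quo <- function(col) enquo(col)  # accepts any expression

sym_from_name   <- capture_sym(score)
sym_from_string <- capture_sym("score")
quo_from_expr   <- capture_quo(score + 1)

# ensym() rejects anything that is not a symbol or a string
sym_from_expr <- tryCatch(capture_sym(score + 1), error = function(e) "error")

is_symbol(sym_from_name)
is_quosure(quo_from_expr)
sym_from_expr
```

So ensym() is the stricter choice when the argument must be a single column name.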
Function with a data frame column name as input in R
To use dplyr code in a function you have to deal with non-standard evaluation. In this case, using {{}} in the function would do.
library(dplyr)
func = function(df, col) {
df = df %>% mutate({{col}} := {{col}} + 1)
return(df)
}
new_df = func(cars, speed)
head(cars)
# speed dist
#1 4 2
#2 4 10
#3 7 4
#4 7 22
#5 8 16
#6 9 10
head(new_df)
# speed dist
#1 5 2
#2 5 10
#3 8 4
#4 8 22
#5 9 16
#6 10 10
You can read more about non-standard evaluation here https://dplyr.tidyverse.org/articles/programming.html
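If the function should accept several columns at once, one possible variation is to embrace the argument inside across(). This is only a sketch; add_one is a hypothetical name, and cars is the built-in dataset used above.

```r
library(dplyr)

# across({{ cols }}, ...) applies the same change to every selected column
add_one <- function(df, cols) {
  df %>% mutate(across({{ cols }}, ~ .x + 1))
}

new_df <- add_one(cars, c(speed, dist))
head(new_df)
```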
How do I write a dplyr pipe-friendly function where a new column name is provided from a function argument?
In this case you can just stick to using the embrace {{}} option for your variables. If you want to dynamically create column names, you still need to use :=. The difference here is that you can use the glue-style syntax with the embrace operator to get the name of the symbol. This works with the data provided.
elective_open <- function(.data, name_for_elective, course, tiebreaker){
.data%>%
mutate("{{name_for_elective}}" := ifelse({{tiebreaker}}==max({{tiebreaker}}),1,0)) %>%
mutate("{{name_for_elective}}" := ifelse({{name_for_elective}}==0,{{course}}[{{name_for_elective}}==1],"")) %>%
filter(!({{course}} %in% {{name_for_elective}}))
}
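Since the question's data is not reproduced here, the pattern can be tried on a toy data frame: create a column whose name comes from an argument, then refer to that new column in a later step. The names scores and flag_top are hypothetical.

```r
library(dplyr)

# Hypothetical data, not the data from the question
scores <- data.frame(student = c("a", "b", "c"), points = c(3, 9, 5))

flag_top <- function(df, new_name, value) {
  df %>%
    # "{{ new_name }}" := names the new column after the argument
    mutate("{{ new_name }}" := ifelse({{ value }} == max({{ value }}), 1, 0)) %>%
    # once created, the column can be referenced with {{ new_name }}
    filter({{ new_name }} == 1)
}

res <- flag_top(scores, is_top, points)
res
```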
How to pass column names into a function dplyr
We can use quosures, which were released in dplyr 0.7.0 (this answer was written against the development version, when the release was still expected to be numbered 0.6.0):
summarise_data_categorical <- function(var1, t_var, dt){
var1 <- enquo(var1)
t_var <- enquo(t_var)
v1 <- quo_name(var1)
v2 <- quo_name(t_var)
dt %>%
select(one_of(v1, v2)) %>%
group_by(!!t_var, !!var1) %>%
summarise(count = n())
}
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]
# quartertype lets count
# <int> <fctr> <int>
#1 1 A 1
#2 1 F 2
#3 1 G 2
#4 1 H 1
#5 1 I 1
#6 1 J 4
#7 1 M 3
#8 1 N 1
#9 1 P 1
#10 1 S 5
# ... with 55 more rows
enquo does a job similar to substitute from base R: it takes the input arguments and converts them to quosures. one_of takes string arguments, so the quosures are converted to strings with quo_name. Inside group_by/summarise/mutate etc., we can evaluate a quosure by unquoting it (UQ or !!).
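The parallel with substitute can be checked directly. In this sketch the helper names capture_base and capture_tidy are hypothetical; lets is just an unevaluated column name.

```r
library(rlang)

capture_base <- function(x) substitute(x)  # base R: returns the bare expression
capture_tidy <- function(x) enquo(x)       # rlang: returns expression + environment

base_expr <- capture_base(lets)
tidy_quo  <- capture_tidy(lets)

is.name(base_expr)        # the base R result is a plain symbol
is_quosure(tidy_quo)      # the rlang result is a quosure
quo_name(tidy_quo)        # the string form that one_of() expects
identical(quo_get_expr(tidy_quo), base_expr)
```

The quosure also remembers the environment it was captured in, which is what lets it be unquoted safely inside another function.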
Quosures seem to work fine with dplyr, though we had some difficulty implementing the same approach with tidyr functions. The following should work for the full code:
summarise_data_categorical <- function(var1, t_var, dt){
var1 <- enquo(var1)
t_var <- enquo(t_var)
v1 <- quo_name(var1)
v2 <- quo_name(t_var)
Summ_func <- dt %>%
select(one_of(v1, v2)) %>%
group_by(!!t_var, !!var1) %>%
summarise(count = n())
count_table <- Summ_func %>%
spread_(v2, "count")
freq <- Summ_func %>%
mutate(freq = round(count / sum(count),3)*100) %>%
select(-count)
freq_table <- freq %>%
spread_(v2, "freq")
freq_chart <- freq %>%
ggplot()+
geom_line(mapping=aes_string(x= v2 , y = "freq", colour= v1))
results <- list(count_table, freq_table, freq_chart)
results
}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <int> <int> <int> <int>
#1 A NA NA 1 2
#2 B 2 NA NA 1
#3 C 1 5 1 2
#4 E 1 1 NA NA
#5 G NA 1 2 2
#6 H 1 NA 1 1
#7 I NA 1 1 2
#8 J 2 1 1 1
#9 K 1 1 2 1
#10 L NA 2 NA NA
# ... with 14 more rows
#[[2]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 3.1 9.5
#2 B 8.7 NA NA 4.8
#3 C 4.3 20.8 3.1 9.5
#4 E 4.3 4.2 NA NA
#5 G NA 4.2 6.2 9.5
#6 H 4.3 NA 3.1 4.8
#7 I NA 4.2 3.1 9.5
#8 J 8.7 4.2 3.1 4.8
#9 K 4.3 4.2 6.2 4.8
#10 L NA 8.3 NA NA
## ... with 14 more rows
#[[3]]
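spread_() has since been deprecated in tidyr. With tidyr 1.0.0 or later, the count table can probably be built with pivot_wider(), which accepts embraced arguments directly. This is a sketch on made-up data; toy and count_table_wide are hypothetical names, and only the count table (not the frequency table or the chart) is reproduced.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the `fr` data frame used above
toy <- data.frame(
  lets        = c("A", "A", "B", "B", "B"),
  quartertype = c(1, 2, 1, 1, 2)
)

count_table_wide <- function(dt, var1, t_var) {
  dt %>%
    count({{ t_var }}, {{ var1 }}, name = "count") %>%
    pivot_wider(names_from = {{ t_var }}, values_from = count)
}

res <- count_table_wide(toy, lets, quartertype)
res
```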
How to pass dynamic column names in dplyr into custom function?
Using a recent version of dplyr (>= 0.7), you can use the rlang !! (bang-bang) operator.
library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"
data %>%
mutate(diff=(!!as.name(from))-(!!as.name(to)))
You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately I seem to need a few more parentheses than I would like, because the !! operator seems to sit at an odd place in the order of operations.
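The original data object is not shown, so here is a small run on made-up values to confirm the behaviour. The data frame toy is hypothetical; rlang::sym() is used as an equivalent of as.name() for comparison.

```r
library(dplyr)

# Hypothetical data with the two column names used in the answer
toy <- data.frame(Stand1971 = c(10, 20), Stand1987 = c(4, 5))

from <- "Stand1971"
to   <- "Stand1987"

# Strings converted to symbols, then unquoted inside mutate()
res_name <- toy %>% mutate(diff = (!!as.name(from)) - (!!as.name(to)))
res_sym  <- toy %>% mutate(diff = (!!rlang::sym(from)) - (!!rlang::sym(to)))
res_name
```

Keeping the parentheses around each unquoted name costs nothing and makes the intended grouping explicit.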
Original answer, dplyr (0.3 to < 0.7):
From that vignette (vignette("nse", "dplyr")), use lazyeval's interp() function:
library(lazyeval)
from <- "Stand1971"
to <- "Stand1987"
data %>%
mutate_(diff=interp(~from - to, from=as.name(from), to=as.name(to)))