Dplyr Write a Function with Column Names as Inputs

Is this what you expected?

library(dplyr)

df <- tibble(group = rep(c("A", "B"), each = 3), var1 = sample(1:100, 6), var2 = sample(1:100, 6))

example <- function(colname) {
  # colname is a quoted symbol such as quote(var1); !! unquotes it (dplyr >= 0.7)
  df %>%
    group_by(group) %>%
    summarize(output = mean(sqrt(!!colname))) %>%
    select(output)
}
example(quote(var1))
#-----
Source: local data frame [2 x 1]

output
1 7.185935
2 8.090866
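
With dplyr 1.0.0 or later, the same idea can be written so that the caller passes a bare column name and no quote() call. The sketch below is only an illustration: the squares data and the example_embrace() name are made up here, and the data frame is passed in as an argument instead of being read from the global environment.

```r
library(dplyr)

# Hypothetical data: perfect squares, so the group means are easy to check
squares <- tibble(
  group = rep(c("A", "B"), each = 3),
  var1  = c(1, 4, 9, 16, 25, 36)
)

# colname is a bare column name; {{ }} forwards it into summarize()
example_embrace <- function(data, colname) {
  data %>%
    group_by(group) %>%
    summarize(output = mean(sqrt({{ colname }})))
}

res <- example_embrace(squares, var1)
res
```

Group A averages sqrt(1), sqrt(4), sqrt(9) and group B averages sqrt(16), sqrt(25), sqrt(36).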

dplyr - using column names as function arguments

This can work using the latest dplyr syntax (as can be seen on github):

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!sym(colName)))
}

sumByColumn(data, "b")
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27

And an alternative way of specifying b as a variable:

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
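
The answer does not show the data it was run on. Below is a minimal sketch with hypothetical data chosen so the totals come out as 24 and 27. It also shows two spellings available in more recent dplyr versions: the .data pronoun for a column name held in a string, and the embrace operator {{ }} for a bare column name. The function names sum_by_string() and sum_by_name() are invented for the illustration.

```r
library(dplyr)

# Hypothetical data: group 1 sums to 24, group 2 sums to 27
data <- data.frame(a = rep(1:2, each = 3), b = c(7L, 8L, 9L, 8L, 9L, 10L))

# Column given as a string: index the .data pronoun
sum_by_string <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(.data[[colName]]))
}

# Column given as a bare name: embrace it
sum_by_name <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum({{ colName }}))
}

res_string <- sum_by_string(data, "b")
res_name <- sum_by_name(data, b)
```

Both calls should produce the same two-row summary.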

Write a function with default column name inputs in dplyr::mutate()

We can write a function called round_x() that wraps around mutate() and has age as a default argument:

library(dplyr)

round_x <- function(.data, x = age) {
  x <- enquo(x)
  var_name <- paste0("round_", quo_name(x))
  mutate(.data, !!var_name := round(!!x))
}

If we call this function with no arguments:

data %>% round_x()
# age round_age
#1 50.1 50
#2 60.5 60

We could pass other arguments if we wanted to:

data.frame(data, weight = c(180.5, 200.6)) %>% round_x(weight)
# age weight round_weight
#1 50.1 180.5 180
#2 60.5 200.6 201
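
For comparison, here is a sketch of the same wrapper using the embrace operator and glue-style names, which should work with dplyr 1.0.0 or later. The name round_x2() and the two-row data frame are assumptions made for this example; the data is reconstructed from the output shown above.

```r
library(dplyr)

# Hypothetical data matching the output printed in the answer
data <- data.frame(age = c(50.1, 60.5))

# "round_{{ x }}" builds the new column name from the argument,
# and {{ x }} forwards the column itself; age remains the default
round_x2 <- function(.data, x = age) {
  mutate(.data, "round_{{ x }}" := round({{ x }}))
}

out_default <- data %>% round_x2()
out_weight <- data.frame(data, weight = c(180.5, 200.6)) %>% round_x2(weight)
```

This avoids the explicit enquo(), quo_name() and paste0() steps while keeping the default argument.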

Include column names as function input with dplyr

I've slightly updated your code to use dplyr 1.0.0 and tidyr. You can then use the embrace operator {{}}, a newer dplyr programming feature, to refer to columns that are passed as function arguments.

# Example data frame
df <- data.frame("ID" = rep(1:5, each = 4), "score" = runif(20, 0, 100), "location" = rep(c("a", "b", "c", "d"), 5))
library(dplyr)
wide_fun <- function(.data, key_name, value_name) {
  .data %>%
    group_by(across(-{{ value_name }})) %>% # group by everything other than the value column
    mutate(row_id = 1:n()) %>% ungroup() %>% # build group index
    tidyr::pivot_wider(
      names_from = {{ key_name }},
      values_from = {{ value_name }}) %>% # spread
    select(-row_id)
}

wide_fun(df, location, score)
#> # A tibble: 5 x 5
#> ID a b c d
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 90.8 38.9 28.7 39.0
#> 2 2 94.5 24.9 84.6 54.6
#> 3 3 61.1 97.2 12.2 57.7
#> 4 4 52.7 85.6 41.4 100.
#> 5 5 17.8 86.1 92.3 33.7

Created on 2020-09-11 by the reprex package (v0.3.0)

Edit

This function should also work with older versions of dplyr:

library(dplyr)
wide_fun_2 <- function(.data, key_name, value_name) {
  .data %>%
    group_by_at(vars(-!!ensym(value_name))) %>% # group by everything other than the value column
    mutate(row_id = 1:n()) %>% ungroup() %>% # build group index
    tidyr::pivot_wider(
      names_from = !!ensym(key_name),
      values_from = !!ensym(value_name)) %>% # spread
    select(-row_id)
}

df %>%
wide_fun_2(location, score)
#> # A tibble: 5 x 5
#> ID a b c d
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 72.2 81.4 52.5 48.8
#> 2 2 36.1 27.5 82.2 73.0
#> 3 3 83.9 68.2 80.9 15.7
#> 4 4 0.451 70.0 18.5 43.2
#> 5 5 82.6 68.2 22.8 63.0

Because each argument here only names a single column, you only need to deal with symbols rather than quosures, so ensym() is enough.
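
A small sketch of what ensym() accepts, using rlang directly. The helper show_sym() is a made-up name for this illustration: a bare name and a string both become the same symbol, while a more complex expression is rejected.

```r
library(rlang)

# Hypothetical helper: capture the argument as a symbol
show_sym <- function(col) ensym(col)

from_name <- show_sym(score)     # bare column name
from_string <- show_sym("score") # string is converted to a symbol

# A call such as log(score) is not a symbol, so ensym() raises an error
from_call <- try(show_sym(log(score)), silent = TRUE)
```

If the function should also accept expressions such as log(score), enquo() or {{ }} is the more suitable tool.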

Function with a data frame column name as input in R

To use dplyr code inside a function, you have to deal with non-standard evaluation. In this case, using {{}} in the function is enough.

library(dplyr)

func = function(df, col) {
  # overwrite the column passed as col with its value plus one
  df = df %>% mutate({{col}} := {{col}} + 1)
  return(df)
}
new_df = func(cars, speed)
head(cars)

# speed dist
#1 4 2
#2 4 10
#3 7 4
#4 7 22
#5 8 16
#6 9 10

head(new_df)

# speed dist
#1 5 2
#2 5 10
#3 8 4
#4 8 22
#5 9 16
#6 10 10

You can read more about non-standard evaluation here https://dplyr.tidyverse.org/articles/programming.html
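
If the function needs to modify several columns at once, one option (assuming dplyr 1.0.0 or later) is to embrace a selection inside across(). This is a sketch only, and add_one() is a hypothetical name, not part of the answer above.

```r
library(dplyr)

# cols can be any tidyselect expression, e.g. c(speed, dist) or everything()
add_one <- function(df, cols) {
  df %>% mutate(across({{ cols }}, ~ .x + 1))
}

both_plus_one <- add_one(cars, c(speed, dist))
head(both_plus_one)
```

The original cars data is left unchanged, as in the single-column version.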

How do I write a dplyr pipe-friendly function where a new column name is provided from a function argument?

In this case you can just stick to the embrace operator {{}} for your variables. To create column names dynamically, you still need to use :=. The difference is that you can use glue-style syntax with the embrace operator to get the name of the symbol. This works with the data provided.

elective_open <- function(.data, name_for_elective, course, tiebreaker) {
  .data %>%
    mutate("{{name_for_elective}}" := ifelse({{tiebreaker}} == max({{tiebreaker}}), 1, 0)) %>%
    mutate("{{name_for_elective}}" := ifelse({{name_for_elective}} == 0, {{course}}[{{name_for_elective}} == 1], "")) %>%
    filter(!({{course}} %in% {{name_for_elective}}))
}
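
The data for this answer is not included here, so the following is a reduced sketch of just the naming mechanism, with made-up data and a hypothetical flag_max() helper. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data
scores <- data.frame(course = c("art", "music", "drama"), points = c(3, 9, 5))

# "{{ new_name }}" on the left of := becomes the new column's name;
# {{ value }} on the right refers to the column passed by the caller
flag_max <- function(.data, new_name, value) {
  .data %>%
    mutate("{{ new_name }}" := ifelse({{ value }} == max({{ value }}), 1, 0))
}

flagged <- scores %>% flag_max(top_pick, points)
```

The result has a new column called top_pick that marks the row with the highest points.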

How to pass column names into a function dplyr

We can use quosures, which were new in the development version of dplyr at the time of writing (announced as 0.6.0, released as 0.7.0):

summarise_data_categorical <- function(var1, t_var, dt) {
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
}
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]

# quartertype lets count
# <int> <fctr> <int>
#1 1 A 1
#2 1 F 2
#3 1 G 2
#4 1 H 1
#5 1 I 1
#6 1 J 4
#7 1 M 3
#8 1 N 1
#9 1 P 1
#10 1 S 5
# ... with 55 more rows

enquo() works much like substitute() from base R: it captures the expression supplied as an argument, here as a quosure. one_of() takes string arguments, so the quosures are converted to strings with quo_name(). Inside group_by(), summarise(), mutate() and so on, the quosure is evaluated by unquoting it (UQ() or !!).
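
With dplyr 1.0.0 or later, the enquo() and !! pair can usually be collapsed into the embrace operator, and select() accepts embraced columns directly, so the quo_name() step is not needed. The sketch below assumes a tiny made-up fr data frame, since the original data is not shown, and uses a hypothetical function name, count_by().

```r
library(dplyr)

# Hypothetical stand-in for the fr data used in the answer
fr <- data.frame(
  lets = c("A", "A", "B", "B", "B"),
  quartertype = c(1L, 1L, 1L, 2L, 2L)
)

count_by <- function(dt, var1, t_var) {
  dt %>%
    select({{ var1 }}, {{ t_var }}) %>%
    group_by({{ t_var }}, {{ var1 }}) %>%
    summarise(count = n(), .groups = "drop_last")
}

counts <- count_by(fr, lets, quartertype)
```

The .groups = "drop_last" argument keeps the result grouped by the first grouping variable, which mirrors the default behaviour of summarise() in the original code.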


Quosures seem to work fine with dplyr, though they are harder to use with tidyr functions. The following version should work for the full task:

summarise_data_categorical <- function(var1, t_var, dt) {
  var1 <- enquo(var1)
  t_var <- enquo(t_var)

  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

  count_table <- Summ_func %>%
    spread_(v2, "count")

  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count), 3) * 100) %>%
    select(-count)

  freq_table <- freq %>%
    spread_(v2, "freq")

  freq_chart <- freq %>%
    ggplot() +
    geom_line(mapping = aes_string(x = v2, y = "freq", colour = v1))

  results <- list(count_table, freq_table, freq_chart)
  results
}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <int> <int> <int> <int>
#1 A NA NA 1 2
#2 B 2 NA NA 1
#3 C 1 5 1 2
#4 E 1 1 NA NA
#5 G NA 1 2 2
#6 H 1 NA 1 1
#7 I NA 1 1 2
#8 J 2 1 1 1
#9 K 1 1 2 1
#10 L NA 2 NA NA
# ... with 14 more rows

#[[2]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 3.1 9.5
#2 B 8.7 NA NA 4.8
#3 C 4.3 20.8 3.1 9.5
#4 E 4.3 4.2 NA NA
#5 G NA 4.2 6.2 9.5
#6 H 4.3 NA 3.1 4.8
#7 I NA 4.2 3.1 9.5
#8 J 8.7 4.2 3.1 4.8
#9 K 4.3 4.2 6.2 4.8
#10 L NA 8.3 NA NA
## ... with 14 more rows

#[[3]]
# (plot: line chart of freq against quartertype, coloured by lets)
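
In current tidyr, pivot_wider() accepts embraced arguments, so the string-based spread_() step can be avoided. The following is a sketch of the count table only, assuming tidyr 1.0.0 or later and a small made-up fr data frame; the function name count_table() is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the fr data used in the answer
fr <- data.frame(
  lets = c("A", "A", "B", "B", "B"),
  quartertype = c(1L, 1L, 1L, 2L, 2L)
)

count_table <- function(dt, var1, t_var) {
  dt %>%
    count({{ var1 }}, {{ t_var }}, name = "count") %>%
    pivot_wider(names_from = {{ t_var }}, values_from = count)
}

wide_counts <- count_table(fr, lets, quartertype)
```

Combinations that never occur, such as lets "A" in quartertype 2 here, come out as NA, as in the tables above.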

How to pass dynamic column names in dplyr into custom function?

Using the latest version of dplyr (>=0.7), you can use the rlang !! (bang-bang) operator.

library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"

data %>%
mutate(diff=(!!as.name(from))-(!!as.name(to)))

You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately, this needs a few more parentheses than I would like, because the !! operator sits at an awkward place in R's order of operations.
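
An alternative that sidesteps the parentheses is to index the .data pronoun with the strings. This is a sketch with made-up values for the two columns, since the original data is not shown.

```r
library(dplyr)

# Hypothetical values for the two columns named in the answer
data <- data.frame(Stand1971 = c(10, 20), Stand1987 = c(4, 5))

from <- "Stand1971"
to <- "Stand1987"

# .data[[...]] looks the column up by the name stored in the string
res <- data %>%
  mutate(diff = .data[[from]] - .data[[to]])
```

Here diff is simply Stand1971 minus Stand1987 for each row.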

Original answer, dplyr (0.3-<0.7):

From that vignette (vignette("nse","dplyr")), use lazyeval's interp() function

library(lazyeval)

from <- "Stand1971"
to <- "Stand1987"

data %>%
mutate_(diff=interp(~from - to, from=as.name(from), to=as.name(to)))

