How to Pass Column Name as Argument to Function for Dplyr Verbs

dplyr - using column names as function arguments

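This works with dplyr's tidy evaluation syntax (dplyr 0.7.0 and later). When the column name arrives as a string, convert it to a symbol with sym() and unquote it with !!: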

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!sym(colName)))
}

sumByColumn(data, "b")
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27

Alternatively, pass b as a bare (unquoted) column name: capture it with enquo() and unquote it with !!:

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
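
Neither snippet above defines `data`, so here is a minimal, self-contained sketch that runs both variants side by side. The tibble `toy` and the function names `sum_by_string` and `sum_by_name` are hypothetical, chosen only for illustration; they do not come from the original answer.

```r
library(dplyr)
library(rlang)

# Hypothetical data, for illustration only
toy <- tibble(a = c(1, 1, 2, 2), b = c(1, 2, 3, 4))

# Variant 1: the column name is passed as a string
sum_by_string <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!sym(colName)))
}

# Variant 2: the column name is passed bare (unquoted)
sum_by_name <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

res_string <- sum_by_string(toy, "b")
res_name <- sum_by_name(toy, b)
```

Both calls should return the same per-group totals; the only difference is how the caller spells the column.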

How to pass column name as argument to function for dplyr verbs?

Here is another way to make it work. When the column name is stored as a string, you can use the .data[[var]] construct:

foo <- function(data, colName) {
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())

  return(result)
}

foo(quakes, "stations")

# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows

If you would rather pass colName as a bare name instead of a string, wrap it in a pair of curly braces ({{ }}) inside your function to get the same result:

foo <- function(data, colName) {
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())

  return(result)
}

foo(quakes, stations)

# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
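
The two constructs are not limited to group_by(); the same pattern applies inside summarise(). The following is a rough sketch using the built-in mtcars data. The function names `mean_by_string` and `mean_by_name` and the output column `avg` are hypothetical and not part of the original answer.

```r
library(dplyr)

# String arguments: index the .data pronoun with [[ ]]
mean_by_string <- function(data, groupCol, valueCol) {
  data %>%
    group_by(.data[[groupCol]]) %>%
    summarise(avg = mean(.data[[valueCol]]))
}

# Bare-name arguments: embrace them with {{ }}
mean_by_name <- function(data, groupCol, valueCol) {
  data %>%
    group_by({{ groupCol }}) %>%
    summarise(avg = mean({{ valueCol }}))
}

by_string <- mean_by_string(mtcars, "cyl", "mpg")
by_name <- mean_by_name(mtcars, cyl, mpg)
```

Choosing between the two is mostly a question of how callers will supply the column: strings are convenient when names come from a vector or a config file, bare names when the function is used interactively.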

How can I pass a column name as a function argument using dplyr and ggplot2?

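This code seems to fix it. Variables passed in to the function must be captured with enquo() and then unquoted with !!. Note that aes() is replaced by aes_() here, because aes_() accepts quoted expressions such as quosures and quote() calls (the variant for strings is aes_string()). Since ggplot2 3.0.0, aes() itself supports !!, and aes_() is deprecated, so expect a deprecation warning on current versions.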

library(tidyverse)

to_plot <- function(df, model, response_variable, indep_variable) {
  response_variable <- enquo(response_variable)
  indep_variable <- enquo(indep_variable)

  resp_plot <-
    df %>%
    mutate(model_resp = predict.glm(model, df, type = 'response')) %>%
    group_by(!!indep_variable) %>%
    summarize(actual_response = mean(!!response_variable),
              predicted_response = mean(model_resp)) %>%
    ggplot(aes_(indep_variable)) +
    geom_line(aes_(x = indep_variable, y = quote(actual_response)), colour = "blue") +
    geom_line(aes_(x = indep_variable, y = quote(predicted_response)), colour = "red") +
    ylab(label = 'Response')

  return(resp_plot)
}

fit <- glm(data = mtcars, mpg ~ wt + qsec + am, family = gaussian(link = 'identity'))
to_plot(mtcars, fit, mpg, wt)
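
For comparison, here is a sketch of the same function written with plain aes() and the {{ }} operator. It assumes ggplot2 3.0.0 or later and rlang 0.4.0 or later; the name `to_plot_embrace` is hypothetical, and this is one possible rewrite rather than the original author's code.

```r
library(dplyr)
library(ggplot2)

to_plot_embrace <- function(df, model, response_variable, indep_variable) {
  df %>%
    # Add model predictions on the response scale
    mutate(model_resp = predict(model, newdata = df, type = "response")) %>%
    group_by({{ indep_variable }}) %>%
    summarize(actual_response = mean({{ response_variable }}),
              predicted_response = mean(model_resp)) %>%
    # aes() accepts {{ }} directly, so no enquo()/aes_() is needed
    ggplot(aes(x = {{ indep_variable }})) +
    geom_line(aes(y = actual_response), colour = "blue") +
    geom_line(aes(y = predicted_response), colour = "red") +
    ylab("Response")
}

fit <- glm(mpg ~ wt + qsec + am, data = mtcars,
           family = gaussian(link = "identity"))
p <- to_plot_embrace(mtcars, fit, mpg, wt)
```

Printing `p` draws the plot; the summarised columns are referenced by their plain names because they are fixed inside the function, while the caller-supplied column is embraced.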

How to refer to variable (column name) with tidyverse in a function?

You can call the function using symbols rather than strings for the column names by using the {{ ('curly curly') operator:

library(tidyverse)

f3 <- function(x){
  mtcars %>%
    group_by(cyl, gear) %>%
    summarize(m = mean({{x}}),
              sd = sd({{x}}),
              n = length({{x}}),
              se = sd / sqrt(n),
              tscore = qt(0.975, n-1),
              margin = tscore * se,
              uppma = m + margin,
              lowma = m - margin,
              .groups = 'drop')
}

f3(x = wt)
#> # A tibble: 8 x 10
#> cyl gear m sd n se tscore margin uppma lowma
#> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 2.46 NA 1 NA NaN NaN NaN NaN
#> 2 4 4 2.38 0.601 8 0.212 2.36 0.502 2.88 1.88
#> 3 4 5 1.83 0.443 2 0.314 12.7 3.98 5.81 -2.16
#> 4 6 3 3.34 0.173 2 0.123 12.7 1.56 4.89 1.78
#> 5 6 4 3.09 0.413 4 0.207 3.18 0.657 3.75 2.44
#> 6 6 5 2.77 NA 1 NA NaN NaN NaN NaN
#> 7 8 3 4.10 0.768 12 0.222 2.20 0.488 4.59 3.62
#> 8 8 5 3.37 0.283 2 0.2 12.7 2.54 5.91 0.829

Pass a data.frame column name to a function

If you pass the column name as a string, you can index the data frame with it directly:

df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column){
  max(x[,column])
}
fun1(df, "B")          # maximum of column B
fun1(df, c("B","A"))   # a single maximum taken over both columns combined

There's no need to use substitute, eval, etc.

You can even pass the desired function as a parameter:

fun1 <- function(x, column, fn) {
fn(x[,column])
}
fun1(df, "B", max)

Alternatively, using [[ also works for selecting a single column at a time:

df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column){
max(x[[column]])
}
fun1(df, "B")
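
Note that the earlier fun1(df, c("B","A")) call returns one maximum over both columns combined. If a per-column result is wanted instead, a sketch along the following lines may help. The function name `col_max` and its input checks are hypothetical additions, not part of the original answer.

```r
# Per-column maximum for one or more columns named by string
col_max <- function(x, column) {
  stopifnot(is.data.frame(x),
            is.character(column),
            all(column %in% names(x)))
  # x[column] keeps a data frame, so vapply() iterates over its columns
  vapply(x[column], max, numeric(1))
}

df <- data.frame(A = 1:10, B = 2:11, C = 3:12)
res <- col_max(df, c("B", "A"))
```

The result is a named numeric vector with one entry per requested column, in the order the names were given.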

pass a column name to a function using dplyr mutate without using the deprecated mutate_

To set a column name programmatically in mutate(), you need a string on the left-hand side and := instead of =.

You can use quo_name() to turn the captured argument (z in the call below) into a string for the column name.

Your function could then look like:

my.f = function(df, column_var) {
  column_var = enquo(column_var)

  df %>%
    mutate(!!quo_name(column_var) := y) %>%
    filter(!is.na(!!column_var))
}

my.f(d, z)

# A tibble: 3 x 2
y z
<dbl> <dbl>
1 1 1
2 2 2
3 3 3
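
With a recent rlang, the same idea can be written without enquo() and quo_name() by using a glue-style name on the left-hand side of :=. This is a sketch under that version assumption; the function name `my_f` and the input tibble `d` below are hypothetical, since the original `d` is not shown.

```r
library(dplyr)

my_f <- function(df, column_var) {
  df %>%
    # "{{ column_var }}" expands to the name of the column the caller passed
    mutate("{{ column_var }}" := y) %>%
    filter(!is.na({{ column_var }}))
}

# Hypothetical input, for illustration only
d <- tibble(y = c(1, 2, 3, NA))
res <- my_f(d, z)
```

Here the new column `z` is a copy of `y`, and the row where it is NA is filtered out.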

