dplyr - using column names as function arguments
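This works with the tidy evaluation syntax in recent versions of dplyr (see the dplyr repository on GitHub): convert the string to a symbol with sym() and unquote it with !!.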
library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!sym(colName)))
}
sumByColumn(data, "b")
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
Alternatively, pass b as a bare (unquoted) column name and capture it with enquo():
library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}
sumByColumn(data, b)
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
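The two variants above can be checked against each other on a small example. This is a minimal sketch: the toy data frame and the names sum_by_string and sum_by_symbol are made up for illustration and are not the data behind the output shown above.

```r
library(dplyr)

# Hypothetical toy data, invented for this sketch
toy <- tibble(a = c(1L, 1L, 2L, 2L), b = c(1L, 2L, 3L, 4L))

# Column passed as a string: turn it into a symbol, then unquote with !!
sum_by_string <- function(df, col_name) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!rlang::sym(col_name)))
}

# Column passed as a bare name: capture it with enquo(), then unquote with !!
sum_by_symbol <- function(df, col) {
  col <- enquo(col)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!col))
}

by_string <- sum_by_string(toy, "b")
by_symbol <- sum_by_symbol(toy, b)

# Both calls should produce the same per-group totals
identical(by_string$tot, by_symbol$tot)
```

Which form to prefer mostly depends on the caller: strings are convenient when the column name comes from user input or a loop, bare names read more like ordinary dplyr code.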
How to pass column name as argument to function for dplyr verbs?
Here is another way of making it work. You can use the .data[[var]] construct for a column name that is stored as a string:
foo <- function(data, colName) {
  result <- data %>%
    group_by(.data[[colName]]) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, "stations")
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
If you would rather not pass colName as a string, wrap it in a pair of curly braces ({{ }}) inside your function to get the same result:
foo <- function(data, colName) {
  result <- data %>%
    group_by({{ colName }}) %>%
    summarise(count = n())
  return(result)
}
foo(quakes, stations)
# A tibble: 102 x 2
stations count
<int> <int>
1 10 20
2 11 28
3 12 25
4 13 21
5 14 39
6 15 34
7 16 35
8 17 38
9 18 33
10 19 29
# ... with 92 more rows
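As a quick sanity check, the string-based and bare-name versions can be run side by side on the built-in quakes data. This is a sketch under the assumption of dplyr 1.0 or later; the function names count_by_string and count_by_symbol are made up here so the two versions can coexist.

```r
library(dplyr)

# String argument: look the column up through the .data pronoun
count_by_string <- function(data, col_name) {
  data %>%
    group_by(.data[[col_name]]) %>%
    summarise(count = n())
}

# Bare-name argument: forward it to group_by() with {{ }}
count_by_symbol <- function(data, col) {
  data %>%
    group_by({{ col }}) %>%
    summarise(count = n())
}

res_string <- count_by_string(quakes, "stations")
res_symbol <- count_by_symbol(quakes, stations)

# Compare as plain data frames so only names and values matter
all.equal(as.data.frame(res_string), as.data.frame(res_symbol))
```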
How can I pass a column name as a function argument using dplyr and ggplot2?
This code seems to fix it. As the commenters mention, variables passed in to the function must be captured with enquo() and then unquoted with !!. Note that aes() becomes aes_() when working with quoted expressions rather than bare column names; aes_() has since been soft-deprecated in ggplot2 in favor of tidy evaluation inside aes().
library(tidyverse)
to_plot <- function(df, model, response_variable, indep_variable) {
  response_variable <- enquo(response_variable)
  indep_variable <- enquo(indep_variable)
  resp_plot <-
    df %>%
    mutate(model_resp = predict.glm(model, df, type = 'response')) %>%
    group_by(!!indep_variable) %>%
    summarize(actual_response = mean(!!response_variable),
              predicted_response = mean(model_resp)) %>%
    ggplot(aes_(indep_variable)) +
    geom_line(aes_(x = indep_variable, y = quote(actual_response)), colour = "blue") +
    geom_line(aes_(x = indep_variable, y = quote(predicted_response)), colour = "red") +
    ylab(label = 'Response')
  return(resp_plot)
}
fit <- glm(data = mtcars, mpg ~ wt + qsec + am, family = gaussian(link = 'identity'))
to_plot(mtcars, fit, mpg, wt)
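With current ggplot2, the same idea can be written without aes_() by using tidy evaluation directly inside aes(). The following is a minimal sketch, not a rewrite of the answer's function: plot_xy and plot_xy_str are hypothetical helpers, and a ggplot2 version that supports {{ }} and the .data pronoun in aes() is assumed.

```r
library(ggplot2)

# Hypothetical helper: columns passed as bare names, forwarded with {{ }}
plot_xy <- function(df, x, y) {
  ggplot(df, aes(x = {{ x }}, y = {{ y }})) +
    geom_point()
}

# Hypothetical helper: columns passed as strings, looked up via .data
plot_xy_str <- function(df, x, y) {
  ggplot(df, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point()
}

p1 <- plot_xy(mtcars, wt, mpg)
p2 <- plot_xy_str(mtcars, "wt", "mpg")
```

Printing p1 or p2 should draw the same scatter plot of mpg against wt.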
How to refer to variable (column name) with tidyverse in a function?
You can call the function using symbols rather than strings for the column names by using the {{ ('curly curly') operator:
library(tidyverse)
f3 <- function(x) {
  mtcars %>%
    group_by(cyl, gear) %>%
    summarize(m = mean({{ x }}),
              sd = sd({{ x }}),
              n = length({{ x }}),
              se = sd / sqrt(n),
              tscore = qt(0.975, n - 1),
              margin = tscore * se,
              uppma = m + margin,
              lowma = m - margin,
              .groups = 'drop')
}
f3(x = wt)
#> # A tibble: 8 x 10
#> cyl gear m sd n se tscore margin uppma lowma
#> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 2.46 NA 1 NA NaN NaN NaN NaN
#> 2 4 4 2.38 0.601 8 0.212 2.36 0.502 2.88 1.88
#> 3 4 5 1.83 0.443 2 0.314 12.7 3.98 5.81 -2.16
#> 4 6 3 3.34 0.173 2 0.123 12.7 1.56 4.89 1.78
#> 5 6 4 3.09 0.413 4 0.207 3.18 0.657 3.75 2.44
#> 6 6 5 2.77 NA 1 NA NaN NaN NaN NaN
#> 7 8 3 4.10 0.768 12 0.222 2.20 0.488 4.59 3.62
#> 8 8 5 3.37 0.283 2 0.2 12.7 2.54 5.91 0.829
Pass a data.frame column name to a function
You can just use the column name directly:
df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column) {
  max(x[, column])
}
fun1(df, "B")
fun1(df, c("B","A"))
There's no need to use substitute, eval, etc.
You can even pass the desired function as a parameter:
fun1 <- function(x, column, fn) {
  fn(x[, column])
}
fun1(df, "B", max)
Alternatively, using [[ also works for selecting a single column at a time:
df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column) {
  max(x[[column]])
}
fun1(df, "B")
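Because the column is just a string here, it is easy to validate before use. The sketch below is one possible defensive variant; col_max is a hypothetical name, and the checks are an addition for illustration rather than part of the answer above.

```r
df <- data.frame(A = 1:10, B = 2:11, C = 3:12)

# Hypothetical variant of fun1 that checks its input before indexing
col_max <- function(x, column) {
  stopifnot(
    is.character(column),
    length(column) == 1,
    column %in% names(x)
  )
  # [[ always returns the column itself as a vector
  max(x[[column]])
}

b_max <- col_max(df, "B")  # 11, since B is 2:11

# A misspelled column name now fails early instead of deep inside max()
bad <- tryCatch(col_max(df, "Z"), error = function(e) "error")
```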
Pass a column name to a function using dplyr mutate without using the deprecated mutate_
For setting variable names you'll need a string on the left-hand side and := instead of = in mutate.
You can use quo_name for turning z into a string for the column name.
Your function could then look like:
my.f = function(df, column_var) {
  column_var = enquo(column_var)
  df %>%
    mutate(!!quo_name(column_var) := y) %>%
    filter(!is.na(!!column_var))
}
my.f(d, z)
# A tibble: 3 x 2
y z
<dbl> <dbl>
1 1 1
2 2 2
3 3 3
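In newer versions of dplyr and rlang, the name on the left-hand side of := can also be written as a glue-style string containing {{ }}, which avoids the explicit enquo() and quo_name() steps. This is a minimal sketch assuming dplyr 1.0 or later; copy_y_into and the toy data are made up for illustration and are not the d used above.

```r
library(dplyr)

# Hypothetical helper: create a column named after the bare argument
copy_y_into <- function(df, column_var) {
  df %>%
    mutate("{{ column_var }}" := y) %>%       # new column takes the argument's name
    filter(!is.na({{ column_var }}))          # then drop rows where it is NA
}

# Hypothetical toy data, invented for this sketch
toy <- tibble(y = c(1, 2, NA))

out <- copy_y_into(toy, z)
```

Here out should contain the columns y and z and only the two rows where y is not missing.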