dplyr::do() requires named function?
You don't need an anonymous function:
library(dplyr)
iris %>%
  group_by(Species) %>%
  do({
    mod <- lm(Sepal.Length ~ Sepal.Width, data = .)
    pred <- predict(mod, newdata = .["Sepal.Width"])
    data.frame(., pred)
  })
dplyr - using column names as function arguments
With recent versions of dplyr (as can be seen on GitHub), you can convert the string to a symbol with sym() and unquote it with !!:
library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))
}
sumByColumn(data, "b")
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
An alternative is to pass b as a bare variable name and capture it with enquo():
library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}
sumByColumn(data, b)
## A tibble: 2 x 2
# a tot
# <int> <int>
#1 1 24
#2 2 27
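For a self-contained run, the two versions above can be compared on a small made-up data frame. This is only a sketch: the data frame toy and the function names sum_by_string and sum_by_bare are hypothetical, and the totals differ from the output shown above because the original data is not given.

```r
library(dplyr)
library(rlang)

# Hypothetical data, not the data used in the answer above
toy <- data.frame(a = c(1L, 1L, 2L, 2L), b = c(1L, 2L, 3L, 4L))

# Column name supplied as a string: sym() builds a symbol, !! unquotes it
sum_by_string <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!sym(colName)))
}

# Column name supplied bare: enquo() captures it, !! unquotes it
sum_by_bare <- function(df, colName) {
  col <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!col))
}

res_string <- sum_by_string(toy, "b")
res_bare <- sum_by_bare(toy, b)
```

Both calls should return the same per-group totals; the only difference is whether the caller quotes the column name.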
Passing column name as argument in function within pipes
You need to use non-standard evaluation, which is worth reading up on. In this case, because var holds the column name as a string, you most likely need to convert it to a symbol with sym() and unquote it with !! in the mutate line.
Here's the line:
mutate(new_variable = !!sym(var) * 100)
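As a sketch of how that line might sit inside a complete function, assuming var arrives as a string: the helper name scale_column and the data frame toy are hypothetical and not part of the original question.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: `var` is a column name given as a string
scale_column <- function(df, var) {
  df %>%
    # sym() turns the string into a symbol; !! injects it into the expression
    mutate(new_variable = !!sym(var) * 100)
}

# Hypothetical data
toy <- data.frame(x = c(0.1, 0.25, 0.5))
res <- scale_column(toy, "x")
```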
Dplyr write a function with column names as inputs
Is this what you expected?
library(dplyr)
df <- as_tibble(data.frame(group = rep(c("A", "B"), each = 3), var1 = sample(1:100, 6), var2 = sample(1:100, 6)))
example <- function(colname) {
  df %>%
    group_by(group) %>%
    # colname holds a quoted symbol, so unquote it with !! inside summarize()
    summarize(output = mean(sqrt(!!colname))) %>%
    select(output)
}
example(quote(var1))
#-----
Source: local data frame [2 x 1]
output
1 7.185935
2 8.090866
Passing (function) user-specified column name to dplyr do()
This is because of regular do() semantics, where there is no data masking apart from the . pronoun:
do(df, data.frame(y = sum(.$response)))
#> y
#> 1 6
do(df, data.frame(y = sum(.[[response]])))
#> Error: object 'response' not found
So you just need to capture the bare column name as a string and there is no need to unquote since there is no data masking:
sum_with_do <- function(df, x, ...) {
  # ensym() guarantees that `x` is a simple column name and not a
  # complex expression:
  x <- as.character(ensym(x))
  df %>%
    group_by(...) %>%
    do(data.frame(y = sum(.[[x]])))
}
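A usage sketch for the function above, repeated here so the block runs on its own. The data frame toy and its columns g and response are hypothetical, chosen only to illustrate the call.

```r
library(dplyr)
library(rlang)

sum_with_do <- function(df, x, ...) {
  # Capture the bare column name and store it as a string
  x <- as.character(ensym(x))
  df %>%
    group_by(...) %>%
    do(data.frame(y = sum(.[[x]])))
}

# Hypothetical data: one grouping column and one numeric column
toy <- data.frame(g = c("a", "a", "b"), response = c(1, 2, 3))

# `response` is passed bare; `g` is forwarded to group_by() through ...
res <- sum_with_do(toy, response, g)
```

Because ensym() also accepts a string, calling sum_with_do(toy, "response", g) should behave the same way.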
How to refer to variable (column name) with tidyverse in a function?
You can call the function using symbols rather than strings for the column names by using the {{ ('curly curly') operator:
library(tidyverse)
f3 <- function(x){
  mtcars %>%
    group_by(cyl, gear) %>%
    summarize(m = mean({{x}}),
              sd = sd({{x}}),
              n = length({{x}}),
              se = sd / sqrt(n),
              tscore = qt(0.975, n-1),
              margin = tscore * se,
              uppma = m + margin,
              lowma = m - margin,
              .groups = 'drop')
}
f3(x = wt)
#> # A tibble: 8 x 10
#> cyl gear m sd n se tscore margin uppma lowma
#> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 2.46 NA 1 NA NaN NaN NaN NaN
#> 2 4 4 2.38 0.601 8 0.212 2.36 0.502 2.88 1.88
#> 3 4 5 1.83 0.443 2 0.314 12.7 3.98 5.81 -2.16
#> 4 6 3 3.34 0.173 2 0.123 12.7 1.56 4.89 1.78
#> 5 6 4 3.09 0.413 4 0.207 3.18 0.657 3.75 2.44
#> 6 6 5 2.77 NA 1 NA NaN NaN NaN NaN
#> 7 8 3 4.10 0.768 12 0.222 2.20 0.488 4.59 3.62
#> 8 8 5 3.37 0.283 2 0.2 12.7 2.54 5.91 0.829
How do I write a dplyr pipe-friendly function where a new column name is provided from a function argument?
In this case you can just stick to the embrace operator {{ }} for your variables. If you want to create column names dynamically, you still need to use :=. The difference here is that you can use glue-style syntax with the embrace operator inside a string to get the name of the symbol. This works with the data provided.
elective_open <- function(.data, name_for_elective, course, tiebreaker){
  .data %>%
    mutate("{{name_for_elective}}" := ifelse({{tiebreaker}}==max({{tiebreaker}}),1,0)) %>%
    mutate("{{name_for_elective}}" := ifelse({{name_for_elective}}==0,{{course}}[{{name_for_elective}}==1],"")) %>%
    filter(!({{course}} %in% {{name_for_elective}}))
}
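The naming mechanism is easier to see in isolation. The following is a minimal sketch, assuming a reasonably recent dplyr and rlang; the helper add_doubled and the data frame toy are hypothetical and unrelated to the question's data.

```r
library(dplyr)

# Hypothetical helper: `new_col` names the column to create,
# `src` is the existing column to read from
add_doubled <- function(.data, new_col, src) {
  .data %>%
    # "{{new_col}}" on the left of := becomes the name of the new column;
    # {{ src }} on the right refers to the existing column
    mutate("{{new_col}}" := {{ src }} * 2)
}

# Hypothetical data
toy <- data.frame(x = 1:3)
res <- add_doubled(toy, x2, x)
```

The string can also carry extra text, for example "{{new_col}}_flag", if a derived name is wanted.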
Can't use dplyr::arrange() to sort a column in the form of a date in r
Instead of double-quoting the column name, wrap it in backticks:
library(dplyr)
values %>%
  dplyr::arrange(`2022-03-01`)
-output
2022-03-01
J 0.6
E 2.0
A 2.7
B 3.7
C 5.7
I 6.3
H 6.6
F 9.0
D 9.1
G 9.4
If we want to pass the name as a string, we can wrap it in across():
values %>%
  dplyr::arrange(across("2022-03-01"))
2022-03-01
J 0.6
E 2.0
A 2.7
B 3.7
C 5.7
I 6.3
H 6.6
F 9.0
D 9.1
G 9.4
Or convert to a symbol and evaluate (!!):
values %>%
  dplyr::arrange(!! rlang::sym("2022-03-01"))
2022-03-01
J 0.6
E 2.0
A 2.7
B 3.7
C 5.7
I 6.3
H 6.6
F 9.0
D 9.1
G 9.4
Or with the .data pronoun:
values %>%
  dplyr::arrange(.data[["2022-03-01"]])
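The four approaches above can be checked side by side. This is a sketch on a hypothetical three-row data frame toy (with an id column standing in for the row names), not the values object from the question; check.names = FALSE is what lets data.frame() keep the non-syntactic column name.

```r
library(dplyr)

# Hypothetical data with a date-like, non-syntactic column name
toy <- data.frame(
  id = c("A", "J", "G"),
  `2022-03-01` = c(2.7, 0.6, 9.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

col <- "2022-03-01"

by_backtick <- toy %>% arrange(`2022-03-01`)       # bare name in backticks
by_across   <- toy %>% arrange(across(all_of(col))) # string via tidyselect
by_sym      <- toy %>% arrange(!!rlang::sym(col))   # string converted to a symbol
by_data     <- toy %>% arrange(.data[[col]])        # string via the .data pronoun
```

All four results should list the rows in the same ascending order of the date-named column.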