dplyr::do() requires named function?

You don't need an anonymous function. do() accepts a braced expression, and inside it . refers to the data for the current group:

library(dplyr)
iris %>%
  group_by(Species) %>%
  do({
    mod <- lm(Sepal.Length ~ Sepal.Width, data = .)
    pred <- predict(mod, newdata = .["Sepal.Width"])
    data.frame(., pred)
  })
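If a named function is preferred anyway, the same work can be moved into one and called on . inside do(). The following is a minimal sketch; the helper name add_pred is hypothetical and not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: fits a model on one group's rows and appends the
# predictions as a new column.
add_pred <- function(d) {
  mod <- lm(Sepal.Length ~ Sepal.Width, data = d)
  data.frame(d, pred = predict(mod, newdata = d["Sepal.Width"]))
}

res <- iris %>%
  group_by(Species) %>%
  do(add_pred(.))
```

Both forms should give one row per original observation with an extra pred column.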

dplyr - using column names as function arguments

This works with the tidy evaluation syntax in recent versions of dplyr: convert the column name string to a symbol with sym() and unquote it with !!:

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))
}

sumByColumn(data, "b")
# # A tibble: 2 x 2
#       a   tot
#   <int> <int>
# 1     1    24
# 2     2    27

Alternatively, pass b as a bare column name and capture it with enquo():

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
# # A tibble: 2 x 2
#       a   tot
#   <int> <int>
# 1     1    24
# 2     2    27
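Since the question's data object is not shown here, the two variants can be compared side by side on a small made-up data frame. This is only a sketch: toy, sum_by_string and sum_by_name are hypothetical names, and the values are not the original data.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for the question's `data` (values are invented)
toy <- tibble(a = c(1, 1, 2, 2), b = c(1, 2, 3, 4))

# Variant 1: column name arrives as a string
sum_by_string <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!sym(colName)))
}

# Variant 2: column name arrives as a bare symbol
sum_by_name <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!enquo(colName)))
}

res_string <- sum_by_string(toy, "b")
res_name <- sum_by_name(toy, b)
```

Both calls should return the same per-group totals; the only difference is how the caller spells the column.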

Passing column name as argument in function within pipes

You need non-standard evaluation, which is worth reading up on. In this case, because var holds the column name as a string, convert it to a symbol with sym() and unquote it with !! in the mutate() line.

Here's the line:

mutate(new_variable = !!sym(var) * 100)
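For context, here is a sketch of how that line might sit inside a pipe-friendly function. The wrapper name scale_column and the use of mtcars are assumptions for illustration, since the original function is not shown.

```r
library(dplyr)
library(rlang)

# Hypothetical wrapper: `var` is a column name supplied as a string
scale_column <- function(df, var) {
  df %>%
    # Parentheses make the precedence of !! explicit
    mutate(new_variable = (!!sym(var)) * 100)
}

res <- mtcars %>% scale_column("mpg")
```

Because the data frame is the first argument, the function can be dropped into an existing pipe.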

Dplyr write a function with column names as inputs

Is this what you expected?

library(dplyr)

df <- tibble(group = rep(c("A", "B"), each = 3), var1 = sample(1:100, 6), var2 = sample(1:100, 6))

example <- function(colname) {
  df %>%
    group_by(group) %>%
    # colname holds a quoted symbol; !! unquotes it so the column itself is used
    summarize(output = mean(sqrt(!!colname))) %>%
    select(output)
}
example(quote(var1))
#-----
Source: local data frame [2 x 1]

output
1 7.185935
2 8.090866

Passing (function) user-specified column name to dplyr do()

This is because of regular do() semantics where there is no data masking apart from .:

do(df, data.frame(y = sum(.$response)))
#> y
#> 1 6

do(df, data.frame(y = sum(.[[response]])))
#> Error: object 'response' not found

So you just need to capture the bare column name as a string and there is no need to unquote since there is no data masking:

sum_with_do <- function(df, x, ...) {
  # ensym() guarantees that `x` is a simple column name and not a
  # complex expression:
  x <- as.character(ensym(x))

  df %>%
    group_by(...) %>%
    do(data.frame(y = sum(.[[x]])))
}
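A possible way to call it, sketched on a small invented data frame (toy and its columns g and response are hypothetical; the question's df is not shown here):

```r
library(dplyr)
library(rlang)

sum_with_do <- function(df, x, ...) {
  # Capture the bare column name as a string
  x <- as.character(ensym(x))

  df %>%
    group_by(...) %>%
    do(data.frame(y = sum(.[[x]])))
}

# Hypothetical data: two groups, with `response` summing to 3 in each
toy <- tibble(g = c("a", "a", "b"), response = c(1, 2, 3))

res <- sum_with_do(toy, response, g)
```

The grouping columns go through ... as bare names, while the summed column is looked up by string inside do().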

How to refer to variable (column name) with tidyverse in a function?

You can call the function using symbols rather than strings for the column names by using the {{ ('curly curly') operator:

library(tidyverse)

f3 <- function(x) {
  mtcars %>%
    group_by(cyl, gear) %>%
    summarize(m = mean({{x}}),
              sd = sd({{x}}),
              n = length({{x}}),
              se = sd / sqrt(n),
              tscore = qt(0.975, n-1),
              margin = tscore * se,
              uppma = m + margin,
              lowma = m - margin,
              .groups = 'drop')
}

f3(x = wt)
#> # A tibble: 8 x 10
#>     cyl  gear     m     sd     n     se tscore margin uppma  lowma
#>   <dbl> <dbl> <dbl>  <dbl> <int>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
#> 1     4     3  2.46     NA     1     NA    NaN    NaN   NaN    NaN
#> 2     4     4  2.38  0.601     8  0.212   2.36  0.502  2.88   1.88
#> 3     4     5  1.83  0.443     2  0.314   12.7   3.98  5.81  -2.16
#> 4     6     3  3.34  0.173     2  0.123   12.7   1.56  4.89   1.78
#> 5     6     4  3.09  0.413     4  0.207   3.18  0.657  3.75   2.44
#> 6     6     5  2.77     NA     1     NA    NaN    NaN   NaN    NaN
#> 7     8     3  4.10  0.768    12  0.222   2.20  0.488  4.59   3.62
#> 8     8     5  3.37  0.283     2    0.2   12.7   2.54  5.91  0.829
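The embrace operator is shorthand for capturing the argument with enquo() and unquoting it with !!. A small sketch comparing the two spellings (the function names mean_embrace and mean_enquo are hypothetical):

```r
library(dplyr)

# Embrace: capture and unquote in one step
mean_embrace <- function(x) {
  mtcars %>%
    group_by(cyl) %>%
    summarize(m = mean({{ x }}))
}

# The longer equivalent: capture with enquo(), unquote with !!
mean_enquo <- function(x) {
  x <- enquo(x)
  mtcars %>%
    group_by(cyl) %>%
    summarize(m = mean(!!x))
}

res_embrace <- mean_embrace(wt)
res_enquo <- mean_enquo(wt)
```

Both functions should produce the same summary, one row per value of cyl.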

How do I write a dplyr pipe-friendly function where a new column name is provided from a function argument?

In this case you can stick with the embrace operator {{ }} for your variables. To create column names dynamically you still need :=. The difference is that the glue-style syntax lets you put the embrace operator inside a string on the left-hand side, which inserts the name of the symbol that was passed in. This works with the data provided.

elective_open <- function(.data, name_for_elective, course, tiebreaker) {
  .data %>%
    mutate("{{name_for_elective}}" := ifelse({{tiebreaker}} == max({{tiebreaker}}), 1, 0)) %>%
    mutate("{{name_for_elective}}" := ifelse({{name_for_elective}} == 0, {{course}}[{{name_for_elective}} == 1], "")) %>%
    filter(!({{course}} %in% {{name_for_elective}}))
}
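The naming mechanism is easier to see in isolation. Below is a stripped-down sketch, not the answer's function: add_doubled is a hypothetical helper and mtcars stands in for the question's data.

```r
library(dplyr)

# Hypothetical helper: creates a column whose name comes from the
# `new_name` argument and whose values are twice those of `col`.
add_doubled <- function(.data, new_name, col) {
  .data %>%
    mutate("{{new_name}}" := {{ col }} * 2)
}

res <- add_doubled(mtcars, wt2, wt)
```

Here wt2 is passed as a bare name, and the string "{{new_name}}" on the left of := turns it into the new column's name.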

Can't use dplyr::arrange() to sort a column in the form of a date in r

Instead of double-quoting the column name, wrap it in backticks:

library(dplyr)
values %>%
dplyr::arrange(`2022-03-01`)

-output

   2022-03-01
J 0.6
E 2.0
A 2.7
B 3.7
C 5.7
I 6.3
H 6.6
F 9.0
D 9.1
G 9.4

If we want to pass the name as a string, either wrap it in across():

values %>%
  dplyr::arrange(across("2022-03-01"))

-output

   2022-03-01
J 0.6
E 2.0
A 2.7
B 3.7
C 5.7
I 6.3
H 6.6
F 9.0
D 9.1
G 9.4

Or convert the string to a symbol and unquote it with !!:

values %>%
  dplyr::arrange(!! rlang::sym("2022-03-01"))

-output

   2022-03-01
J 0.6
E 2.0
A 2.7
B 3.7
C 5.7
I 6.3
H 6.6
F 9.0
D 9.1
G 9.4

Or use the .data pronoun:

values %>% 
dplyr::arrange(.data[["2022-03-01"]])
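To check that these spellings agree, here is a sketch on a small invented data frame. The object toy and its values are assumptions, since the original values object is not shown; check.names = FALSE is what keeps the date-like column name intact.

```r
library(dplyr)

# Hypothetical stand-in for `values`, with a non-syntactic column name
toy <- data.frame(`2022-03-01` = c(2.7, 0.6, 5.7), check.names = FALSE)
col <- "2022-03-01"

by_backtick <- toy %>% arrange(`2022-03-01`)     # backticks around the name
by_sym <- toy %>% arrange(!!rlang::sym(col))     # string -> symbol -> unquote
by_pronoun <- toy %>% arrange(.data[[col]])      # .data pronoun with a string
```

The last two forms are the ones that accept the column name from a variable, which is usually what a function needs.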

