Error when using dplyr inside of a function
UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.
See http://dplyr.tidyverse.org/articles/programming.html for more details.
filter_big <- function(spp, LENGTH, WIDTH) {
  LENGTH <- enquo(LENGTH)                    # Create quosure
  WIDTH  <- enquo(WIDTH)                     # Create quosure
  iris %>%
    filter(Species == spp) %>%
    select(!!LENGTH, !!WIDTH) %>%            # Use !! to unquote the quosure
    mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
    filter(sum > 4) %>%
    nrow()
}
filter_big("virginica", Sepal.Length, Sepal.Width)
> filter_big("virginica", Sepal.Length, Sepal.Width)
[1] 50
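With rlang 0.4.0 or later, the `enquo()` plus `!!` pair can usually be collapsed into the curly-curly operator. The following is a minimal sketch of the same function written that way; the name `filter_big2` and the argument names `length_col` and `width_col` are made up for illustration.

```r
library(dplyr)

# Same logic as filter_big(), but {{ }} quotes and unquotes in one step.
filter_big2 <- function(spp, length_col, width_col) {
  iris %>%
    filter(Species == spp) %>%
    select({{ length_col }}, {{ width_col }}) %>%
    mutate(sum = {{ length_col }} + {{ width_col }}) %>%
    filter(sum > 4) %>%
    nrow()
}

filter_big2("virginica", Sepal.Length, Sepal.Width)
#> [1] 50
```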
Error when using dplyr {{ }} with aggregate inside a function
{{ }} is tidyverse syntax, and should only work inside tidyverse verbs.
If we want to achieve something like this:
aggregate(. ~ Species, data = iris, sum)
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 250.3 171.4 73.1 12.3
2 versicolor 296.8 138.5 213.0 66.3
3 virginica 329.4 148.7 277.6 101.3
We can build the formula on the fly by manipulating it as text, like so:
aggregate_var <- function(df, level) {
  level <- deparse(substitute(level))  # bare column name -> string
  aggregate(formula(paste(". ~", level)), data = df, FUN = sum)
}
aggregate_var(iris, Species)
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 250.3 171.4 73.1 12.3
2 versicolor 296.8 138.5 213.0 66.3
3 virginica 329.4 148.7 277.6 101.3
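For comparison, if staying inside tidyverse verbs is acceptable, {{ }} can be used directly. This is only a sketch of a possible dplyr equivalent, assuming dplyr 1.0.0 or later for `across()`; the function name `sum_by` is hypothetical.

```r
library(dplyr)

# {{ level }} works here because group_by() is a tidyverse verb.
sum_by <- function(df, level) {
  df %>%
    group_by({{ level }}) %>%
    summarise(across(where(is.numeric), sum))
}

res <- sum_by(iris, Species)
res
```

The result is a tibble rather than a plain data frame, but the per-species sums should match the `aggregate()` output above.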
As an aside: filter is a popular function name, so a more descriptive name may be useful. Also note that an explicit return statement and the assignment to df are not needed here.
Problem with a dplyr filter inside a function in R
You need to use tidy evaluation. More info here:
Tidy evaluation book
Tidy evaluation resources
library(zoo)
library(rlang)
library(tidyverse)
dat <- structure(list(X1979 = c(1.26884, 0.75802, 0.35127, -0.0679517,
-4.34841, -0.312289, -5.02931, -2.49339, -12.9065, -2.90853,
-1.02833, 0.333109, 1.70236, -2.44456, -1.83307, -0.982637, -2.14197,
-4.1294, -3.98545, -6.26205, -5.56162, 0.0789091, 1.63146, -0.214938
), X1980 = c(-1.32651, -0.0199441, -1.08583, 3.25939, 0.0402712,
-3.22174, -0.859756, -3.30898, 1.0128, 0.847161, 2.75866, 1.93117,
1.05851, 1.83372, -0.811736, -0.992584, -0.110012, 0.132343,
2.21745, -1.48902, 0.111302, -3.77058, -3.65044, -2.41263)), class =
"data.frame", row.names = 50:73)
Use curly-curly {{ }}
test <- function(dat, column_name) {
  dat %>%
    rownames_to_column() %>%
    filter({{ column_name }} > 0 &
             rollsum({{ column_name }} > 0, 4, fill = NA, align = "left") >= 3 &
             rollsum({{ column_name }}, 4, fill = NA, align = "left") > 1) %>%
    slice(1) -> result
  return(result)
}
test(dat, X1979)
#> rowname X1979 X1980
#> 1 50 1.2688 -1.3265
Use the .data[[]] pronoun
test2 <- function(dat, column_name) {
  dat %>%
    rownames_to_column() %>%
    filter(.data[[column_name]] > 0 &
             rollsum(.data[[column_name]] > 0, 4, fill = NA, align = "left") >= 3 &
             rollsum(.data[[column_name]], 4, fill = NA, align = "left") > 1) %>%
    slice(1) -> result
  return(result)
}
# Column names are passed as strings, so the function can be mapped over them.
out <- colnames(dat) %>%
  set_names() %>%
  map_dfr(~ test2(dat, .x), .id = 'Col_ID')
out
#> Col_ID rowname X1979 X1980
#> 1 X1979 50 1.2688 -1.3265
#> 2 X1980 58 -12.9065 1.0128
Created on 2020-05-05 by the reprex package (v0.3.0)
Using Variable names for dplyr inside function
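# A.Key and B.Key are assumed to hold column names as strings
# (e.g. A.Key <- "a"; B.Key <- "b"), matching the output below.
sum1 <- function(df, group_var, x, y) {
  group_var <- enquo(group_var)   # capture the bare grouping column
  x <- as.name(x)                 # turn the string into a symbol
  y <- as.name(y)
  df %>%
    group_by(!!group_var) %>%
    mutate(sum = (!!x) + (!!y))   # unquote the symbols directly
}
sum1(df, g1, A.Key, B.Key)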
# A tibble: 5 x 4
# Groups: g1 [5]
g1 a b sum
<dbl> <int> <int> <int>
1 1. 3 2 5
2 2. 2 1 3
3 3. 1 3 4
4 4. 4 4 8
5 5. 5 5 10
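When the column names arrive as strings, the `.data` pronoun avoids the `as.name()` step altogether. Below is a small sketch of that variant; the function name `sum2` and the data frame are made up for illustration, since the original df is not shown.

```r
library(dplyr)

# Hypothetical data, not the data from the question.
df <- data.frame(g1 = c(1, 1, 2), a = 1:3, b = 4:6)

# group_var is a bare column name; x and y are column names as strings.
sum2 <- function(df, group_var, x, y) {
  df %>%
    group_by({{ group_var }}) %>%
    mutate(sum = .data[[x]] + .data[[y]])
}

res <- sum2(df, g1, "a", "b")
res
```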
R: Using dplyr inside a function. exception in eval(expr, envir, enclos): unknown column
This is the problem with functions using NSE (non-standard evaluation). Functions using NSE are very useful in interactive programming but cause many problems in development, i.e. when you try to use them inside other functions. Because the expressions are not evaluated directly, R cannot find the objects in the environments it searches. I suggest you read here, and preferably the scoping issues chapter, for more info.
First of all you need to know that ALL the standard dplyr functions use NSE. Let's see an approximate example of your problem:
Data:
df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))
> df
col1 col2
1 a 0.03366446
2 a 0.46698763
3 a 0.34114682
4 a 0.92125387
5 a 0.94511394
6 b 0.67241460
7 b 0.38168131
8 b 0.91107090
9 b 0.15342089
10 b 0.60751868
Let's see how NSE will make our simple problem crash:
First of all the simple interactive case works:
df %>% group_by(col1) %>% summarise(count = n())
Source: local data frame [2 x 2]
col1 count
1 a 5
2 b 5
Let's see what happens if I put it in a function:
lets_group <- function(column) {
  df %>% group_by(column) %>% summarise(count = n())
}
> lets_group(col1)
Error: index out of bounds
Not the same error as yours but it is caused by NSE. Exactly the same line of code worked outside the function.
Fortunately, there is a solution to your problem and that is standard evaluation. Hadley also made versions of all the functions in dplyr that use standard evaluation. They are just the normal functions plus an underscore (_) at the end. Note that these underscored verbs were deprecated in dplyr 0.7.0 in favor of tidy evaluation.
Now look at how this will work:
# notice the formula operator (~) inside summarise_
lets_group2 <- function(column) {
  df %>% group_by_(column) %>% summarise_(count = ~n())
}
This yields the following result:
#also notice the quotes around col1
> lets_group2('col1')
Source: local data frame [2 x 2]
col1 count
1 a 5
2 b 5
I cannot test your problem, but using SE instead of NSE should give you the results you want. For more info you can also read here
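On current dplyr the underscored verbs are deprecated, so the same idea is usually written with tidy evaluation. The following is a sketch under that assumption (rlang 0.4.0 or later for {{ }}); the name `lets_group3` and the extra `data` argument are illustrative and not part of the original answer.

```r
library(dplyr)

df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = runif(10))

# {{ column }} forwards the bare column name to group_by().
lets_group3 <- function(data, column) {
  data %>%
    group_by({{ column }}) %>%
    summarise(count = n())
}

res <- lets_group3(df, col1)  # no quotes needed around col1
res
```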
Using dplyr within a function, Grouping Error with function arguments
From the NSE vignette:
If you also want to output variables to vary, you need to pass a list
of quoted objects to the .dots argument:
Here, variable should be quoted:
# Assumes subgroup_column and subgroup_name each hold a column name as a string.
subgroup_analysis <- function(database, subgroup_column, subgroup_name) {
  df <- database %>%
    select(diamond, one_of(subgroup_column), x, y, z) %>%
    melt(id.vars = c("diamond", subgroup_name)) %>%
    group_by_(subgroup_name, quote(variable)) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2))
  print(df)
}
subgroup_analysis(database, subgroup_column, subgroup_name)
As mentioned by @RichardScriven, if you plan to assign the result to a new variable, you may want to remove the print call at the end and just write df, or not assign df at all inside the function. Otherwise the result prints even when you do x <- subgroup_analysis(...)
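The difference is easy to see with a toy example. This sketch uses two hypothetical functions, `with_print` and `without_print`, that compute the same value and differ only in their last line.

```r
# Ends with print(): the value is printed on every call, even when assigned.
with_print <- function(x) {
  res <- x * 2
  print(res)
}

# Ends with the value itself: printed only when the call is not assigned.
without_print <- function(x) {
  x * 2
}

a <- with_print(1:3)     # prints the result although it is assigned to a
b <- without_print(1:3)  # prints nothing
```

Both functions return the same value, because `print()` returns its argument invisibly.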