Error When Using Dplyr Inside of a Function

UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

library(dplyr)

filter_big <- function(spp, LENGTH, WIDTH) {
  LENGTH <- enquo(LENGTH) # Capture the argument as a quosure
  WIDTH <- enquo(WIDTH)   # Capture the argument as a quosure

  iris %>%
    filter(Species == spp) %>%
    select(!!LENGTH, !!WIDTH) %>%             # Use !! to unquote the quosure
    mutate(sum = (!!LENGTH) + (!!WIDTH)) %>%  # Use !! to unquote the quosure
    filter(sum > 4) %>%
    nrow()
}

filter_big("virginica", Sepal.Length, Sepal.Width)

> filter_big("virginica", Sepal.Length, Sepal.Width)
[1] 50
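
With rlang 0.4.0 or later, the enquo() and !! pair can usually be replaced by the embrace operator {{ }}. The following is only a sketch under that assumption; the name filter_big_embrace and its argument names are made up for illustration and do not come from the original answer.

```r
library(dplyr)

# Hypothetical variant of filter_big that uses {{ }} instead of enquo() + !!
filter_big_embrace <- function(spp, length_col, width_col) {
  iris %>%
    filter(Species == spp) %>%
    select({{ length_col }}, {{ width_col }}) %>%
    mutate(sum = {{ length_col }} + {{ width_col }}) %>%
    filter(sum > 4) %>%
    nrow()
}

n_big <- filter_big_embrace("virginica", Sepal.Length, Sepal.Width)
```

The call takes bare column names, just like filter_big above, and should return the same count.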

Error when using dplyr {{ }} with aggregate inside a function

{{ }} is tidy evaluation syntax, so it only works inside functions that support tidy evaluation, such as dplyr verbs. Base R's aggregate() does not.

If we want to achieve something like this

aggregate(. ~ Species, data = iris, sum)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3

We can build the formula on the fly by manipulating it as text, like so:

aggregate_var <- function(df, level) {
  level <- deparse(substitute(level))
  aggregate(formula(paste(". ~", level)), data = df, FUN = sum)
}

aggregate_var(iris, Species)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3
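
If you would rather stay inside the tidyverse, where {{ }} does work, a roughly equivalent sketch could look like the one below. It assumes dplyr 1.0.0 or later (for across()) and that every non-grouping column is numeric; the name sum_by is invented for this example.

```r
library(dplyr)

# Hypothetical dplyr counterpart of aggregate_var()
sum_by <- function(df, level) {
  df %>%
    group_by({{ level }}) %>%
    # everything() skips the grouping column, so only the other columns are summed
    summarise(across(everything(), sum))
}

res <- sum_by(iris, Species)
```

The result is a tibble rather than a data frame, but the sums should match the aggregate() output above.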

As an aside: filter is a popular function name, so a more descriptive name may be useful. Also note that an explicit return statement and the assignment to df are not needed here.

Problem with a dplyr filter inside a function in R

You need to use tidy evaluation. More info here:

  • Tidy evaluation book

  • Tidy evaluation resources

library(zoo)
library(rlang)
library(tidyverse)

dat <- structure(list(X1979 = c(1.26884, 0.75802, 0.35127, -0.0679517,
-4.34841, -0.312289, -5.02931, -2.49339, -12.9065, -2.90853,
-1.02833, 0.333109, 1.70236, -2.44456, -1.83307, -0.982637, -2.14197,
-4.1294, -3.98545, -6.26205, -5.56162, 0.0789091, 1.63146, -0.214938
), X1980 = c(-1.32651, -0.0199441, -1.08583, 3.25939, 0.0402712,
-3.22174, -0.859756, -3.30898, 1.0128, 0.847161, 2.75866, 1.93117,
1.05851, 1.83372, -0.811736, -0.992584, -0.110012, 0.132343,
2.21745, -1.48902, 0.111302, -3.77058, -3.65044, -2.41263)), class =
"data.frame", row.names = 50:73)

Use curly-curly {{}}

test <- function(dat, column_name) {
  dat %>%
    rownames_to_column() %>%
    filter({{ column_name }} > 0 &
             rollsum({{ column_name }} > 0, 4, fill = NA, align = "left") >= 3 &
             rollsum({{ column_name }}, 4, fill = NA, align = "left") > 1) %>%
    slice(1) -> result
  return(result)
}

test(dat, X1979)
#> rowname X1979 X1980
#> 1 50 1.2688 -1.3265

Use .data[[]] pronoun

test2 <- function(dat, column_name) {
  dat %>%
    rownames_to_column() %>%
    filter(.data[[column_name]] > 0 &
             rollsum(.data[[column_name]] > 0, 4, fill = NA, align = "left") >= 3 &
             rollsum(.data[[column_name]], 4, fill = NA, align = "left") > 1) %>%
    slice(1) -> result
  return(result)
}

out <- colnames(dat) %>%
  set_names %>%
  map_dfr(~ test2(dat, .x), .id = 'Col_ID')
out
#> Col_ID rowname X1979 X1980
#> 1 X1979 50 1.2688 -1.3265
#> 2 X1980 58 -12.9065 1.0128

Created on 2020-05-05 by the reprex package (v0.3.0)
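
To make the difference between the two approaches concrete: {{ }} expects a bare column name, while .data[[ ]] expects the name as a string, which is what makes it convenient for looping over colnames(). Here is a stripped-down sketch on mtcars; the helper names mean_bare and mean_string are made up for this illustration.

```r
library(dplyr)

# Takes a bare column name, e.g. mean_bare(mtcars, mpg)
mean_bare <- function(df, col) {
  df %>% summarise(m = mean({{ col }})) %>% pull(m)
}

# Takes the column name as a string, e.g. mean_string(mtcars, "mpg")
mean_string <- function(df, col) {
  df %>% summarise(m = mean(.data[[col]])) %>% pull(m)
}

m1 <- mean_bare(mtcars, mpg)
m2 <- mean_string(mtcars, "mpg")

# The string version is easy to apply over several column names
all_means <- sapply(c("mpg", "hp"), function(nm) mean_string(mtcars, nm))
```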

Using Variable names for dplyr inside function

sum1 <- function(df, group_var, x, y) {
  # group_var is a bare column name, so capture it as a quosure
  group_var <- enquo(group_var)

  # x and y hold column names as strings, so convert them to symbols
  x <- as.name(x)
  y <- as.name(y)

  df %>%
    group_by(!!group_var) %>%
    mutate(sum = (!!x) + (!!y))
}

sum1(df, g1, A.Key, B.Key)
# A tibble: 5 x 4
# Groups:   g1 [5]
     g1     a     b   sum
  <dbl> <int> <int> <int>
1    1.     3     2     5
2    2.     2     1     3
3    3.     1     3     4
4    4.     4     4     8
5    5.     5     5    10
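
An alternative that avoids converting strings to symbols is the .data pronoun together with {{ }} for the grouping column. This is a sketch under assumptions: the data frame toy and the function name sum_cols are invented here, since the original data is not shown.

```r
library(dplyr)

# Hypothetical variant: group_var is a bare name, x and y are strings
sum_cols <- function(df, group_var, x, y) {
  df %>%
    group_by({{ group_var }}) %>%
    mutate(sum = .data[[x]] + .data[[y]]) %>%
    ungroup()
}

# Made-up example data, not the data from the question
toy <- data.frame(g1 = c(1, 2, 3), a = c(3L, 2L, 1L), b = c(2L, 1L, 3L))
res <- sum_cols(toy, g1, "a", "b")
```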

R: Using dplyr inside a function. exception in eval(expr, envir, enclos): unknown column

This is a common problem with functions that use NSE (non-standard evaluation). Such functions are very useful in interactive work but cause many problems in development, i.e. when you call them inside other functions. Because the expressions are not evaluated directly, R cannot find the objects in the environments it searches. I suggest you read here, preferably the scoping issues chapter, for more info.

First of all, you need to know that ALL the standard dplyr functions use NSE. Let's look at an example that approximates your problem:

Data:

df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))

> df
   col1       col2
1     a 0.03366446
2     a 0.46698763
3     a 0.34114682
4     a 0.92125387
5     a 0.94511394
6     b 0.67241460
7     b 0.38168131
8     b 0.91107090
9     b 0.15342089
10    b 0.60751868

Let's see how NSE makes this simple problem fail:

First of all the simple interactive case works:

df %>% group_by(col1) %>% summarise(count = n())

Source: local data frame [2 x 2]

col1 count
1 a 5
2 b 5

Let's see what happens if I put it in a function:

lets_group <- function(column) {
  df %>% group_by(column) %>% summarise(count = n())
}

>lets_group(col1)
Error: index out of bounds

Not the same error as yours but it is caused by NSE. Exactly the same line of code worked outside the function.
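
The mechanism can be illustrated with base R alone. In this simplified sketch, show_expr stands in for an NSE function such as group_by: it looks at the expression it was given rather than at its value. Both function names are made up for the illustration.

```r
# A toy NSE function: returns the expression it received, as text
show_expr <- function(x) {
  deparse(substitute(x))
}

# Wrapping it in another function changes what it "sees"
wrapper <- function(column) {
  show_expr(column)
}

direct  <- show_expr(col1)  # sees the expression col1
wrapped <- wrapper(col1)    # sees the expression column, not col1
```

Inside the wrapper, the NSE function receives the literal name column, which is why dplyr ends up looking for a column that does not exist.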

Fortunately, there is a solution to your problem: standard evaluation. Hadley also made versions of all the dplyr functions that use standard evaluation. They are just the normal functions with an underscore (_) at the end. Note that these underscored versions have been deprecated since dplyr 0.7.0 in favor of tidy evaluation.

Now look at how this will work:

# Notice the formula operator (~) before the function call in summarise_
lets_group2 <- function(column) {
  df %>% group_by_(column) %>% summarise_(count = ~n())
}

This yields the following result:

#also notice the quotes around col1
> lets_group2('col1')
Source: local data frame [2 x 2]

col1 count
1 a 5
2 b 5

I cannot test your exact problem, but using SE instead of NSE will give you the results you want. For more info you can also read here.
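
On current dplyr versions, the same grouping can be written without the underscored verbs. The sketch below is an assumption about how one might port lets_group2; the names lets_group_bare and lets_group_chr are hypothetical, and the data frame is passed in explicitly instead of being read from the global environment.

```r
library(dplyr)

df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = runif(10))

# Bare column name, forwarded with {{ }}
lets_group_bare <- function(data, column) {
  data %>% group_by({{ column }}) %>% summarise(count = n())
}

# Column name as a string, looked up through the .data pronoun
lets_group_chr <- function(data, column) {
  data %>% group_by(.data[[column]]) %>% summarise(count = n())
}

res_bare <- lets_group_bare(df, col1)
res_chr  <- lets_group_chr(df, "col1")
```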

Using dplyr within a function, Grouping Error with function arguments

From the NSE vignette:

If you also want the output variables to vary, you need to pass a list
of quoted objects to the .dots argument:

Here, variable should be quoted:

subgroup_analysis <- function(database, ...) {

  df <- database %>%
    select(diamond, subgroup_column, x, y, z) %>%
    melt(id.vars = c("diamond", subgroup_name)) %>%
    group_by_(subgroup_name, quote(variable)) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2))
  print(df)
}

subgroup_analysis(database, subgroup_column, subgroup_name)

As mentioned by @RichardScriven, if you plan to assign the result to a new variable, you may want to remove the print call at the end and just write df, or not assign df inside the function at all.

Otherwise the result is printed even when you do x <- subgroup_analysis(...)
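
The effect of the trailing print call can be seen in a small standalone sketch. The two functions below are invented for illustration and are unrelated to the data in the question.

```r
# Ends with print(): prints as a side effect, then returns the value invisibly
doubled_print <- function(x) {
  out <- x * 2
  print(out)
}

# Ends with the value itself: nothing is printed when the result is assigned
doubled_value <- function(x) {
  x * 2
}

a <- doubled_print(1)  # still prints, even though the result is assigned
b <- doubled_value(1)  # prints nothing
```

Both functions return the same value; only the second one stays quiet on assignment.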


