Error When Using Dplyr Inside of a Function

Error when using dplyr inside of a function

UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

library(dplyr)

filter_big <- function(spp, LENGTH, WIDTH) {
  LENGTH <- enquo(LENGTH)                    # Create quosure
  WIDTH  <- enquo(WIDTH)                     # Create quosure

  iris %>% 
    filter(Species == spp) %>% 
    select(!!LENGTH, !!WIDTH) %>%            # Use !! to unquote the quosure
    mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
    filter(sum > 4) %>% 
    nrow()
}

> filter_big("virginica", Sepal.Length, Sepal.Width)
[1] 50
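Since rlang 0.4.0 (the package dplyr uses for tidy evaluation), the enquo() / !! pair can be abbreviated with the curly-curly operator {{ }}. The following is a minimal sketch of the same function written that way; filter_big2 is only an illustrative name.

```r
library(dplyr)

# {{ }} ("curly-curly") quotes and unquotes an argument in one step,
# equivalent to enquo() followed by !!
filter_big2 <- function(spp, LENGTH, WIDTH) {
  iris %>%
    filter(Species == spp) %>%
    select({{ LENGTH }}, {{ WIDTH }}) %>%
    mutate(sum = {{ LENGTH }} + {{ WIDTH }}) %>%
    filter(sum > 4) %>%
    nrow()
}

filter_big2("virginica", Sepal.Length, Sepal.Width)
#> [1] 50
```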

Error when using dplyr {{ }} with aggregate inside a function

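{{ }} is tidy evaluation syntax (provided by rlang and used by tidyverse verbs), so it only works inside functions that support tidy evaluation. Base R functions such as aggregate() do not understand it.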

If we want to achieve something like this:

aggregate(. ~ Species, data = iris, sum)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3

we can build the formula on the fly by manipulating it as text, like so:

aggregate_var <- function(df, level) {
  level <- deparse(substitute(level))                            # bare name -> "Species"
  aggregate(formula(paste(". ~", level)), data = df, FUN = sum)  # builds . ~ Species
}

aggregate_var(iris, Species)
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        250.3       171.4         73.1        12.3
2 versicolor        296.8       138.5        213.0        66.3
3  virginica        329.4       148.7        277.6       101.3
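Alternatively, if you want to keep using {{ }}, you can stay inside dplyr verbs, where it does work. This is a sketch assuming dplyr 1.0.0 or later (for across()); aggregate_var2 is a hypothetical name. It returns the same sums as a tibble instead of a data frame.

```r
library(dplyr)

aggregate_var2 <- function(df, level) {
  df %>%
    group_by({{ level }}) %>%                # {{ }} forwards the bare column name
    summarise(across(everything(), sum))     # sums every non-grouping column
}

aggregate_var2(iris, Species)
```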

As an aside, filter is a common function name (both dplyr and stats export one), so a more descriptive name for your function may be useful. Also note that an explicit return statement and the assignment to df are not needed here.

Problem with a dplyr filter inside a function in R

You need to use tidy evaluation. More info here:

Tidy evaluation book
Tidy evaluation resources

library(zoo)
library(rlang)
library(tidyverse)

dat <- structure(list(X1979 = c(1.26884, 0.75802, 0.35127, -0.0679517, 
                              -4.34841, -0.312289, -5.02931, -2.49339, -12.9065, -2.90853, 
                              -1.02833, 0.333109, 1.70236, -2.44456, -1.83307, -0.982637, -2.14197, 
                              -4.1294, -3.98545, -6.26205, -5.56162, 0.0789091, 1.63146, -0.214938 
), X1980 = c(-1.32651, -0.0199441, -1.08583, 3.25939, 0.0402712, 
             -3.22174, -0.859756, -3.30898, 1.0128, 0.847161, 2.75866, 1.93117, 
             1.05851, 1.83372, -0.811736, -0.992584, -0.110012, 0.132343, 
             2.21745, -1.48902, 0.111302, -3.77058, -3.65044, -2.41263)), class = 
  "data.frame", row.names = 50:73)

Use curly-curly {{}}

test <- function(dat, column_name){ 
  dat %>%
    rownames_to_column() %>%
    filter({{column_name}} > 0 &
             rollsum({{column_name}} > 0, 4, fill = NA, align = 
                       "left") >= 3 &
             rollsum({{column_name}}, 4, fill = NA, align = 
                       "left") > 1) %>%
    slice(1) -> result
  return(result)
}

test(dat, X1979)
#>   rowname  X1979   X1980
#> 1      50 1.2688 -1.3265

Use .data[[]] pronoun

test2 <- function(dat, column_name){ 
  dat %>%
    rownames_to_column() %>%
    filter(.data[[column_name]] > 0 &
             rollsum(.data[[column_name]] > 0, 4, fill = NA, align = 
                       "left") >= 3 &
             rollsum(.data[[column_name]], 4, fill = NA, align = 
                       "left") > 1) %>%
    slice(1) -> result
  return(result)
}

out <- colnames(dat) %>% 
  set_names %>% 
  map_dfr(~ test2(dat, .x), .id = 'Col_ID')
out
#>   Col_ID rowname    X1979   X1980
#> 1  X1979      50   1.2688 -1.3265
#> 2  X1980      58 -12.9065  1.0128

Created on 2020-05-05 by the reprex package (v0.3.0)

Using Variable names for dplyr inside function

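sum1 <- function(df, group_var, x, y) {

  group_var <- enquo(group_var)   # bare column name -> quosure

  # x and y are assumed to arrive as strings holding column names
  # (e.g. A.Key <- "a"); as.name() turns them into symbols
  x <- as.name(x)
  y <- as.name(y)

  df.temp <- df %>%
    group_by(!!group_var) %>%
    mutate(
      sum = (!!x) + (!!y)         # symbols can be unquoted directly
    )

  return(df.temp)
}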

sum1(df, g1, A.Key, B.Key)
# A tibble: 5 x 4
# Groups:   g1 [5]
     g1     a     b   sum
  <dbl> <int> <int> <int>
1    1.     3     2     5
2    2.     2     1     3
3    3.     1     3     4
4    4.     4     4     8
5    5.     5     5    10
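A self-contained sketch of the same idea with made-up data: the df, its columns a and b, and the helper sum2 are illustrative assumptions, not the question's objects. Instead of converting the strings with as.name(), it looks the columns up through the .data pronoun.

```r
library(dplyr)

# Hypothetical data standing in for the question's df
df <- tibble(g1 = 1:5, a = c(3L, 2L, 1L, 4L, 5L), b = c(2L, 1L, 3L, 4L, 5L))

# group_var is passed as a bare name; x and y are strings naming columns
sum2 <- function(df, group_var, x, y) {
  df %>%
    group_by({{ group_var }}) %>%
    mutate(sum = .data[[x]] + .data[[y]])   # .data[[ ]] looks columns up by string
}

A.Key <- "a"
B.Key <- "b"
sum2(df, g1, A.Key, B.Key)
```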

R: Using dplyr inside a function. exception in eval(expr, envir, enclos): unknown column

This is the problem with functions that use NSE (non-standard evaluation). NSE functions are very convenient for interactive use but cause many problems when programming, i.e. when you try to call them from inside other functions. Because arguments are captured as expressions instead of being evaluated directly, R does not find the objects in the environments it searches. I suggest reading here, and especially the chapter on scoping issues, for more info.

First, note that all the standard dplyr verbs use NSE. Here is an example approximating your problem:

Data:

df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))

> df
   col1       col2
1     a 0.03366446
2     a 0.46698763
3     a 0.34114682
4     a 0.92125387
5     a 0.94511394
6     b 0.67241460
7     b 0.38168131
8     b 0.91107090
9     b 0.15342089
10    b 0.60751868

Let's see how NSE makes our simple problem crash.

First, the simple interactive case works:

df %>% group_by(col1) %>% summarise(count = n())

Source: local data frame [2 x 2]

  col1 count
1    a     5
2    b     5

Let's see what happens if I put it in a function:

lets_group <- function(column) {
  df %>% group_by(column) %>% summarise(count = n())
}

> lets_group(col1)
Error: index out of bounds

It is not the same error as yours, but it is also caused by NSE: exactly the same line of code worked outside the function.
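The mechanism can be seen with base R's substitute() alone: a function that captures its argument unevaluated sees whatever expression was written at its own call site, which inside a wrapper is the wrapper's parameter name. show_arg and wrapper are illustrative names, not dplyr functions.

```r
# substitute() returns the unevaluated expression supplied for x
show_arg <- function(x) substitute(x)

show_arg(col1 == "a")  # the call col1 == "a"; no object col1 needs to exist

# Inside a wrapper, the expression written at show_arg's call site is y,
# so the caller's col1 never reaches show_arg
wrapper <- function(y) show_arg(y)
wrapper(col1)          # the symbol y, not col1
```

In the same way, group_by(column) inside lets_group() sees the symbol column rather than col1 and looks for a column literally named column.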

Fortunately, there is a solution: standard evaluation (SE). Hadley also wrote standard-evaluation versions of all the dplyr verbs. They have the same names as the normal functions with an underscore (_) appended, e.g. group_by_() and summarise_().

Now look at how this will work:

# notice the formula operator (~) in the call to summarise_()
lets_group2 <- function(column) {
  df %>% group_by_(column) %>% summarise_(count = ~n())
}

This yields the following result:

# also notice the quotes around col1
> lets_group2('col1')
Source: local data frame [2 x 2]

  col1 count
1    a     5
2    b     5

I cannot test your exact problem, but using SE instead of NSE should give you the results you want. For more info you can also read here.
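Note that the underscore (SE) verbs such as group_by_() and summarise_() have been deprecated since dplyr 0.7.0, the release that introduced tidy evaluation. A minimal sketch of the tidy-eval equivalent, with the df from above reproduced so the block runs on its own; lets_group3 and lets_group4 are illustrative names.

```r
library(dplyr)

df <- data.frame(col1 = rep(c('a', 'b'), each = 5), col2 = runif(10))

# Bare column name, forwarded with {{ }}
lets_group3 <- function(data, column) {
  data %>% group_by({{ column }}) %>% summarise(count = n())
}

# Column name as a string, looked up with the .data pronoun
lets_group4 <- function(data, column) {
  data %>% group_by(.data[[column]]) %>% summarise(count = n())
}

lets_group3(df, col1)
lets_group4(df, "col1")
```

Passing the data frame as an argument, instead of relying on a global df, also makes the function reusable.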

Using dplyr within a function, Grouping Error with function arguments

From the NSE vignette:

If you also want to output variables to vary, you need to pass a list
of quoted objects to the .dots argument:

Here, variable (the column created by melt()) should be quoted:

library(dplyr)
library(reshape2)  # for melt()

subgroup_analysis <- function(database,...){

  df <- database %>% 
    select(diamond, subgroup_column, x,y,z) %>% 
    melt(id.vars=c("diamond", subgroup_name)) %>% 
    group_by_(subgroup_name, quote(variable)) %>%  # "variable" is the column melt() creates
    summarise(value = round(mean(value, na.rm = TRUE),2))
  print(df)
}

subgroup_analysis(database, subgroup_column, subgroup_name)

As mentioned by @RichardScriven, if you plan to assign the result to a new variable, you may want to remove the print() call at the end and just return df, or not assign df inside the function at all.

Otherwise, the result is printed even when you write x <- subgroup_analysis(...).
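For reference, a sketch of the same kind of analysis with current tidy evaluation instead of the deprecated group_by_(). It swaps reshape2's melt() for tidyr's pivot_longer() (names_to = "variable" keeps the column name melt() would produce), takes the subgroup column as a string, and returns the result instead of printing it. The database below and the name subgroup_analysis2 are made-up stand-ins, not the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's database
database <- data.frame(
  diamond = 1:4,
  cut = c("Good", "Good", "Ideal", "Ideal"),
  x = c(1, 2, 3, 4), y = c(2, 2, 4, 4), z = c(1, 1, 2, 2)
)

subgroup_analysis2 <- function(database, subgroup_name) {
  database %>%
    select(diamond, all_of(subgroup_name), x, y, z) %>%
    pivot_longer(c(x, y, z), names_to = "variable") %>%  # long format, like melt()
    group_by(.data[[subgroup_name]], variable) %>%       # variable is a real column now
    summarise(value = round(mean(value, na.rm = TRUE), 2), .groups = "drop")
}

res <- subgroup_analysis2(database, "cut")
res
```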
