Non-standard evaluation (NSE) in dplyr's filter_ & pulling data from MySQL

It's not really related to SQL. This example in R does not work either:

library(dplyr)

df <- data.frame(
  v1 = sample(5, 10, replace = TRUE),
  v2 = sample(5, 10, replace = TRUE)
)
df %>% filter_(~ "v1" == 1)

It does not work because filter_ needs the expression ~ v1 == 1, not ~ "v1" == 1. The latter compares the string "v1" with the number 1, which is FALSE for every row, so no rows are returned.

To solve the problem, use the quoting function quo() and the unquoting operator !!:

library(dplyr)
which_column <- quo(v1)
df %>% filter(!!which_column == 1)
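
When the column name arrives as a character string (as in the failing example above), two approaches are commonly used. The following is a minimal sketch, assuming a recent dplyr with rlang installed; col_name and the small data frame are hypothetical and only there for illustration.

```r
library(dplyr)

# Small fixed data frame so the result is reproducible (illustrative only)
df <- data.frame(
  v1 = c(1, 2, 1, 3, 5),
  v2 = c(4, 4, 2, 1, 3)
)

# Hypothetical variable holding the column name as a string
col_name <- "v1"

# Option 1: turn the string into a symbol, then unquote it with !!
res_sym <- df %>% filter(!!rlang::sym(col_name) == 1)

# Option 2: subset the .data pronoun with the string
res_pronoun <- df %>% filter(.data[[col_name]] == 1)
```

Both calls keep the rows where v1 equals 1. The .data form avoids explicit quoting and unquoting altogether.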

Converting a summary function to non-standard evaluation (NSE) in dplyr

I don't quite see why calculating the SE and CI needs to be any more complicated than what you were already doing.

I used the ... argument to capture your grouping variables, as that seems a bit easier to use.

Overall I end up with the following function:

summarySE <- function(.data, measure, ..., conf.int = 0.95) {
  # Capture the grouping variables and the measure without evaluating them
  dots <- lazyeval::lazy_dots(...)
  measure <- lazyeval::lazy(measure)

  # Build the summary expressions; interp() substitutes the measure for var
  summary_dots <- list(
    N    = ~ n(),
    mean = lazyeval::interp(~ mean(var, na.rm = TRUE), var = measure),
    sd   = lazyeval::interp(~ sd(var, na.rm = TRUE), var = measure),
    se   = ~ sd / sqrt(N),
    ci   = ~ se * qt(conf.int / 2 + 0.50, N - 1)
  )

  .data <- dplyr::group_by_(.data, .dots = dots)
  dplyr::summarise_(.data, .dots = summary_dots)
}

You could make this into an SE and NSE version if you'd like (and as Hadley might do).

Usage:

tg <- ToothGrowth  # assumption: the output below matches the built-in ToothGrowth data
summarySE(tg, len, supp, dose)

Source: local data frame [6 x 7]
Groups: supp [?]

    supp  dose     N  mean       sd        se       ci
  (fctr) (dbl) (int) (dbl)    (dbl)     (dbl)    (dbl)
1     OJ   0.5    10 13.23 4.459709 1.4102837 3.190283
2     OJ   1.0    10 22.70 3.910953 1.2367520 2.797727
3     OJ   2.0    10 26.06 2.655058 0.8396031 1.899314
4     VC   0.5    10  7.98 2.746634 0.8685620 1.964824
5     VC   1.0    10 16.77 2.515309 0.7954104 1.799343
6     VC   2.0    10 26.14 4.797731 1.5171757 3.432090
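
On current dplyr (1.0 or later), where the underscored verbs and lazyeval are deprecated, the same summary could be sketched with tidy evaluation. This is a sketch and not the original answer's method: summary_se is a hypothetical name, {{ measure }} stands in for lazyeval::interp, and the built-in ToothGrowth data is assumed as input.

```r
library(dplyr)

# Hypothetical tidy-eval counterpart of summarySE
summary_se <- function(.data, measure, ..., conf.int = 0.95) {
  .data %>%
    group_by(...) %>%                              # grouping columns passed through ...
    summarise(
      N    = n(),
      mean = mean({{ measure }}, na.rm = TRUE),    # {{ }} injects the bare column name
      sd   = sd({{ measure }}, na.rm = TRUE),
      se   = sd / sqrt(N),                         # uses the sd and N columns created above
      ci   = se * qt(conf.int / 2 + 0.5, N - 1),
      .groups = "drop"
    )
}

res <- summary_se(ToothGrowth, len, supp, dose)
```

The call mirrors the usage above: the data, then the measured column, then any number of grouping columns.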

How can dplyr functions discriminate columns and variables with the same name?

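We can use .GlobalEnv to refer to the global variable explicitly. Inside filter(), a bare mpg resolves to the column first, so mpg == mpg would be TRUE for every row: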

mpg <- 21  # assumed global value, inferred from the output below

mtcars %>%
  filter(mpg == .GlobalEnv$mpg)
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1  21   6  160 110  3.9 2.620 16.46  0  1    4    4
# 2  21   6  160 110  3.9 2.875 17.02  0  1    4    4
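
An alternative that also works inside functions, where the variable does not live in the global environment, is to use the .data and .env pronouns that dplyr makes available in data-masking verbs. A minimal sketch, assuming dplyr 1.0 or later:

```r
library(dplyr)

mpg <- 21  # a variable that shares its name with a column of mtcars

res <- mtcars %>%
  # .data$mpg is the column; .env$mpg is the variable from the calling environment
  filter(.data$mpg == .env$mpg)
```

Being explicit on both sides of the comparison leaves no ambiguity about which mpg is meant.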

Creating a function with an argument passed to dplyr::filter: what is the best way to work around NSE?

The answer from @eddi is correct about what's going on here.
I'm writing another answer that addresses the larger request of how to write functions using dplyr verbs. You'll note that, ultimately, it uses something like nrowspecies2 to avoid the species == species tautology.

To write a function wrapping dplyr verb(s) that will work with NSE, write two functions:

First, write a version that requires quoted inputs, using lazyeval and an SE version of the dplyr verb (in this case, filter_):

# 'species' is assumed to be a column of data (the built-in iris names it Species)
nrowspecies_robust_ <- function(data, species) {
  species_ <- lazyeval::as.lazy(species)
  condition <- ~ species == species_      # *
  tmp <- dplyr::filter_(data, condition)  # **
  nrow(tmp)
}
nrowspecies_robust_(iris, ~versicolor)

Second, write a version that uses NSE:

nrowspecies_robust <- function(data, species) {
  species <- lazyeval::lazy(species)
  nrowspecies_robust_(data, species)
}
nrowspecies_robust(iris, versicolor)

* = if you want to do something more complex, you may need to use lazyeval::interp here as in the tips linked below

** = also, if you need to change output names, see the .dots argument

  • For the above, I followed some tips from Hadley

  • Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp, and other functions from the lazyeval package

  • For even more details on lazyeval, see its vignette

  • For a thorough discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R
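
For readers on current dplyr, where filter_ and lazyeval are deprecated, a single function can cover the same ground. The sketch below is an assumption-laden illustration, not the answer's method: count_species is a hypothetical name, it uses the built-in iris with its capitalised Species column, and it relies on rlang::ensym() and rlang::as_string() to turn the bare argument into a string.

```r
library(dplyr)

# Hypothetical wrapper: counts the rows of `data` for one species
count_species <- function(data, species) {
  # ensym() captures the bare name (or a string); as_string() converts it to character
  target <- rlang::as_string(rlang::ensym(species))

  data %>%
    # .data$Species is the column; target is a local variable with a distinct name,
    # so the comparison cannot collapse into a tautology
    filter(.data$Species == target) %>%
    nrow()
}

n_versicolor <- count_species(iris, versicolor)
n_setosa <- count_species(iris, "setosa")
```

Because ensym() accepts both bare names and strings, the separate quoted and unquoted versions are not needed here.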

Standard evaluation in dplyr: summarise a variable given as a character string

dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:

https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.

library(dplyr)

key <- "v3"
val <- "v2"
drp <- "v1"

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

df %>%
  select(-matches(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))

#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#>   v3    total
#>   <chr> <int>
#> 1 A        21
#> 2 B        19

If your code is in a package function, you can add @importFrom rlang .data to its roxygen documentation to avoid R CMD check notes about undefined global variables.
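
The same idea can be wrapped in a function that takes column names as strings. The following is a minimal sketch, assuming dplyr 1.0 or later; sum_by is a hypothetical helper, and across(all_of(keys)) is used so that more than one grouping column can be passed as a character vector.

```r
library(dplyr)

# Hypothetical helper: group by the columns named in `keys`, sum the column named in `val`
sum_by <- function(data, keys, val) {
  data %>%
    group_by(across(all_of(keys))) %>%   # keys is a character vector of column names
    summarise(
      total = sum(.data[[val]], na.rm = TRUE),
      .groups = "drop"                   # return an ungrouped tibble without the message
    )
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
res <- sum_by(df, keys = "v3", val = "v2")
```

With the example data this gives the same totals as the pipeline above, one row per value of v3.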


