Non-standard evaluation (NSE) in dplyr's filter_ & pulling data from MySQL
It's not really related to SQL. This example in R does not work either:
df <- data.frame(
  v1 = sample(5, 10, replace = TRUE),
  v2 = sample(5, 10, replace = TRUE)
)
df %>% filter_(~ "v1" == 1)
It does not work because you need to pass filter_ the expression ~ v1 == 1, not the expression ~ "v1" == 1.
To solve the problem, use the quoting operator quo and the unquoting operator !!:
library(dplyr)
which_column <- quo(v1)
df %>% filter(!!which_column == 1)
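If you need this inside a reusable function, a sketch using enquo() to capture the caller's column name (assumes dplyr >= 0.7; filter_on is a hypothetical name used for illustration):

```r
library(dplyr)

# filter_on is a hypothetical helper, not part of dplyr:
# enquo() quotes the unevaluated column name the caller supplied,
# and !! unquotes it back inside filter().
filter_on <- function(df, column, value) {
  column <- enquo(column)
  filter(df, !!column == value)
}

df <- data.frame(
  v1 = sample(5, 10, replace = TRUE),
  v2 = sample(5, 10, replace = TRUE)
)
filter_on(df, v1, 1)  # same rows as df %>% filter(v1 == 1)
```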
Converting a summary function to non-standard evaluation NSE in dplyr
I don't quite see why the calculation of the SE and CI should be more complicated than what you were doing already. I used the ... argument to capture your grouping variables, as that seems a bit easier in use. Overall I end up with the following function:
summarySE <- function(.data, measure, ..., conf.int = 0.95) {
  dots <- lazyeval::lazy_dots(...)
  measure <- lazyeval::lazy(measure)
  summary_dots <- list(
    N    = ~ n(),
    mean = lazyeval::interp(~ mean(var, na.rm = TRUE), var = measure),
    sd   = lazyeval::interp(~ sd(var, na.rm = TRUE), var = measure),
    se   = ~ sd / sqrt(N),
    ci   = ~ se * qt(conf.int / 2 + 0.50, N - 1)
  )
  .data <- dplyr::group_by_(.data, .dots = dots)
  dplyr::summarise_(.data, .dots = summary_dots)
}
You could make this into an SE and NSE version if you'd like (and as Hadley might do).
Usage:
summarySE(tg, len, supp, dose)
Source: local data frame [6 x 7]
Groups: supp [?]
supp dose N mean sd se ci
(fctr) (dbl) (int) (dbl) (dbl) (dbl) (dbl)
1 OJ 0.5 10 13.23 4.459709 1.4102837 3.190283
2 OJ 1.0 10 22.70 3.910953 1.2367520 2.797727
3 OJ 2.0 10 26.06 2.655058 0.8396031 1.899314
4 VC 0.5 10 7.98 2.746634 0.8685620 1.964824
5 VC 1.0 10 16.77 2.515309 0.7954104 1.799343
6 VC 2.0 10 26.14 4.797731 1.5171757 3.432090
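The underscore verbs (group_by_, summarise_) and lazyeval were deprecated in dplyr 0.7 and removed later. A sketch of an equivalent using rlang's {{ }} (curly-curly) operator, assuming rlang >= 0.4; summarySE_tidy is a hypothetical name:

```r
library(dplyr)

# summarySE_tidy is a hypothetical modern rewrite of summarySE above:
# {{ measure }} forwards the bare column name, and ... forwards the
# grouping columns straight into group_by().
summarySE_tidy <- function(.data, measure, ..., conf.int = 0.95) {
  .data %>%
    group_by(...) %>%
    summarise(
      N    = n(),
      mean = mean({{ measure }}, na.rm = TRUE),
      sd   = sd({{ measure }}, na.rm = TRUE),
      se   = sd / sqrt(N),  # summarise() can reference freshly created columns
      ci   = se * qt(conf.int / 2 + 0.50, N - 1),
      .groups = "drop"
    )
}

# ToothGrowth is the built-in dataset behind tg in the question
summarySE_tidy(ToothGrowth, len, supp, dose)
```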
How can dplyr functions discriminate between columns and variables with the same name?
We can use .GlobalEnv to force the lookup of mpg in the global environment rather than in the data (this assumes a variable mpg exists in the global environment):
mtcars %>%
  filter(mpg == .GlobalEnv$mpg)
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
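An alternative sketch using rlang's pronouns (available with recent dplyr versions): .data always means "the column", while .env always means "the variable outside the data". The mpg <- 21 assignment below is an assumption to make the example self-contained:

```r
library(dplyr)

mpg <- 21  # an external variable that shares the column's name (assumption)

# .data$mpg resolves to the column, .env$mpg to the outside variable,
# so there is no ambiguity even though the names collide.
mtcars %>%
  filter(.data$mpg == .env$mpg)
```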
Creating a function with an argument passed to dplyr::filter: what is the best way to work around NSE?
The answer from @eddi is correct about what's going on here. I'm writing another answer that addresses the larger request of how to write functions using dplyr verbs. You'll note that, ultimately, it uses something like nrowspecies2 to avoid the species == species tautology.
To write a function wrapping dplyr verb(s) that will work with NSE, write two functions:
First, write a version that requires quoted inputs, using lazyeval and an SE version of the dplyr verb. So in this case, filter_.
nrowspecies_robust_ <- function(data, species) {
  species_ <- lazyeval::as.lazy(species)
  condition <- ~ species == species_      # *
  tmp <- dplyr::filter_(data, condition)  # **
  nrow(tmp)
}
nrowspecies_robust_(iris, ~versicolor)
Second, make a version that uses NSE:
nrowspecies_robust <- function(data, species) {
  species <- lazyeval::lazy(species)
  nrowspecies_robust_(data, species)
}
nrowspecies_robust(iris, versicolor)
* = if you want to do something more complex, you may need to use lazyeval::interp here, as in the tips linked below
** = also, if you need to change output names, see the .dots argument
For the above, I followed some tips from Hadley. Another good resource is the dplyr vignette on NSE, which illustrates .dots, interp, and other functions from the lazyeval package. For even more details on lazyeval, see its vignette. For a thorough discussion of the base R tools for working with NSE (many of which lazyeval helps you avoid), see the chapter on NSE in Advanced R.
Standard evaluation in dplyr: summarise a variable given as a character string
dplyr 1.0 has changed pretty much everything about this question, as well as all of the answers. See the dplyr programming vignette here:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
df %>%
  select(-matches(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#> v3 total
#> <chr> <int>
#> 1 A 21
#> 2 B 19
If your code is in a package function, you can @importFrom rlang .data to avoid R CMD check notes about undefined global variables.
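The same .data-pronoun pattern generalizes to a reusable function that takes column names as strings. A sketch under that assumption; summarise_by is a hypothetical name, not part of dplyr:

```r
library(dplyr)

# summarise_by is a hypothetical wrapper: key and val are character
# strings, looked up in the data via the .data pronoun.
summarise_by <- function(df, key, val) {
  df %>%
    group_by(.data[[key]]) %>%
    summarise(total = sum(.data[[val]], na.rm = TRUE), .groups = "drop")
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
summarise_by(df, "v3", "v2")  # same result as the pipeline above
```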