How to parametrize function calls in dplyr 0.7?
dplyr gains a specialized group_by function, group_by_at, to deal with multiple grouping variables. It is much easier to use this new member of the _at family:
# using the pre-release 0.6.0
cols <- c("am","gear")
mtcars %>%
group_by_at(.vars = cols) %>%
summarise(mean_cyl=mean(cyl))
# Source: local data frame [4 x 3]
# Groups: am [?]
#
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000
The .vars argument accepts a character vector, a numeric vector, or column names generated by vars(). From the documentation:
.vars
A list of columns generated by vars(), or a character vector of
column names, or a numeric vector of column positions.
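As a minimal sketch of the three input forms described above, the following groups mtcars by am and gear using names, positions, and vars(). The object names (by_name, by_pos, by_vars) are made up for illustration; the positions 9 and 10 are where am and gear sit in mtcars.

```r
library(dplyr)

# Character vector of column names
by_name <- mtcars %>%
  group_by_at(.vars = c("am", "gear")) %>%
  summarise(mean_cyl = mean(cyl))

# Numeric vector of column positions (am is column 9, gear is column 10)
by_pos <- mtcars %>%
  group_by_at(.vars = c(9, 10)) %>%
  summarise(mean_cyl = mean(cyl))

# Unquoted column names wrapped in vars()
by_vars <- mtcars %>%
  group_by_at(.vars = vars(am, gear)) %>%
  summarise(mean_cyl = mean(cyl))
```

All three calls should produce the same four-row summary.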
If/else condition in dplyr 0.7 function
The problem here is that when you supply unquoted arguments, is.null doesn't know what to do with them. So this code tries to check whether the object B is NULL and errors because B does not exist in that scope. Instead, you can use missing() to check whether an argument was supplied to the function, like so. There may be a cleaner way, but this at least works, as you can see at the bottom.
library(tidyverse)
test <- tibble(
A = c(1:5,1:5),
B = c(1,2,1,2,3,3,3,3,3,3),
C = c(1,1,1,1,2,3,4,5,4,3)
)
# begin function; group is optional (no default, checked with missing()).
prop_tab <- function(df, column, group) {
col_name <- enquo(column)
group_name <- enquo(group)
# if group_by var is supplied, then:
if(!missing(group)) {
temp <- df %>%
select(!!col_name, !!group_name) %>%
group_by(!!group_name) %>%
summarise(Percentages = 100 * length(!!col_name) / nrow(df))
} else {
# if group_by var is not supplied, then...
temp <- df %>%
select(!!col_name) %>%
group_by(col_name = !!col_name) %>%
summarise(Percentages = 100 * length(!!col_name) / nrow(df))
}
temp
}
test %>% prop_tab(column = C) # works
#> # A tibble: 5 x 2
#> col_name Percentages
#> <dbl> <dbl>
#> 1 1 40
#> 2 2 10
#> 3 3 20
#> 4 4 20
#> 5 5 10
test %>% prop_tab(column = A, group = B)
#> # A tibble: 3 x 2
#> B Percentages
#> <dbl> <dbl>
#> 1 1 20
#> 2 2 20
#> 3 3 60
Created on 2018-06-29 by the reprex package (v0.2.0).
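One possibly cleaner variant, sketched here under assumptions: give group a default of NULL and test the captured quosure with rlang::quo_is_null(). The name prop_tab2 is hypothetical, n() stands in for length(!!col_name), and unlike the original the ungrouped result keeps the column's own name rather than col_name.

```r
library(dplyr)
library(rlang)

test <- tibble(
  A = c(1:5, 1:5),
  B = c(1, 2, 1, 2, 3, 3, 3, 3, 3, 3),
  C = c(1, 1, 1, 1, 2, 3, 4, 5, 4, 3)
)

# Hypothetical variant of prop_tab: group defaults to NULL
prop_tab2 <- function(df, column, group = NULL) {
  col_quo <- enquo(column)
  group_quo <- enquo(group)
  # Fall back to grouping by the column itself when no group is given
  by <- if (quo_is_null(group_quo)) col_quo else group_quo
  df %>%
    group_by(!!by) %>%
    summarise(Percentages = 100 * n() / nrow(df))
}

ungrouped <- prop_tab2(test, column = C)
grouped <- prop_tab2(test, column = A, group = B)
```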
How to pass strings denoting expressions to dplyr 0.7 verbs?
It's important to note that, in this simple example, we have control of how the expressions are created. So the best way to pass the expressions is to construct and pass quosures directly using quos():
library(tidyverse)
library(rlang)
group_by_and_tally <- function(data, groups) {
data %>%
group_by(!!!groups) %>%  # !!! is the operator form of UQS()
tally()
}
my_groups <- quos(2 * cyl, am)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2
However, if we receive the expressions from an outside source in the form of strings, we can simply parse the expressions first, which converts them to quosures:
my_groups <- c('2 * cyl', 'am')
my_groups <- my_groups %>% map(parse_quosure)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2
Again, we should only do this if we are getting expressions from an outside source that provides them as strings - otherwise we should make quosures directly in the R source code.
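Later rlang releases deprecated parse_quosure() in favour of parse_quo(), so the snippet above may not run on a current installation. A minimal sketch of the same string-to-expression step using rlang::parse_expr(), which returns plain expressions rather than quosures (fine here because the columns are looked up in the data); group_strings and group_exprs are illustrative names:

```r
library(dplyr)
library(rlang)

group_strings <- c("2 * cyl", "am")

# Parse each string into an R expression
group_exprs <- lapply(group_strings, parse_expr)

# Splice the parsed expressions into group_by()
res <- mtcars %>%
  group_by(!!!group_exprs) %>%
  tally()
```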
dplyr 0.7 - Specify grouping variable as string
Either of these options is probably simpler:
my_summarise <- function(df, group_var) {
print(group_var)
df %>%
#Either works
group_by_at(.vars = group_var) %>%
#group_by(!!sym(group_var)) %>%
summarise(a = mean(a))
}
my_summarise(df,someString)
my_plot <- function(df, group_var) {
print(group_var)
ggplot(data = df %>%
group_by_at(.vars = group_var) %>%
#group_by(!!sym(group_var)) %>%
summarise(a = mean(a)),
aes_string(x = group_var, y = "a")) +
geom_bar(stat = "identity")
}
my_plot(df, someString)
...where you could use either group_by or group_by_at.
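Since df and someString are not defined in the answer, here is a self-contained sketch of the group_by(!!sym(group_var)) route with made-up data. The tibble, its column names g and a, and the function name my_summarise_sym are assumptions for illustration only.

```r
library(dplyr)
library(rlang)

# Hypothetical data: one grouping column and one value column
df <- tibble(g = c("x", "x", "y"), a = c(1, 3, 5))
someString <- "g"

my_summarise_sym <- function(df, group_var) {
  df %>%
    group_by(!!sym(group_var)) %>%  # turn the string into a symbol, then unquote
    summarise(a = mean(a))
}

res <- my_summarise_sym(df, someString)
```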
dplyr 0.7 equivalent for deprecated mutate_
To expand a little bit on MrFlick's example, let's assume you have a number of instructions stored as strings, as well as the corresponding names that you want to assign to the resulting computations:
ln <- list( "test2", "test3" )
lf <- list( "substr(test, 1, 5)", "substr(test, 5, 5)" )
Match up names to their instructions and convert everything to quosures:
ll <- setNames( lf, ln ) %>% lapply( rlang::parse_quosure )
As per aosmith's suggestion, the entire list can now be passed to mutate, using the special !!! operator:
tibble( test = "test@test" ) %>% mutate( !!! ll )
# # A tibble: 1 x 3
# test test2 test3
# <chr> <chr> <chr>
# 1 test@test test@ @
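If parse_quosure() is unavailable in the installed rlang, the same named list can be built with parse_expr(), a sketch that assumes plain expressions are sufficient because everything is evaluated against the data:

```r
library(dplyr)
library(rlang)

ln <- list("test2", "test3")
lf <- list("substr(test, 1, 5)", "substr(test, 5, 5)")

# Parse each instruction, then attach the target column names
ll <- setNames(lapply(lf, parse_expr), ln)

res <- tibble(test = "test@test") %>% mutate(!!!ll)
```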
Grouping on multiple programmatically specified vars in dplyr 0.6
There was a pretty similar question: Programming with dplyr using string as input. I just modified the answer a bit to use syms and !!!.
library(rlang)
f <- function(x){
group_by(mtcars, !!!syms(x))
}
f(c("cyl")) %>% summarise(n())
# A tibble: 3 x 2
cyl `n()`
<dbl> <int>
1 4 11
2 6 7
3 8 14
f(c("cyl", "gear")) %>% summarise(n())
# A tibble: 8 x 3
# Groups: cyl [?]
cyl gear `n()`
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
Creating dplyr function that can tell if variable input is a string or a symbol
my_summarise <- function(df, group_var) {
group_var <- substitute(group_var)
# is.symbol()/as.symbol() can be used in place of is.name()/as.name()
if (!is.name(group_var)) group_var <- as.name(group_var)
group_var <- enquo(group_var)
df %>% group_by(!! group_var) %>%
summarise(a = mean(a))
}
You can also drop the if condition altogether:
my_summarise <- function(df, group_var) {
group_var<- as.name(substitute(group_var))
group_var <- enquo(group_var)
df %>% group_by(!! group_var) %>%
summarise(a = mean(a))
}
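An alternative sketch, not from the answer above: rlang::ensym() captures an argument that is either a bare symbol or a string and returns a symbol in both cases. The data frame and the name my_summarise_ensym are made up for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical data with a grouping column g and a value column a
df <- tibble(g = c("x", "x", "y"), a = c(1, 3, 5))

my_summarise_ensym <- function(df, group_var) {
  group_sym <- ensym(group_var)  # accepts g or "g"
  df %>%
    group_by(!!group_sym) %>%
    summarise(a = mean(a))
}

from_symbol <- my_summarise_ensym(df, g)
from_string <- my_summarise_ensym(df, "g")
```

Both calls should return the same two-row summary.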
In R, how can I use a quoting function inside another function?
We could use {{}} for column names:
fun2 <- function(df, x, ...){
out2 <- fun1(df = df, x={{x}}, ...)
return(out2)
}
cyl disp hp mpg
1 4 108 93 22.8
2 6 160 110 42.0
3 6 225 105 18.1
4 6 258 110 21.4
5 8 360 175 18.7
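The answer does not show fun1, so the following is a self-contained sketch with a hypothetical fun1 that groups by the columns passed in ... and averages x. It assumes rlang 0.4.0 or later, where {{ }} is available.

```r
library(dplyr)

# Hypothetical inner function: group by ..., then average column x
fun1 <- function(df, x, ...) {
  df %>%
    group_by(...) %>%
    summarise(mean_x = mean({{ x }}))
}

# Outer function forwards the unquoted column with {{ }}
fun2 <- function(df, x, ...) {
  fun1(df = df, x = {{ x }}, ...)
}

res <- fun2(mtcars, mpg, cyl)
```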
How to pass database query to strings using dplyr filter function
collect() returns an object of class data.frame, which is a table and cannot be converted into a character vector implicitly. Instead of as.character(), you can use write_csv("query_result.csv") to save the received table to a file, or pull(col1) %>% as.character() to get a character vector from the column named col1.
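A minimal sketch of the pull() route. A local tibble stands in for the collected query result here, since no database connection is shown; query_result, col2, and ids are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for the table returned by collect()
query_result <- tibble(col1 = c(10, 20), col2 = c("a", "b"))

# Extract one column as a vector, then convert it to character
ids <- query_result %>%
  pull(col1) %>%
  as.character()
```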
How can I pass a vector as variable arguments into a function in R
Here's a small example of how to accomplish that. You pass in a character vector of column names as args, and we use syms from rlang to turn it into a list of symbols. We then use the !!! unquote-splice operator to group by those symbols.
library(rlang)
library(dplyr)
fun <- function(df, args){
by <- syms(args)
df %>%
group_by(!!!by) %>%
summarize_all(mean)
}
Using this example with mtcars:
> fun(mtcars, c("cyl"))
# A tibble: 3 x 11
cyl mpg disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4.00 26.7 105 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55
2 6.00 19.7 183 122 3.59 3.12 18.0 0.571 0.429 3.86 3.43
3 8.00 15.1 353 209 3.23 4.00 16.8 0 0.143 3.29 3.50
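For comparison, a sketch of the same function written with across() and all_of(), which assumes dplyr 1.0.0 or later; fun_across is a hypothetical name.

```r
library(dplyr)

fun_across <- function(df, args) {
  df %>%
    group_by(across(all_of(args))) %>%                      # select grouping columns by name
    summarise(across(everything(), mean), .groups = "drop") # average all remaining columns
}

res <- fun_across(mtcars, c("cyl"))
```

This variant needs no symbols or unquoting, because all_of() works directly on the character vector.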