How to Parametrize Function Calls in dplyr 0.7

How to parametrize function calls in dplyr 0.7?

dplyr has a specialized group_by variant, group_by_at, that handles multiple grouping variables supplied programmatically. Using this member of the _at family is much easier:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
  group_by_at(.vars = cols) %>%
  summarise(mean_cyl = mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
#
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000

The .vars argument accepts a character vector of column names, a numeric vector of column positions, or a list of columns generated by vars():

.vars

A list of columns generated by vars(), or a character vector of
column names, or a numeric vector of column positions.
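
The other two forms of .vars described above can be sketched as follows. This is a minimal illustration on mtcars, not part of the original answer; the positions 9 and 10 are where am and gear sit in mtcars.

```r
library(dplyr)

# Select the grouping columns by name through vars()
by_vars <- mtcars %>%
  group_by_at(vars(am, gear)) %>%
  summarise(mean_cyl = mean(cyl))

# Select the same columns by position: am is column 9, gear is column 10
by_pos <- mtcars %>%
  group_by_at(c(9, 10)) %>%
  summarise(mean_cyl = mean(cyl))
```

Both calls should produce the same four groups as the character-vector version.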

If/else condition in dplyr 0.7 function

The problem is that when you supply an unquoted argument, is.null() evaluates it. The code therefore tries to check whether an object named B is NULL and fails because B does not exist in that scope. Instead, use missing() to check whether the argument was supplied to the function. There may be a cleaner way, but this works, as the output at the bottom shows.

library(tidyverse)
test <- tibble(
A = c(1:5,1:5),
B = c(1,2,1,2,3,3,3,3,3,3),
C = c(1,1,1,1,2,3,4,5,4,3)
)

# begin function; group is optional and has no default.
prop_tab <- function(df, column, group) {

  col_name <- enquo(column)
  group_name <- enquo(group)

  # if the group_by var is supplied, then:
  if (!missing(group)) {
    temp <- df %>%
      select(!!col_name, !!group_name) %>%
      group_by(!!group_name) %>%
      summarise(Percentages = 100 * length(!!col_name) / nrow(df))

  } else {
    # if the group_by var is missing, group by the column itself
    temp <- df %>%
      select(!!col_name) %>%
      group_by(col_name = !!col_name) %>%
      summarise(Percentages = 100 * length(!!col_name) / nrow(df))

  }

  temp
}

test %>% prop_tab(column = C) # works
#> # A tibble: 5 x 2
#> col_name Percentages
#> <dbl> <dbl>
#> 1 1 40
#> 2 2 10
#> 3 3 20
#> 4 4 20
#> 5 5 10

test %>% prop_tab(column = A, group = B)
#> # A tibble: 3 x 2
#> B Percentages
#> <dbl> <dbl>
#> 1 1 20
#> 2 2 20
#> 3 3 60

Created on 2018-06-29 by the reprex package (v0.2.0).
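
One possibly cleaner variant, offered as a sketch rather than as the original author's method: give group a default of NULL and test the captured quosure with rlang::quo_is_null(). The name prop_tab2 is hypothetical, and the ungrouped result keeps the original column name instead of col_name.

```r
library(dplyr)
library(rlang)

test <- tibble(
  A = c(1:5, 1:5),
  B = c(1, 2, 1, 2, 3, 3, 3, 3, 3, 3),
  C = c(1, 1, 1, 1, 2, 3, 4, 5, 4, 3)
)

# prop_tab2 is a hypothetical variant of prop_tab with group defaulting to NULL
prop_tab2 <- function(df, column, group = NULL) {
  column <- enquo(column)
  group <- enquo(group)

  # quo_is_null() inspects the captured expression, so group is never evaluated
  if (quo_is_null(group)) {
    group <- column
  }

  df %>%
    group_by(!!group) %>%
    summarise(Percentages = 100 * n() / nrow(df))
}

by_c <- prop_tab2(test, column = C)
by_b <- prop_tab2(test, column = A, group = B)
```

Because both branches of the original compute the group size as a share of all rows, n() stands in for length(!!col_name) here.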

How to pass strings denoting expressions to dplyr 0.7 verbs?

It's important to note that, in this simple example, we have control of how the expressions are created. So the best way to pass the expressions is to construct and pass quosures directly using quos():

library(tidyverse)
library(rlang)

group_by_and_tally <- function(data, groups) {
  data %>%
    # !!! splices the list of quosures into group_by() (functional form: UQS())
    group_by(!!!groups) %>%
    tally()
}

my_groups <- quos(2 * cyl, am)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2

However, if we receive the expressions from an outside source in the form of strings, we can simply parse the expressions first, which converts them to quosures:

my_groups <- c('2 * cyl', 'am')
my_groups <- my_groups %>% map(parse_quosure)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2

Again, we should only do this if we are getting expressions from an outside source that provides them as strings - otherwise we should make quosures directly in the R source code.
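
The string-parsing route can also be sketched with rlang::parse_expr(), which returns plain expressions rather than quosures. This is an illustrative variant, not the original answer's code. Parsing runs whatever the strings contain, so it is only appropriate for input you trust.

```r
library(dplyr)
library(rlang)

group_strings <- c("2 * cyl", "am")

# Parse each string into an unevaluated expression
group_exprs <- lapply(group_strings, parse_expr)

# Splice the expressions into group_by() with !!!
counts <- mtcars %>%
  group_by(!!!group_exprs) %>%
  tally()
```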

dplyr 0.7 - Specify grouping variable as string

Either of these options is probably simpler:

my_summarise <- function(df, group_var) {
  print(group_var)

  df %>%
    # Either works
    group_by_at(.vars = group_var) %>%
    #group_by(!!sym(group_var)) %>%
    summarise(a = mean(a))
}

my_summarise(df, someString)

my_plot <- function(df, group_var) {
  print(group_var)

  ggplot(data = df %>%
           group_by_at(.vars = group_var) %>%
           #group_by(!!sym(group_var)) %>%
           summarise(a = mean(a)),
         aes_string(x = group_var, y = "a")) +
    geom_bar(stat = "identity")
}

my_plot(df, someString)

...where you could use either group_by or group_by_at.
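
To see that the two grouping styles agree, here is a small self-contained check. The data frame and the string are made up for illustration, since the original df and someString are not shown.

```r
library(dplyr)
library(rlang)

# Hypothetical data standing in for the original df and someString
df <- tibble(g = c("x", "x", "y"), a = c(1, 3, 10))
some_string <- "g"

# Option 1: group_by_at() takes the column name as a string
by_at <- df %>%
  group_by_at(.vars = some_string) %>%
  summarise(a = mean(a))

# Option 2: sym() turns the string into a symbol, which !! unquotes
by_sym <- df %>%
  group_by(!!sym(some_string)) %>%
  summarise(a = mean(a))
```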

dplyr 0.7 equivalent for deprecated mutate_

To expand a little bit on MrFlick's example, let's assume you have a number of instructions stored as strings, as well as the corresponding names that you want to assign to the resulting computations:

ln <- list( "test2", "test3" )
lf <- list( "substr(test, 1, 5)", "substr(test, 5, 5)" )

Match up names to their instructions and convert everything to quosures:

ll <- setNames( lf, ln ) %>% lapply( rlang::parse_quosure )

As per aosmith's suggestion, the entire list can now be passed to mutate, using the special !!! operator:

tibble( test = "test@test" ) %>% mutate( !!! ll )
# # A tibble: 1 x 3
# test test2 test3
# <chr> <chr> <chr>
# 1 test@test test@ @

Grouping on multiple programmatically specified vars in dplyr 0.6

There was a pretty similar question: Programming with dplyr using string as input. I just modified the answer a bit to use syms and !!!.

library(rlang)
f <- function(x) {
  group_by(mtcars, !!!syms(x))
}

f(c("cyl")) %>% summarise(n())
# # A tibble: 3 x 2
#   cyl `n()`
#   <dbl> <int>
# 1 4 11
# 2 6 7
# 3 8 14

f(c("cyl", "gear")) %>% summarise(n())
# # A tibble: 8 x 3
# # Groups: cyl [?]
#   cyl gear `n()`
#   <dbl> <dbl> <int>
# 1 4 3 1
# 2 4 4 8
# 3 4 5 2
# 4 6 3 2
# 5 6 4 4
# 6 6 5 1
# 7 8 3 12
# 8 8 5 2
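
In dplyr 1.0.0 and later, the same grouping can also be written with across() and all_of() instead of syms() and !!!. This is a sketch assuming such a version is installed; the name f2 is hypothetical.

```r
library(dplyr)

# f2 is a hypothetical counterpart of f that avoids explicit unquoting
f2 <- function(x) {
  group_by(mtcars, across(all_of(x)))
}

grouped <- f2(c("cyl", "gear"))
counts <- grouped %>% summarise(n = n())
```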

Creating dplyr function that can tell if variable input is a string or a symbol


my_summarise <- function(df, group_var) {

  group_var <- substitute(group_var)

  # Instead of is.name and as.name you can use is.symbol and as.symbol, or a mixture.
  if (!is.name(group_var)) group_var <- as.name(group_var)

  group_var <- enquo(group_var)

  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

You can also drop the if condition altogether:

my_summarise <- function(df, group_var) {

  group_var <- as.name(substitute(group_var))

  group_var <- enquo(group_var)

  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}
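
A shorter sketch of the same idea, assuming an rlang version that provides ensym(): ensym() captures the argument and accepts either a bare symbol or a string, returning a symbol in both cases. The function name my_summarise2 and the data are hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical data for illustration
df <- tibble(g = c("x", "x", "y"), a = c(1, 3, 10))

my_summarise2 <- function(df, group_var) {
  # ensym() returns a symbol whether the caller passed g or "g"
  group_var <- ensym(group_var)

  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

from_symbol <- my_summarise2(df, g)
from_string <- my_summarise2(df, "g")
```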

In R, how can I use a quoting function inside another function?

We can use {{ }} (the embrace operator) to forward column names from the outer function to the inner one:

fun2 <- function(df, x, ...) {
  out2 <- fun1(df = df, x = {{ x }}, ...)
  return(out2)
}
  cyl disp  hp  mpg
1 4 108 93 22.8
2 6 160 110 42.0
3 6 225 105 18.1
4 6 258 110 21.4
5 8 360 175 18.7
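
Since fun1 is not shown above, here is a self-contained sketch of the forwarding pattern with a made-up fun1 that groups by the forwarded column. It assumes rlang 0.4.0 or later, where {{ }} is available; it is not the original fun1.

```r
library(dplyr)

# fun1 is a hypothetical stand-in; the original fun1 is not shown
fun1 <- function(df, x, ...) {
  df %>%
    group_by({{ x }}) %>%
    summarise(...)
}

# fun2 forwards the unquoted column name to fun1 with {{ }}
fun2 <- function(df, x, ...) {
  fun1(df = df, x = {{ x }}, ...)
}

out <- fun2(mtcars, cyl, mean_mpg = mean(mpg))
```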

How to pass database query to strings using dplyr filter function

collect() returns an object of class data.frame, that is, a table, which cannot be implicitly converted into a character vector. Instead of calling as.character() on it, you can either use write_csv("query_result.csv") to save the retrieved table to a file, or use pull(col1) %>% as.character() to get the column named col1 as a character vector.
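
The pull() route can be sketched without a database by using an in-memory tibble as a stand-in for whatever collect() returns. The table contents here are invented for illustration; only the column name col1 comes from the text above.

```r
library(dplyr)

# Hypothetical stand-in for the table that collect() would return
query_result <- tibble(col1 = c(1, 2, 3), col2 = c("a", "b", "c"))

# Extract one column as a vector, then convert it to character
ids <- query_result %>%
  pull(col1) %>%
  as.character()
```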

How can I pass a vector as variable arguments into a function in R

Here's a small example of how to accomplish that. You pass in a character vector of column names, and syms() from rlang turns it into a list of symbols. The !!! unquote-splice operator then splices those symbols into group_by().

library(rlang)
library(dplyr)

fun <- function(df, args) {

  by <- syms(args)

  df %>%
    group_by(!!!by) %>%
    summarize_all(mean)
}

Using this example with mtcars:

> fun(mtcars, c("cyl"))
# A tibble: 3 x 11
cyl mpg disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4.00 26.7 105 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55
2 6.00 19.7 183 122 3.59 3.12 18.0 0.571 0.429 3.86 3.43
3 8.00 15.1 353 209 3.23 4.00 16.8 0 0.143 3.29 3.50

