Pass SQL Functions in Dplyr Filter Function on Database




A "dplyr-only" solution would be this:

tbl(my_con, "my_table") %>%
  filter(batch_name %like% "batch_A_%") %>%
  collect()

Full reprex:

suppressPackageStartupMessages({
  library(dplyr)
  library(dbplyr)
  library(RPostgreSQL)
})

my_con <-
  dbConnect(
    PostgreSQL(),
    user = "my_user",
    password = "my_password",
    host = "my_host",
    dbname = "my_db"
  )

my_table <- tribble(
  ~batch_name, ~value,
  "batch_A_1", 1,
  "batch_A_2", 2,
  "batch_A_2", 3,
  "batch_B_1", 8,
  "batch_B_2", 9
)

copy_to(my_con, my_table)

tbl(my_con, "my_table") %>%
  filter(batch_name %like% "batch_A_%") %>%
  collect()
#> # A tibble: 3 x 2
#>   batch_name value
#> * <chr>      <dbl>
#> 1 batch_A_1      1
#> 2 batch_A_2      2
#> 3 batch_A_2      3

dbDisconnect(my_con)
#> [1] TRUE

This works because dbplyr passes any function it doesn't know how to translate along to the database as is; see ?dbplyr::translate_sql.
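The pass-through behaviour can be inspected without a live database. The following is a minimal sketch, assuming a reasonably recent dbplyr in which translate_sql() accepts a con argument and simulate_postgres() is available; my_sql_function is a hypothetical name used only to show that unknown functions are forwarded unchanged.

```r
library(dbplyr)

# A simulated connection lets us look at the translation without a real database
con <- simulate_postgres()

# %like% is not an R operator dbplyr knows, so it is emitted as an SQL infix operator
like_sql <- translate_sql(batch_name %like% "batch_A_%", con = con)

# An unknown prefix function is forwarded by name (my_sql_function is hypothetical)
fun_sql <- translate_sql(my_sql_function(batch_name), con = con)

print(like_sql)
print(fun_sql)
```

The exact quoting and letter case of the printed SQL can differ between dbplyr versions and backends, so treat the output as illustrative.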

Hat-tip to @PaulRougieux for his recent comment.

How to pass database query to strings using dplyr filter function

collect() returns a tibble (a data frame), which is a table and cannot be converted into a character vector implicitly. Instead of as.character(), pipe the result into write_csv("query_result.csv") to save the received table to a file, or into pull(col1) %>% as.character() to get a character vector of the column named col1.
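As a minimal sketch of the pull() route, the example below uses dbplyr::memdb_frame() (which needs the RSQLite package) as a stand-in for the real database table. The table contents and the names remote_tbl, query_result and col1_chr are made up for illustration.

```r
library(dplyr)

# memdb_frame() copies a local data frame into a temporary in-memory SQLite table
remote_tbl <- dbplyr::memdb_frame(
  col1 = c(5, 15, 25),
  col2 = c("a", "b", "c")
)

# collect() brings the filtered rows into R as a tibble
query_result <- remote_tbl %>%
  filter(col1 > 10) %>%
  collect()

# pull() extracts one column as a plain vector, which as.character() can convert
col1_chr <- query_result %>%
  pull(col1) %>%
  as.character()

print(class(query_result))
print(col1_chr)
```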

In dplyr how does the sql builder work?

Maybe the thing that is confusing you is that the dplyr functions tbl and filter don't actually send any code to the database for execution. When you run

tbl(con, "table1") %>% filter(col1 > 12)

what is returned is a tbl_dbi object that contains a SQL query. When you run this line of code interactively in R, the returned tbl_dbi object is passed to the print function, and for the tbl_dbi to be printed the query must be executed in the database. You can see the difference by saving the output to a variable instead.

q <- tbl(con, "table1") %>% filter(col1 > 12)
class(q)

In the above two lines, nothing was sent to the database. The tbl function returned a tbl_dbi object, filter modified that tbl_dbi object, and the result was saved to the variable q.
Only when we print q is the SQL sent to the database. So the tbl function does not need to know about any other dplyr functions that are called after it (like filter in this case). It behaves the same no matter what: it always returns a tbl_dbi object.
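The laziness described above can be checked directly. This is a small sketch, assuming an in-memory SQLite table created with dbplyr::memdb_frame() (RSQLite required) in place of the real connection; the data is invented.

```r
library(dplyr)

# Stand-in for tbl(con, "table1"): a lazy table backed by in-memory SQLite
lazy_tbl <- dbplyr::memdb_frame(col1 = c(5, 15, 25))

# Building the pipeline only records the operations in q
q <- lazy_tbl %>% filter(col1 > 12)

# q is a lazy table object, not a data frame
print(class(q))

# show_query() renders the SQL that would be sent, without fetching any rows
show_query(q)

# collect() executes the query and returns the rows as a tibble
result <- collect(q)
print(result)
```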

Now how dbplyr builds up more complex queries from simpler ones is beyond me.

Here is some code that implements your example.

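library(dplyr)

# Constructor: tags the opening phrase so we know no item has been added yet
shoppingList <- function(x) {
  stopifnot(is.character(x))
  class(x) <- c("first", "shoppingList", class(x))
  x
}

# Adds an item; the first item follows the phrase directly, later ones use "and"
item <- function(x, y) {
  if ("first" %in% class(x)) {
    out <- paste(x, y)
  } else {
    out <- paste0(x, " and ", y)
  }
  class(out) <- c("shoppingList", class(out))
  out
}

# S3 print method; the `...` keeps the signature consistent with the print generic
print.shoppingList <- function(x, ...) {
  # code that only runs when we print an object of class shoppingList
  if ("first" %in% class(x)) x <- paste(x, "nothing")
  print(paste0("***", x, "***"))
}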

shoppingList("I need to get")
#> [1] "***I need to get nothing***"

shoppingList("I need to get") %>% item("apples") %>% item("oranges")
#> [1] "***I need to get apples and oranges***"

But how does print know to send SQL to the database? My (oversimplified) conceptual answer is that print is a generic function that behaves differently depending on the class of the object passed in. There are actually many print functions. In the example above, I created a special print function for objects of class shoppingList. You could imagine a special print.tbl_dbi function that knows how to handle tbl_dbi objects by sending the query they contain to the database they connect to and then printing the result. I think the actual implementation is more complicated, but hopefully this provides some intuition.
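To make that intuition concrete, here is a toy sketch of a print method that triggers deferred work. It is not how dbplyr is implemented; lazy_query, fake_db and db_calls are hypothetical names, and the "database" is just an R function that counts how often it is called.

```r
# Counter that records how often the fake database is contacted
db_calls <- 0

# Hypothetical stand-in for a database: returns a one-row result for any query
fake_db <- function(sql) {
  db_calls <<- db_calls + 1
  data.frame(query = sql, rows = 1L)
}

# Constructor: stores the query text and the function that would run it
lazy_query <- function(sql, runner) {
  structure(list(sql = sql, runner = runner), class = "lazy_query")
}

# The print method is the only place where the query is actually run
print.lazy_query <- function(x, ...) {
  result <- x$runner(x$sql)
  print(result)
  invisible(x)
}

q <- lazy_query("SELECT * FROM table1 WHERE col1 > 12", fake_db)
calls_before_print <- db_calls  # nothing has been run yet

print(q)                        # dispatches to print.lazy_query
calls_after_print <- db_calls
```

Creating q leaves the counter untouched; only the print call increments it, which mirrors the behaviour described for tbl_dbi objects.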

Non-standard evaluation (NSE) in dplyr's filter_ & pulling data from MySQL

It's not really related to SQL. This example in R does not work either:

df <- data.frame(
  v1 = sample(5, 10, replace = TRUE),
  v2 = sample(5, 10, replace = TRUE)
)
df %>% filter_(~ "v1" == 1)

It does not work because filter_ needs the expression ~ v1 == 1, not ~ "v1" == 1, which compares the literal string "v1" with 1 and is therefore never true.

To solve the problem, use the quoting function quo() and the unquoting operator !!:

library(dplyr)
which_column <- quo(v1)
df %>% filter(!!which_column == 1)
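When the column name arrives as a string (as it did in the filter_ attempt), two common options are the .data pronoun and rlang::sym(). The sketch below uses a small fixed data frame, not the random one above, so the result is predictable; col_name is an illustrative variable.

```r
library(dplyr)

df_fixed <- data.frame(
  v1 = c(1, 2, 1, 3),
  v2 = c(4, 5, 6, 7)
)

# The column to filter on, held as a string
col_name <- "v1"

# Option 1: index the .data pronoun with the string
res_pronoun <- df_fixed %>% filter(.data[[col_name]] == 1)

# Option 2: turn the string into a symbol and unquote it with !!
res_sym <- df_fixed %>% filter(!!rlang::sym(col_name) == 1)

print(res_pronoun)
print(res_sym)
```

Both calls keep the rows where v1 equals 1.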

How to build a wrapper function for querying database using dbplyr and dplyr, having the query vary

@Waldi hits on the crux of the problem, which is that the pipe expects a function, not an expression, as the rhs. In the specific case where the query is chosen from a list, you control how the expression is built, so this is manageable. You can use magrittr semantics and the dot placeholder to build an expression from kind_of_query. This in turn can be used to create the complete expression (query) with rlang::quo and the !! operator.

# Assumes dplyr is attached (for %>%, filter, select, summarise, collect)
get_data_from_db <- function(kind_of_query) {

  # RSQLite takes the database location via `dbname`
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
  on.exit(DBI::dbDisconnect(con))
  mtcars_db <- dplyr::copy_to(con, mtcars)

  if (kind_of_query == "from_hadley_book") {
    my_query <-
      rlang::expr(
        {
          filter(., cyl > 2) %>%
            select(mpg:hp) %>%
            head(10)
        }
      )
  }

  if (kind_of_query == "mins_for_mpg_disp_drat") {
    my_query <-
      rlang::expr(
        {summarise(., min_mpg = min(mpg), min_disp = min(disp), min_drat = min(drat))}
      )
  }

  # Splice the chosen expression into the full pipeline, then evaluate it
  query <- rlang::quo(
    mtcars_db %>%
      !!my_query %>%
      collect()
  )

  rlang::eval_tidy(query)

}

This is actually an overly sophisticated approach. If you're writing the expression for the kind_of_query, you might as well just simplify it by writing a function.

get_data_from_db2 <- function(kind_of_query) {

  # RSQLite takes the database location via `dbname`
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
  on.exit(DBI::dbDisconnect(con))
  mtcars_db <- dplyr::copy_to(con, mtcars)

  if (kind_of_query == "from_hadley_book") {
    my_fx <- function(x) {
      x %>%
        filter(cyl > 2) %>%
        select(mpg:hp) %>%
        head(10)
    }
  }

  if (kind_of_query == "mins_for_mpg_disp_drat") {
    my_fx <- function(x) {
      summarise(x, min_mpg = min(mpg), min_disp = min(disp), min_drat = min(drat))
    }
  }

  mtcars_db %>%
    my_fx %>%
    collect()

}

The problem comes with the general case. In the currently proposed interface, you are trying to inject an argument value into a user-defined expression. The !! operator forces evaluation, so when the new expression is built, the user expression is inserted within () to force its evaluation before anything is passed from the lhs of the pipe. Manipulating the expression then likely requires deparse, as suggested by @Waldi, or some low-level manipulation of the abstract syntax tree.

The simpler solution, if possible, would be to have your users pass in a function, similar to purrr::map or lapply. This would drastically simplify the function implementation:

get_data_from_db_general <- function(kind_of_query) {

  # RSQLite takes the database location via `dbname`
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
  on.exit(DBI::dbDisconnect(con))
  mtcars_db <- dplyr::copy_to(con, mtcars)

  # kind_of_query is a function that takes the lazy table and returns a lazy table
  mtcars_db %>%
    kind_of_query %>%
    collect()
}

get_data_from_db_general(
  kind_of_query = function(x) {
    x %>%
      filter(cyl > 2) %>%
      select(mpg:hp) %>%
      head(10)
  }
)

# A tibble: 10 x 4
     mpg   cyl  disp    hp
   <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110
 2  21       6  160    110
 3  22.8     4  108     93
 4  21.4     6  258    110
 5  18.7     8  360    175
 6  18.1     6  225    105
 7  14.3     8  360    245
 8  24.4     4  147.    62
 9  22.8     4  141.    95
10  19.2     6  168.   123
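If you still want callers to pick a query by name, one possible middle ground is a named list of query functions. This is only a sketch of that idea, not part of the original answers: the names queries and get_data_from_db_lookup are hypothetical, and it assumes the DBI and RSQLite packages are installed.

```r
library(dplyr)

# Hypothetical lookup table: each entry maps a lazy table to a lazy table
queries <- list(
  from_hadley_book = function(x) {
    x %>%
      filter(cyl > 2) %>%
      select(mpg:hp) %>%
      head(10)
  },
  mins_for_mpg_disp_drat = function(x) {
    summarise(
      x,
      min_mpg  = min(mpg, na.rm = TRUE),
      min_disp = min(disp, na.rm = TRUE),
      min_drat = min(drat, na.rm = TRUE)
    )
  }
)

get_data_from_db_lookup <- function(kind_of_query) {
  # Fail early with a clear message if the name is not in the lookup table
  stopifnot(kind_of_query %in% names(queries))
  query_fn <- queries[[kind_of_query]]

  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
  on.exit(DBI::dbDisconnect(con))
  mtcars_db <- dplyr::copy_to(con, mtcars)

  # collect() runs before on.exit() closes the connection
  mtcars_db %>%
    query_fn() %>%
    collect()
}

mins <- get_data_from_db_lookup("mins_for_mpg_disp_drat")
first_ten <- get_data_from_db_lookup("from_hadley_book")
print(mins)
```

Adding a new kind of query then means adding one entry to the list, with no change to the connection handling.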


