How to Generate SQL from Dbplyr Without a Database Connection

Saving dbplyr query (tbl_sql object) to MySQL without saving data locally

Creating a table with SELECT ... INTO is syntax specific to SQL Server (and MS Access) and is not supported in MySQL. Use the MySQL counterpart instead: CREATE TABLE ... SELECT. Note also that the meaning of "schema" differs between RDBMSs; in MySQL, a database is synonymous with a schema.

Therefore, consider this adjusted version of the SQL build:

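# db and tbl_name are strings naming the target database and table;
# input_tbl is the lazy tbl_sql whose result should be stored
sql_query <- glue::glue(
  "CREATE TABLE {db}.{tbl_name}\n AS \n",
  "SELECT * \n",
  "FROM (\n",
  dbplyr::sql_render(input_tbl),
  "\n) AS sub_query"
)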
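To make the idea concrete, here is a minimal sketch of building and running such a statement so the data never comes back into R. It is not the original poster's setup: an in-memory SQLite database stands in for the MySQL server (this is an assumption, chosen so the example runs anywhere), and the table names cars and four_cyl are hypothetical. paste0() is used instead of glue so that braces in the rendered SQL cannot be misread as glue placeholders.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite stands in for the real MySQL connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)

# A lazy query; nothing is pulled into R here
input_tbl <- tbl(con, "cars") %>% filter(cyl == 4)

# Wrap the rendered SELECT in a CREATE TABLE ... AS SELECT statement
sql_query <- paste0(
  "CREATE TABLE four_cyl AS\n",
  "SELECT *\n",
  "FROM (\n",
  sql_render(input_tbl),
  "\n) AS sub_query"
)

# dbExecute() runs the statement on the database side
dbExecute(con, sql_query)

# Check the new table by counting its rows
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM four_cyl")$n
dbDisconnect(con)
```

With a real MySQL connection, the same pattern should apply with the {db}.{tbl_name} prefix from the snippet above.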

Problem using dbplyr to create a SQL query

show_query() only works on a lazy database table, and you are trying to use it on a data frame. To copy the data read from the CSV into a temporary in-memory database table and build the query from it, you could use tbl_memdb():

data %>%
  tbl_memdb() %>%
  filter(...) %>%
  mutate(...) %>%
  show_query()
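If the goal is only to see the generated SQL, a database is not strictly needed, not even an in-memory one. The sketch below uses dbplyr's lazy_frame() with a simulated MySQL backend (simulate_mysql()). The column names and the table name myTable are hypothetical placeholders, not taken from the original question.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() creates a lazy table backed by a simulated connection,
# so SQL can be generated without connecting to anything
lf <- lazy_frame(
  id = 1L, type = "foobar",
  con = simulate_mysql(),
  .name = "myTable"   # hypothetical table name used in the generated SQL
)

query <- lf %>%
  filter(type == "foobar") %>%
  select(id, type)

# Print the SQL, or capture it as a plain string with sql_render()
show_query(query)
sql_text <- as.character(sql_render(query))
```

Because nothing is executed, this only checks the translation; the query still has to be run against a real connection to return data.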

dbplyr generating unexpected SQL query

dbplyr is generating the SQL query as I would expect. What it has done is one query inside another:

SELECT id, date, type FROM myTable

This is a subquery of the outer query:

SELECT *
FROM (
subquery
) q01
WHERE type = 'foobar'

q01 is the alias dbplyr gives to the subquery. It works the same way as an alias introduced with the AS keyword, for example: FROM very_long_table_name AS VLTN.

Yes, this nesting is ugly. But many SQL engines have a query optimizer that calculates the best way to execute a query. On SQL Server, I have noticed little difference in performance because the query optimizer finds a faster way to execute than as written.

However, it appears that in MySQL, nested queries are known to result in slower performance.

One thing that might solve this is changing the order of the select and filter commands in R:

tab %>%
  filter(type == 'foobar') %>%
  select(id, date, type)

This will probably produce the translated query:

SELECT `id`, `date`, `type`
FROM `myTable`
WHERE (`type` = 'foobar')

which should perform better.
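Whether reordering actually changes the SQL can be checked directly rather than guessed. The sketch below renders both orderings against a simulated MySQL backend and counts the SELECT keywords in each. The extra column and the helper count_selects() are hypothetical, added for illustration, and the result may depend on the installed dbplyr version.

```r
library(dplyr)
library(dbplyr)

# Simulated MySQL table; no connection is needed to render SQL
tab <- lazy_frame(
  id = 1L, date = "2020-01-01", type = "foobar", extra = 0,
  con = simulate_mysql(),
  .name = "myTable"
)

select_first <- tab %>% select(id, date, type) %>% filter(type == "foobar")
filter_first <- tab %>% filter(type == "foobar") %>% select(id, date, type)

sql_select_first <- as.character(sql_render(select_first))
sql_filter_first <- as.character(sql_render(filter_first))

# Hypothetical helper: more than one SELECT means the query is nested
count_selects <- function(sql) {
  lengths(regmatches(sql, gregexpr("SELECT", sql, fixed = TRUE)))
}

cat(sql_select_first, "\n\n", sql_filter_first, "\n", sep = "")
cat("SELECTs (select first):", count_selects(sql_select_first), "\n")
cat("SELECTs (filter first):", count_selects(sql_filter_first), "\n")
```

If both counts come out as 1, the installed dbplyr already flattens the query and the order of the verbs does not matter for this case.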

Connect to a DB using DBplyr

In the example you have linked to, mtcars is a table in datawarehouse. I am going to assume mtcars is in the database you are connecting to. But you can check for this using:

'mtcars' %in% DBI::dbListTables(con)

If you want to query a table in a specific database or schema (not the default) then you need to use in_schema.

Without in_schema:

tbl(con, 'dbo.mtcars')

This produces an SQL query like:

SELECT *
FROM "dbo.mtcars"

Here the double quotes (") delimit names, so SQL looks for a table named dbo.mtcars, not a table named mtcars in the dbo schema.

With in_schema:

tbl(con, in_schema('dbo','mtcars'))

This produces an SQL query like:

SELECT *
FROM "dbo"."mtcars"

In this case SQL looks for a table named mtcars in dbo, because each term is quoted separately.
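The quoting difference can be reproduced without a live connection. The sketch below does not use in_schema() itself; it swaps in DBI's dbQuoteIdentifier() with the dummy DBI::ANSI() connection and DBI::Id() to show the same effect of quoting a dotted name as one identifier versus quoting schema and table separately.

```r
library(DBI)

# ANSI() is a dummy connection that applies standard double-quote quoting
ansi <- ANSI()

# A single string is treated as ONE identifier, dot included
one_name <- as.character(dbQuoteIdentifier(ansi, "dbo.mtcars"))

# Id() keeps schema and table apart, so each part is quoted separately
two_parts <- as.character(
  dbQuoteIdentifier(ansi, Id(schema = "dbo", table = "mtcars"))
)

cat(one_name, "\n")
cat(two_parts, "\n")
```

The first form corresponds to tbl(con, 'dbo.mtcars') and the second to tbl(con, in_schema('dbo','mtcars')).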

How to solve error no applicable method for 'show_query' applied to an object of class data.frame

show_query() translates the dplyr syntax into query code for the backend you are using.

A database backend using dbplyr will result in an SQL query (as a data.table backend using dtplyr will result in a DT[i,j,by] query).

A plain data.frame has no backend to translate to, so show_query() has no method for it, hence the error message you are getting.

An easy way to get an SQL query is to convert the data.frame into an in-memory database table with memdb_frame():

library(dplyr)
library(dbplyr)

memdb_frame(iris) %>%
  filter(Species == "setosa") %>%
  summarise(mean.Sepal.Length = mean(Sepal.Length),
            mean.Petal.Length = mean(Petal.Length)) %>%
  show_query()

<SQL>
SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
FROM `dbplyr_002`
WHERE (`Species` = 'setosa')

