Using sqldf and RPostgreSQL together
Using sqldf with RPostgreSQL
sqldf
will automatically work with the test
database in PostgreSQL if it sees that RPostgreSQL is loaded. So you can create a test
database in PostgreSQL and then use sqldf with that
or, you can specify the name of a different database.
See: sqldf FAQ 12
Using sqldf with RSQLite
If you want to use sqldf with RSQLite rather than with RPostgreSQL you can use sqldf
's drv
argument to force it use a non-default driver. e.g.
sqldf("select foo from bar...",drv="SQLite")
or, you can set the driver globally using the "sqldf.driver"
option. From within R:
options(sqldf.driver = "SQLite")
or, another possibility if you wish to use RSQLite is to detach
RPostgreSQL before you use sqldf and load it again afterwards.
See ?sqldf
for details.
Using RpostgreSQL with sqldf crashes R
If the only reason not to use H2 is date_trunc
then here are some ways around that.
1) Macros Here is a workaround for truncation to the beginning of year/quarter/month/week. These functions act as macros which expand into code accepted by H2. Be sure to prefix sqldf with fn$ and surround each with back quotes to turn on substitution. Note that in each case the argument is a string. Add the verbose=TRUE
argument to sqldf to see the generated code.
library(RH2)
library(sqldf)
trunc_year <- function(x) sprintf("DATEADD(day, 1-day_of_year(%s), %s)", x, x)
trunc_qtr <- function(x) sprintf("DATEADD(month, 3*(quarter(%s)-1), %s)", x,
trunc_year(x))
trunc_month <- function(x) sprintf("DATEADD(day, 1-day_of_month(%s), %s)", x, x)
trunc_week <- function(x) sprintf("DATEADD(day, -iso_day_of_week(%s), %s)", x, x)
# test
DF <- data.frame(x = as.Date("2021-11-15"))
fn$sqldf("select x,
`trunc_year('x')` year,
`trunc_qtr('x')` qtr,
`trunc_month('x')` month,
`trunc_week('x')` week
from DF")
## x year qtr month week
## 1 2021-11-15 2021-01-01 2021-10-01 2021-11-01 2021-11-14
2) Patch RH2 Another possibility is to patch your installation of RH2 with the newer version of H2 which has date_trunc
. To do this below we remove the the h2-1.3.175.jar file in RH2, which contains H2, and replace it with the newer version h2-1.4.200.jar. This should work as long as you have write permission in the java subdirectory of your RH2 installation. Just run this code in R and your RH2 installation will be patched.
u <- "https://h2database.com/h2-2019-10-14.zip"
z <- basename(u)
tmp <- tempdir() # create temporary dir
old.dir <- setwd(tmp) # go to it
download.file(u, z) # download u
unzip(z, "h2/bin/h2-1.4.200.jar") # extract jar from zip file
# copy new jar file to RH2 installation and remove old one
new.jar <- file.path(tmp, "h2", "bin", "h2-1.4.200.jar")
old.jar <- dir(system.file("java", package = "RH2"), "h2.*jar$", full.names = TRUE)
if (basename(old.jar) != basename(new.jar)) {
file.copy(new.jar, dirname(old.jar))
file.remove(old.jar)
}
setwd(old.dir) # return to original directory
# test
library(RH2)
library(sqldf)
sqldf("select h2version()")
## '1.4.200'
## 1 1.4.200
# see more tests in (3) below
3) Rebuild RH2 Another possibility is create a new source version of RH2 with the newer version of H2 that has date_trunc
. The first argument is 'year', 'quarter', 'month' or 'week' and the second argument is a date. Thus, if you are willing to rebuild RH2 with this new H2 version then it is available.
(a) Download the source of RH2 from https://cran.r-project.org/package=RH2 and detar it to create a directory tree RH2.
(b) Download this h2 zip https://h2database.com/h2-2019-10-14.zip and extract h2/bin/h2-1.4.200.jar from that zip file and place the jar file in the RH2/inst/java directory of the RH2 source removing the old one, h2-1.3.175.jar
(c) rebuild and test RH2 like this:
# rebuild RH2
# cd to the RH2 directory containing its DESCRIPTION file
setwd("...whatever.../RH2")
# build RH2
# on Windows this needs Rtools40 (not an R package)
# https://cran.r-project.org/bin/windows/Rtools/
library(devtools)
build()
install(args = "--no-multiarch")
# test
library(RH2)
library(sqldf)
sqldf("select h2version()") # check it's the latest version
## '1.4.200'
## 1 1.4.200
DF <- data.frame(x = as.Date("2000-11-15"))
sqldf("select date_trunc('month', x) from DF")
## DATE_TRUNC('month', x)
## 1 2000-11-01
Use sqldf to join exactly on id and on the most recent date in a lagged window
Consider window functions such as RANK()
of which likely dplyr::row_number()
is adopted (among other SQL semantics like select
, group_by
, case_when
). SQLite (default dialect of sqldf
) recently added support of window functions in version 3.25.0 (September 2018 release).
If not available in sqldf
(depending on version), use a Postgres backend via RPostgreSQL
. See author docs. Possibly too or soon, RMySQL
will be another supported backend as MySQL 8 recently added support of window functions.
library(RPostgreSQL)
library(sqldf)
D <- sqldf('WITH cte AS
(SELECT *,
RANK() OVER (PARTITION BY "B".row ORDER BY "B".date DESC) AS rn
FROM "A"
LEFT JOIN "B"
ON "A".id = "B".id
AND ("A".date - "B".date) BETWEEN 3*30 and 3*365
)
SELECT * FROM cte
WHERE rn = 1')
Can sqldf be used to import the data of an already existing table in a database into a data.frame in R?
Thanks to G. Grothendieck for the answer. Indeed it is perfectly possible to select data from already existing tables in the database. My mistake was that I was thinking that the name of the dataframe and the corresponding table must always be the same, whereas if I understand correctly, this is only the case when a data.frame data is mapped to a temporary table in the database. As a result when I tried to select data, I had an error message saying that a table with the same name already existed in my database.
Anyway, just as a test to see whether this works, I did the following in PostgreSQL (postgres user and test database which is owned by postgres)
test=# create table person(fname text, lname text, email text);
CREATE TABLE
test=# insert into person(fname, lname, email) values ('fname-01', 'lname-01', 'fname-01.lname-01@gmail.com'), ('fname-02', 'lname-02', 'fname-02.lname-02@gmail.com'), ('fname-03', 'lname-03', 'fname-03.lname-03@gmail.com');
INSERT 0 3
test=# select * from person;
fname | lname | email
----------+----------+-----------------------------
fname-01 | lname-01 | fname-01.lname-01@gmail.com
fname-02 | lname-02 | fname-02.lname-02@gmail.com
fname-03 | lname-03 | fname-03.lname-03@gmail.com
(3 rows)
test=#
Then I wrote the following in R
options(sqldf.RPostgreSQL.user = "postgres",
sqldf.RPostgreSQL.password = "postgres",
sqldf.RPostgreSQL.dbname = "test",
sqldf.RPostgreSQL.host = "localhost",
sqldf.RPostgreSQL.port = 5432)
###
###
library(tidyverse)
library(RPostgreSQL)
library(sqldf)
###
###
result_df <- sqldf("select * from person")
And indeed we can see that result_df contains the data stored in the table person.
> result_df
fname lname email
1 fname-01 lname-01 fname-01.lname-01@gmail.com
2 fname-02 lname-02 fname-02.lname-02@gmail.com
3 fname-03 lname-03 fname-03.lname-03@gmail.com
>
>
case when in sqldf / R
Known as conditional aggregation in SQL (often used for pivoting data), as @Gregor comments simply run an aggregate like MAX()
(even MIN()
will work) around the CASE
statements:
SELECT A,
MAX(CASE WHEN A = '1' AND B = '1' THEN Col ELSE NULL END) as Test_1,
MAX(CASE WHEN A = '1' AND B = '2' THEN Col ELSE NULL END) as Test_2,
MAX(CASE WHEN A = '1' AND B = '3' THEN Col ELSE NULL END) as Test_3,
MAX(CASE WHEN A = '2' AND B = '1' THEN Col ELSE NULL END) as Test_4,
MAX(CASE WHEN A = '2' AND B = '2' THEN Col ELSE NULL END) as Test_5,
MAX(CASE WHEN A = '2' AND B = '3' THEN Col ELSE NULL END) as Test_6
FROM DF
GROUP BY A
Related Topics
Oracle SQL Date Range Intersections
Using a SQL Server for Application Logging. Pros/Cons
How to Select MySQL Rows in The Order of in Clause
Why Is My Left Join Not Returning Nulls
Try_Convert Fails on SQL Server 2012
How to Set The Starting Point for The Primary Key (Id) Column in Postgres via a Rails Migration
Join Tables Using a Value Inside a JSONb Column
Aspentech Infoplus 21 - How to Connect and Query Data
Oracle SQL Order by in Subquery Problems!
Restoring a Database from .Bak File on Another Machine
Codeigniter - Continue on SQL Error
How to Perform a SQL 'Not In' Query Faster
How to Use Time-Series with Sqlite, with Fast Time-Range Queries
Oracle SQL - Max() with Null Values
Convert a Binary Stored as Varchar to Binary