Passing a DataFrame list to a WHERE clause in a SQL query embedded in R
Suppose we wish to insert the Time column from the built-in data frame BOD into an SQL query.
sprintf("select * from X where Y in (%s)", toString(BOD$Time))
## [1] "select * from X where Y in (1, 2, 3, 4, 5, 7)"
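toString works as-is for numeric columns like Time; for character values each element must be SQL-quoted first. A minimal sketch using DBI's quoting helper (the table and column names X and Y are placeholders, as above):

```r
library(DBI)
vals <- c("alpha", "o'brien")
# dbQuoteString() escapes embedded quotes; ANSI() supplies a generic SQL dialect
sprintf("select * from X where Y in (%s)",
        toString(dbQuoteString(ANSI(), vals)))
## [1] "select * from X where Y in ('alpha', 'o''brien')"
```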
Passing DataFrame column into WHERE clause in SQL query embedded in R via parametrized queries
This is very close to your previous question, more formalized now in a function. There are a few things to note:

1. dbBind takes three arguments, but all of the parameters to be bound must be in position 2; position 3 (the ellipsis) is for arguments passed on to other methods, not the parameters themselves. You can fix this with dbBind(df0, c(date, as.list(vals))), or see my next note.

2. While it is fine to use dbSendQuery, dbBind, and dbFetch, you can wrap all of that into one call (just as I showed in my previous answer) using dbGetQuery(conn, query, params). That one call handles sending the query, binding parameters, and clearing the result set automatically.

3. If you prefer to still use dbSendQuery, then you really should clean up after it: the return value from this call is a "result set" (often assigned to a variable res, by convention), which needs to be cleared. Most DBI-based systems will auto-clear a previous result when you send another query, but it is better practice to do so explicitly. It needs to be cleared after you fetch the data, so a common way to effect this is:

res <- dbSendQuery(database, query)
on.exit(dbClearResult(res), add = TRUE)
dbBind(res, c(date, as.list(vals)))
dbFetch(res)

as I've shown in the commented-out code below. The expression in on.exit is not evaluated immediately; it is evaluated when Data_Function completes and returns to the caller (though the value from dbFetch(res) is still given to the user). I prefer on.exit for other reasons, but if you want something a little simpler, this also works:

res <- dbSendQuery(database, query)
dbBind(res, c(date, as.list(vals)))
out <- dbFetch(res)
dbClearResult(res)
out
Your function, amended/fixed:
Data_Function <- function(database, date, vals) {
  qmarks <- paste(rep("?", length(vals)), collapse = ",")
  query <- paste(
    "SELECT column1,
            column2,
            dateof
     FROM database
     WHERE dateof >= ?
       AND column1 IN (", qmarks, ")")
  # res <- dbSendQuery(database, query)
  # on.exit(dbClearResult(res), add = TRUE)
  # dbBind(res, c(date, as.list(vals)))
  # dbFetch(res)
  dbGetQuery(database, query, params = c(date, as.list(vals)))
}
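To see the pattern in action end-to-end, here is a self-contained run against an in-memory RSQLite database (the table mytable and its contents are invented for this demo; the real function queries whatever table its FROM clause names):

```r
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mytable", data.frame(
  dateof  = c("2021-01-01", "2021-06-01", "2021-09-15"),
  column1 = c(1, 2, 3),
  column2 = c("a", "b", "c")
))
vals   <- c(2, 3)
qmarks <- paste(rep("?", length(vals)), collapse = ",")
query  <- paste("SELECT column1, column2, dateof FROM mytable",
                "WHERE dateof >= ? AND column1 IN (", qmarks, ")",
                "ORDER BY column1")
# one call: send, bind, fetch, and clear the result set
dbGetQuery(con, query, params = c(list("2021-03-01"), as.list(vals)))
##   column1 column2     dateof
## 1       2       b 2021-06-01
## 2       3       c 2021-09-15
dbDisconnect(con)
```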
R: dbGetQuery function with a where statement that takes an Excel list (column)
Assuming that you cannot upload the data (which would be the efficient approach), here are two options that are notably less efficient.
If you have a "reasonable" number of possible values, then you can include them in the query directly. However you read in the xlsx file, I'll assume that you end up with a vector var2candidates; then use:

var2candidates <- c(99999999, 999912, 123) # whatever you get from the xlsx file
qry <- sprintf("select var1, var2, var3 from table1 where var2 in (%s)",
               paste(rep("?", length(var2candidates)), collapse=","))
# if the length is 3, then the query would look like this:
# [1] "select var1, var2, var3 from table1 where var2 in (?,?,?)"
myquery <- dbGetQuery(theconnection, qry, params = var2candidates)

When I say "reasonable", I don't know if DBI::dbGetQuery has a limit on the number of bound parameters. If it is on the order of 50 or fewer, this should work. If many more, you might get an error ... I don't know for certain. (Quick testing on a SQL Server instance I have suggests that it accepts 2097 parameters but no more; this might apply to you as well. It might also be a per-DBMS/instance setting. YMMV.)

Alternatively (and potentially much less efficient, since it downloads all of the data), query the data without the var2 constraint and filter locally:

myquery <- dbGetQuery(
  theconnection,
  "select var1, var2, var3
   from table1" )
# subset myquery based on what you have in the xlsx file
myquery[ myquery$var2 %in% var2candidates, ]
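If the candidate list can exceed the DBMS's bound-parameter limit, a middle ground is to run the parameterized query in batches and row-bind the results. A sketch under the same assumptions (theconnection and table1 as above; the batch size of 1000 is arbitrary):

```r
# split the candidates into chunks of at most 1000 values each
batches <- split(var2candidates, (seq_along(var2candidates) - 1) %/% 1000)
myquery <- do.call(rbind, lapply(batches, function(vals) {
  qry <- sprintf("select var1, var2, var3 from table1 where var2 in (%s)",
                 paste(rep("?", length(vals)), collapse = ","))
  dbGetQuery(theconnection, qry, params = as.list(vals))
}))
```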
Pass R variable to RODBC's sqlQuery?
Build the string you intend to pass. So instead of

example <- sqlQuery(myDB, "SELECT * FROM dbo.my_table_fn (x)")

do

example <- sqlQuery(myDB, paste("SELECT * FROM dbo.my_table_fn (",
                                x, ")", sep=""))

which will fill in the value of x.
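sprintf does the same substitution and reads a little more cleanly, especially with several values (this assumes x is numeric; use %s with appropriate quoting for strings):

```r
x <- 42
qry <- sprintf("SELECT * FROM dbo.my_table_fn (%d)", x)
qry
## [1] "SELECT * FROM dbo.my_table_fn (42)"
example <- sqlQuery(myDB, qry)
```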
Use RSQLite to manipulate data frame in r directly using SQL
Using the basic sql functions, you could do:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,connection = "db")
```
```{r}
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
txt <- "Price coordinates floor.size surburb date\n
'R 1 750 000' '-33.93082074573843, 18.857342125467635' '68 m²' Jhb 2021-06-24\n
'R 1 250 000' '-33.930077157927855, 18.85420954236195' '56 m²' Jhb 2021-06-17\n
'R 2 520 000' '-33.92954929205658, 18.857504799977896' '62 m²' Jhb 2021-06-24"
dataH <- read.table(text = txt, header = TRUE)
DBI::dbWriteTable(db, 'dataH', dataH)
```
```{sql}
SELECT REPLACE(SUBSTRING(Price, 3, 100), ' ', '') Price,
       REPLACE(SUBSTRING(coordinates, 1, 20), ',', '') Lat,
       SUBSTRING(coordinates, 21, 255) Long,
       SUBSTRING(`floor.size`, 1, 2) floor_size,
       surburb,
       date
FROM dataH
```
How to read the contents of an .sql file into an R script to run a query?
I've had trouble reading SQL files myself, and have found that the syntax often gets broken if there are any single-line comments in the SQL. Since in R you store the SQL statement as a single-line string, any double dash in the SQL will effectively comment out all of the code after it.
This is a function that I typically use whenever I am reading in a .sql file to be used in R.
getSQL <- function(filepath) {
  con <- file(filepath, "r")
  sql.string <- ""
  while (TRUE) {
    line <- readLines(con, n = 1)
    if (length(line) == 0) {
      break
    }
    line <- gsub("\\t", " ", line)
    if (grepl("--", line)) {
      line <- paste(sub("--", "/*", line), "*/")
    }
    sql.string <- paste(sql.string, line)
  }
  close(con)
  return(sql.string)
}
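The function turns each -- comment into an inline /* ... */ comment, so the collapsed one-line string stays valid SQL. A quick check against a throwaway file (the query text is made up):

```r
tmp <- tempfile(fileext = ".sql")
writeLines(c("SELECT a, b  -- just two columns",
             "FROM mytable"), tmp)
getSQL(tmp)
## [1] " SELECT a, b  /* just two columns */ FROM mytable"
```

The resulting string can then go straight into DBI::dbGetQuery or RODBC::sqlQuery.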
Filtering a Pyspark DataFrame with SQL-like IN clause
The string you pass to SQLContext is evaluated in the scope of the SQL environment. It doesn't capture the closure. If you want to pass a variable, you'll have to do it explicitly using string formatting:
df = sc.parallelize([(1, "foo"), (2, "x"), (3, "bar")]).toDF(("k", "v"))
df.registerTempTable("df")
sqlContext.sql("SELECT * FROM df WHERE v IN {0}".format(("foo", "bar"))).count()
## 2
Obviously this is not something you would use in a "real" SQL environment due to security considerations but it shouldn't matter here.
In practice, the DataFrame DSL is a much better choice when you want to create dynamic queries:
from pyspark.sql.functions import col
df.where(col("v").isin({"foo", "bar"})).count()
## 2
It is easy to build and compose and handles all details of HiveQL / Spark SQL for you.
Python or R -- create a SQL join using a dataframe
Answering my own question here -- after much research.
In short, this cannot be done: beyond passing a list of values or concatenating them into the query string, you cannot build a data frame of criteria in Python or R and pass it as a whole into a query against a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write to temporary tables in the Oracle database, you're out of options.