Passing a Dataframe List to a Where Clause in a SQL Query Embedded in R

Suppose we wish to insert the Time column from the built-in data frame BOD into a SQL query.

sprintf("select * from X where Y in (%s)", toString(BOD$Time))
## [1] "select * from X where Y in (1, 2, 3, 4, 5, 7)"
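
To run the resulting query directly against the data frame, here is a minimal sketch assuming the sqldf package is installed (sqldf finds data frames such as BOD in the calling environment automatically):

    library(sqldf)
    qry <- sprintf("select * from BOD where Time in (%s)", toString(BOD$Time))
    sqldf(qry)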

Passing a DataFrame column into a WHERE clause in a SQL query embedded in R via parameterized queries

This is very close to your previous question, now formalized in a function. There are a few things to note:

  • dbBind takes three arguments, but all of the parameters to be bound must go in position 2; position 3 (the ellipsis, `...`) is for arguments to other methods, not the parameters themselves. You can fix this with dbBind(df0, c(date, as.list(vals))), or see my next note.

  • while it is fine to use dbSendQuery, dbBind, and dbFetch, you can wrap all of that into one call (just as I showed in my previous answer) using dbGetQuery(conn, query, params). This single call handles sending the query, binding parameters, fetching the data, and clearing the result set automatically.

  • if you prefer to stick with dbSendQuery, then you really should clean up after it: the call returns a "result set" (conventionally assigned to a variable named res), which needs to be cleared after you fetch the data. Most DBI-based drivers will auto-clear a previous result when you send another query, but it is better practice to clear it explicitly. A common way to do this is

    res <- dbSendQuery(database,query)
    on.exit(dbClearResult(res), add = TRUE)
    dbBind(res, c(date, as.list(vals)))
    dbFetch(res)

    as I've shown in the commented-out code below. The expression in on.exit is not evaluated immediately; it is evaluated when Data_Function completes and returns to the caller (though the value from dbFetch(res) is still returned to the user). I prefer on.exit for other reasons, but if you want something a little simpler, this also works:

    res <- dbSendQuery(database,query)
    dbBind(res, c(date, as.list(vals)))
    out <- dbFetch(res)
    dbClearResult(res)
    out

Your function, amended/fixed:

Data_Function <- function(database, date, vals) {
  qmarks <- paste(rep("?", length(vals)), collapse = ",")
  query <- paste(
    "SELECT column1,
            column2,
            dateof
     FROM database
     WHERE dateof >= ?
       AND column1 IN (", qmarks, ")")
  # res <- dbSendQuery(database, query)
  # on.exit(dbClearResult(res), add = TRUE)
  # dbBind(res, c(date, as.list(vals)))
  # dbFetch(res)
  dbGetQuery(database, query, params = c(date, as.list(vals)))
}
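
A hypothetical call, assuming an open DBI connection and a table whose columns match the query above:

    # `con` is assumed to be an open DBI connection, e.g.:
    # con <- DBI::dbConnect(odbc::odbc(), dsn = "mydsn")  # hypothetical DSN
    # Data_Function(con, as.Date("2021-01-01"), c(1, 5, 7))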

R: dbGetQuery function with a where statement that takes an Excel list (column)

Assuming you cannot upload data to the database (which would be the efficient approach), here are two notably less efficient options.

  1. If you have a "reasonable" number of possible values, then you can include placeholders for them in the query directly. Regardless of how you read in the xlsx file, I'll assume that you have a vector var2candidates; then use

    var2candidates <- c(99999999, 999912, 123) # whatever you get from the xlsx file
    qry <- sprintf("select var1, var2, var3 from table1 where var2 in (%s)",
                   paste(rep("?", length(var2candidates)), collapse = ","))
    # if the length is 3, then the query would look like this:
    # [1] "select var1, var2, var3 from table1 where var2 in (?,?,?)"
    myquery <- dbGetQuery(theconnection, qry, params = var2candidates)

    When I say "reasonable", I don't know if DBI::dbGetQuery has a limit on bound parameters. If this is on the order of 50 or fewer, then it should work. If it is much more, you might get an error; I don't know for certain. (Quick testing on a SQL Server instance I have suggests that it accepts a length of 2097 but no more; this might apply to you as well. It may also be a per-DBMS or per-instance setting. YMMV.)
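
    If the candidate list could exceed that limit, one workaround is to run the query in chunks and row-bind the results. A sketch, reusing theconnection and table1 from above; the chunk size of 1000 is an arbitrary choice:

        # split the candidates into groups of at most 1000 values,
        # run one parameterized query per group, then combine
        chunks <- split(var2candidates, ceiling(seq_along(var2candidates) / 1000))
        out <- do.call(rbind, lapply(chunks, function(vals) {
          qry <- sprintf("select var1, var2, var3 from table1 where var2 in (%s)",
                         paste(rep("?", length(vals)), collapse = ","))
          dbGetQuery(theconnection, qry, params = as.list(vals))
        }))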

  2. (Potentially much-less efficient due to downloading all data.) Query the data without the var2 constraint and filter locally:

    myquery <- dbGetQuery(
      theconnection,
      "select var1, var2, var3
       from table1")
    # subset myquery based on what you have in the xlsx file
    myquery[myquery$var2 %in% var2candidates, ]
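
For completeness, one way to get var2candidates out of the Excel file, a sketch assuming the readxl package and hypothetical file/column names:

    library(readxl)
    xl <- read_excel("candidates.xlsx")  # hypothetical filename
    var2candidates <- xl$var2            # hypothetical column name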

Pass R variable to RODBC's sqlQuery?

Build the string you intend to pass. So instead of

example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")

do

example <- sqlQuery(myDB, paste("SELECT * FROM dbo.my_table_fn (",
                                x, ")", sep = ""))

which will fill in the value of x.
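
If you prefer sprintf over paste, an equivalent sketch (x is assumed numeric here; a character value would also need quoting, and pasting raw values into SQL is injection-prone, so prefer parameterized queries where the driver supports them):

    x <- 42  # example value
    qry <- sprintf("SELECT * FROM dbo.my_table_fn (%d)", x)
    # example <- sqlQuery(myDB, qry)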

Use RSQLite to manipulate data frame in r directly using SQL

Using basic SQL functions, you could do:

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, connection = "db")
```

```{r}
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

txt <- "Price coordinates floor.size surburb date\n
'R 1 750 000' '-33.93082074573843, 18.857342125467635' '68 m²' Jhb 2021-06-24\n
'R 1 250 000' '-33.930077157927855, 18.85420954236195' '56 m²' Jhb 2021-06-17\n
'R 2 520 000' '-33.92954929205658, 18.857504799977896' '62 m²' Jhb 2021-06-24"

dataH <- read.table(text = txt, header = TRUE)
DBI::dbWriteTable(db, 'dataH', dataH)
```

```{sql}
SELECT REPLACE(SUBSTRING(price, 3, 100), ' ', '') price,
       REPLACE(SUBSTRING(coordinates, 1, 20), ',', '') Lat,
       SUBSTRING(coordinates, 21, 255) Long,
       SUBSTRING(`floor.size`, 1, 2) floor_size,
       surburb,
       date
FROM dataH
```
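
Outside of an R Markdown ```{sql}``` chunk, the same query can be run from plain R via DBI. A sketch, reusing the db connection and dataH table from above (substr is used instead of SUBSTRING, since the SUBSTRING alias only exists in newer SQLite versions):

```{r}
res <- DBI::dbGetQuery(db, "
  SELECT REPLACE(substr(price, 3, 100), ' ', '') price,
         REPLACE(substr(coordinates, 1, 20), ',', '') Lat,
         substr(coordinates, 21, 255) Long,
         substr(`floor.size`, 1, 2) floor_size,
         surburb,
         date
  FROM dataH")
res
```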

How to read the contents of an .sql file into an R script to run a query?

I've had trouble reading SQL files myself, and have found that the syntax often breaks if the SQL contains any single-line comments. Since in R you store the SQL statement as a single-line string, any double dash in the SQL will effectively comment out all code after it.

This is a function that I typically use whenever I am reading in a .sql file to be used in R.

getSQL <- function(filepath) {
  con <- file(filepath, "r")
  sql.string <- ""

  while (TRUE) {
    line <- readLines(con, n = 1)

    if (length(line) == 0) {
      break
    }

    line <- gsub("\\t", " ", line)

    # convert single-line comments to block comments so they survive
    # being collapsed onto one line
    if (grepl("--", line)) {
      line <- paste(sub("--", "/*", line), "*/")
    }

    sql.string <- paste(sql.string, line)
  }

  close(con)
  return(sql.string)
}
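
If you preserve the newlines instead of collapsing the file onto one line, single-line comments are no longer a problem, so a shorter alternative is:

    # keeping the newlines means "--" comments only extend to
    # end-of-line, as intended ("query.sql" is an example filename)
    sql <- paste(readLines("query.sql"), collapse = "\n")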

Filtering a Pyspark DataFrame with SQL-like IN clause

The string you pass to SQLContext is evaluated in the scope of the SQL environment. It doesn't capture the closure. If you want to pass a variable, you'll have to do it explicitly using string formatting:

df = sc.parallelize([(1, "foo"), (2, "x"), (3, "bar")]).toDF(("k", "v"))
df.registerTempTable("df")
sqlContext.sql("SELECT * FROM df WHERE v IN {0}".format(("foo", "bar"))).count()
## 2

Obviously this is not something you would use in a "real" SQL environment due to security considerations but it shouldn't matter here.
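
Note that formatting a Python tuple breaks for a one-element list (("foo",) renders as ('foo',), which is not valid SQL). A sketch that builds the IN list by joining quoted values, avoiding the trailing comma:

```python
# build the IN (...) list explicitly instead of formatting a tuple
vals = ["foo", "bar"]
in_list = ", ".join("'{}'".format(v) for v in vals)
query = "SELECT * FROM df WHERE v IN ({})".format(in_list)
print(query)
```
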

In practice DataFrame DSL is a much better choice when you want to create dynamic queries:

from pyspark.sql.functions import col

df.where(col("v").isin({"foo", "bar"})).count()
## 2

It is easy to build and compose and handles all details of HiveQL / Spark SQL for you.

Python or R -- create a SQL join using a dataframe

Answering my own question here -- after much research.
In short, this cannot be done: outside of a list or a concatenated string of criteria, you cannot create a data frame in Python or R and pass it through a query into a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write to temporary tables in the Oracle database, you're out of options.


