Parameterized Queries with Rodbc

Parameterize SQL Queries using R RODBC

You can create a input variable at the start and pass it to your query.
For example:

# Change your FY here
input_FY <- 2016

dbhandle <- odbcDriverConnect('driver={SQL Server};server=SQLSERVER;database=MYDATABASE;trusted_connection=true')

degrees <- sqlQuery(dbhandle, paste0("
select Inst, ID, DegreeDate, Degree
from DEGREE
where FY = ('", input_FY, "')
group by Inst, ID, DegreeDate, Degree
order by Inst, ID, DegreeDate, Degree"),
as.is=TRUE)

So for any complicated queries you can still pass the same input_FY variable or any other variable that you have declared at the start of code for a quick/easy update.

Parameterized SQL query in R with IN clause

To do this with string manipulation as in the question:

x <- "2000-01-01"
y <- "2001-01-01"
Item_Code <- c('A1','A2','B1','B2')

query <- sprintf("select * from Item_History
where Item_Exp_Date between '%s' and '%s'
and Item_Code in (%s)", x, y, toString(shQuote(Item_Code, 'sh')))

We could alternately use fn$ from the gsubfn package for string interpolation:

library(gsubfn)
query2 <- fn$identity("select * from Item_History
where Item_Exp_Date between '$x' and '$y'
and Item_Code in ( `toString(shQuote(Item_Code, 'sh'))` )")

Using rodbc package to query sql server from R. Need to pass in date/time value into sqlQuery as part of where statement

I recommend the RODBCext package for this:

library(RODBCext)
sqlQuery(cn,
"select * from [blah].[blah] where metric = ? and date_time >= ? and date_time <= ?",
data = list("name",
btime,
etime),
fetch = TRUE,
stringsAsFactors = FALSE)

RODBCext makes it really easy to use a parameterized query where you can place a ? in the query code where you want to put a parameter. Since your btime and etime are already in the appropriate format for SQL, you can pass them as strings. I also made 'name' a parameter in the query to avoid using quotes in the query (see https://cran.r-project.org/web/packages/RODBCext/vignettes/Parameterized_SQL_queries.html for details).

sqlExecute is a really convenient way to get away from working strings into your queries.

RODBC Cursor Queries

RODBC queries supports only one SQL statement per sqlQuery call. Your attempted procedural query consists of multiple statements. Additionally, you are running dynamic SQL with loops, two things R can do for you without temp tables.

Consider retrieving all database names then iterate across each for your aggregate query. Notice the use of period qualifiers to query from other databases on same server. Finally, concatenate all to a final, single data frame object.

conn <- odbcConnect(dsn = "Database", uid = "***", pwd = "***")

db_names <- sqlQuery(conn, "SELECT [name] FROM sysdatabases")

df_list <- lapply(db_names$name, function(db)
sqlQuery(conn, paste0("select
cast(startdatetime as date) as start_date,
count(*) as [count],
'bookings' as [indicator]
from [", db, "]..bookings
group by cast(startdatetime as date)")
)
)

final_df <- do.call(rbind.data.frame, df_list)

odbcClose(conn)

using rodbcext with character columns

I have had good luck using parameterized queries, dataframes, and some string formatting. Here's the function I use, made generic and with explicit namespaces for function clarity:

library(RODBCext)  # Also loads RODBC as dependency
Connection <- RODBC::odbcConnect('DSN') # I am assuming you can connect via RODBC

BulkUpload <- function(new_data) {

# Get the column names for the new data to add
columns <- colnames(new_data)

# Get the valid column names from the SQL target table
sql_columns <- RODBC::sqlColumns(Connection, "target_table")$COLUMN_NAME

# Check to make sure all the columns in the dataframe are valid target table columns
if(sum(columns %in% sql_columns) != length(columns)){
stop("Cannot complete upload-- One or more columns doesn't exist in target table")
}

# Generate the query dynamically based upon the column names and number of columns
column_query <- paste(columns, collapse = ",")
values_query <- paste(rep("?", length(columns)), collapse = ",")
NewDataQuery <- sprintf("INSERT INTO target_table (%s) VALUES (%s)", column_query, values_query)

# Make the parameterized query call, given there is no information to add
ifelse(nrow(new_data) == 0, stop("No new data to add"),
RODBCext::sqlExecute(Connection, NewDataQuery, new_data))
}

This is nice because it will only insert data into the columns you have specified in your dataframe column names. Keep in mind, you'll need to make sure your dataframe includes columns for any data fields that are required in your database as not null and don't have default values.

creating a looped SQL QUERY using RODBC in R

A couple of syntax issues arise from current setup:

  1. LOOP: You do not iterate through all rows of data frame but only the atomic ID values in the single column, df$ID. In that same loop you are passing the entire vectors of df$Start_Date and df$End_Date into query concatenation.

  2. DATES: Your date formats do not align to most data base date formats of 'YYYY-MM-DD'. And still some others like Oracle, you require string to data conversion: TO_DATE(mydate, 'YYYY-MM-DD').

A couple of aforementioned performance / best practices issues:


  1. PARAMETERIZATION: While parameterization is not needed for security reasons since your values are not generated by user input who can inject malicious SQL code, for maintainability and readability, parameterized queries are advised. Hence, consider doing so.

  2. GROWING OBJECTS: According to Patrick Burn's Inferno Circle 2: Growing Objects, R programmers should avoid growing multi-dimensional objects like data frames inside a loop which can cause excessive copying in memory. Instead, build a list of data frames to rbind once outside the loop.


With that said, you can avoid any looping or listing needs by saving your data frame as a database table then joined to final table for a filtered, join query import. This assumes your database user has CREATE TABLE and DROP TABLE privileges.

# CONVERT DATE FIELDS TO DATE TYPE
df <- within(df, {
Start_Date = as.Date(Start_Date, format="%m/%d/%Y")
End_Date = as.Date(End_Date, format="%m/%d/%Y")
})

# SAVE DATA FRAME TO DATABASE
sqlSave(dbconnection, df, "myRData", rownames = FALSE, append = FALSE)

# IMPORT JOINED AND DATE FILTERED QUERY
q <- "SELECT ID, Date, Account_Balance
FROM Table t
INNER JOIN myRData r
ON r.ID = t.ID
AND t.Date BETWEEN r.Start_Date AND r.End_Date"

final_df <- sqlQuery(dbconnection, q)


Related Topics



Leave a reply



Submit