Parameterize SQL Queries using R RODBC
You can create a input variable at the start and pass it to your query.
For example:
# Change your FY here
input_FY <- 2016
dbhandle <- odbcDriverConnect('driver={SQL Server};server=SQLSERVER;database=MYDATABASE;trusted_connection=true')
degrees <- sqlQuery(dbhandle, paste0("
select Inst, ID, DegreeDate, Degree
from DEGREE
where FY = ('", input_FY, "')
group by Inst, ID, DegreeDate, Degree
order by Inst, ID, DegreeDate, Degree"),
as.is=TRUE)
So for any complicated queries you can still pass the same input_FY
variable or any other variable that you have declared at the start of code for a quick/easy update.
Parameterized SQL query in R with IN clause
To do this with string manipulation as in the question:
x <- "2000-01-01"
y <- "2001-01-01"
Item_Code <- c('A1','A2','B1','B2')
query <- sprintf("select * from Item_History
where Item_Exp_Date between '%s' and '%s'
and Item_Code in (%s)", x, y, toString(shQuote(Item_Code, 'sh')))
We could alternately use fn$
from the gsubfn package for string interpolation:
library(gsubfn)
query2 <- fn$identity("select * from Item_History
where Item_Exp_Date between '$x' and '$y'
and Item_Code in ( `toString(shQuote(Item_Code, 'sh'))` )")
Using rodbc package to query sql server from R. Need to pass in date/time value into sqlQuery as part of where statement
I recommend the RODBCext
package for this:
library(RODBCext)
sqlQuery(cn,
"select * from [blah].[blah] where metric = ? and date_time >= ? and date_time <= ?",
data = list("name",
btime,
etime),
fetch = TRUE,
stringsAsFactors = FALSE)
RODBCext
makes it really easy to use a parameterized query where you can place a ?
in the query code where you want to put a parameter. Since your btime
and etime
are already in the appropriate format for SQL, you can pass them as strings. I also made 'name'
a parameter in the query to avoid using quotes in the query (see https://cran.r-project.org/web/packages/RODBCext/vignettes/Parameterized_SQL_queries.html for details).
sqlExecute
is a really convenient way to get away from working strings into your queries.
RODBC Cursor Queries
RODBC queries supports only one SQL statement per sqlQuery
call. Your attempted procedural query consists of multiple statements. Additionally, you are running dynamic SQL with loops, two things R can do for you without temp tables.
Consider retrieving all database names then iterate across each for your aggregate query. Notice the use of period qualifiers to query from other databases on same server. Finally, concatenate all to a final, single data frame object.
conn <- odbcConnect(dsn = "Database", uid = "***", pwd = "***")
db_names <- sqlQuery(conn, "SELECT [name] FROM sysdatabases")
df_list <- lapply(db_names$name, function(db)
sqlQuery(conn, paste0("select
cast(startdatetime as date) as start_date,
count(*) as [count],
'bookings' as [indicator]
from [", db, "]..bookings
group by cast(startdatetime as date)")
)
)
final_df <- do.call(rbind.data.frame, df_list)
odbcClose(conn)
using rodbcext with character columns
I have had good luck using parameterized queries, dataframes, and some string formatting. Here's the function I use, made generic and with explicit namespaces for function clarity:
library(RODBCext) # Also loads RODBC as dependency
Connection <- RODBC::odbcConnect('DSN') # I am assuming you can connect via RODBC
BulkUpload <- function(new_data) {
# Get the column names for the new data to add
columns <- colnames(new_data)
# Get the valid column names from the SQL target table
sql_columns <- RODBC::sqlColumns(Connection, "target_table")$COLUMN_NAME
# Check to make sure all the columns in the dataframe are valid target table columns
if(sum(columns %in% sql_columns) != length(columns)){
stop("Cannot complete upload-- One or more columns doesn't exist in target table")
}
# Generate the query dynamically based upon the column names and number of columns
column_query <- paste(columns, collapse = ",")
values_query <- paste(rep("?", length(columns)), collapse = ",")
NewDataQuery <- sprintf("INSERT INTO target_table (%s) VALUES (%s)", column_query, values_query)
# Make the parameterized query call, given there is no information to add
ifelse(nrow(new_data) == 0, stop("No new data to add"),
RODBCext::sqlExecute(Connection, NewDataQuery, new_data))
}
This is nice because it will only insert data into the columns you have specified in your dataframe column names. Keep in mind, you'll need to make sure your dataframe includes columns for any data fields that are required in your database as not null and don't have default values.
creating a looped SQL QUERY using RODBC in R
A couple of syntax issues arise from current setup:
LOOP: You do not iterate through all rows of data frame but only the atomic ID values in the single column,
df$ID
. In that same loop you are passing the entire vectors ofdf$Start_Date
anddf$End_Date
into query concatenation.DATES: Your date formats do not align to most data base date formats of 'YYYY-MM-DD'. And still some others like Oracle, you require string to data conversion:
TO_DATE(mydate, 'YYYY-MM-DD')
.
A couple of aforementioned performance / best practices issues:
PARAMETERIZATION: While parameterization is not needed for security reasons since your values are not generated by user input who can inject malicious SQL code, for maintainability and readability, parameterized queries are advised. Hence, consider doing so.
GROWING OBJECTS: According to Patrick Burn's Inferno Circle 2: Growing Objects, R programmers should avoid growing multi-dimensional objects like data frames inside a loop which can cause excessive copying in memory. Instead, build a list of data frames to
rbind
once outside the loop.
With that said, you can avoid any looping or listing needs by saving your data frame as a database table then joined to final table for a filtered, join query import. This assumes your database user has CREATE TABLE
and DROP TABLE
privileges.
# CONVERT DATE FIELDS TO DATE TYPE
df <- within(df, {
Start_Date = as.Date(Start_Date, format="%m/%d/%Y")
End_Date = as.Date(End_Date, format="%m/%d/%Y")
})
# SAVE DATA FRAME TO DATABASE
sqlSave(dbconnection, df, "myRData", rownames = FALSE, append = FALSE)
# IMPORT JOINED AND DATE FILTERED QUERY
q <- "SELECT ID, Date, Account_Balance
FROM Table t
INNER JOIN myRData r
ON r.ID = t.ID
AND t.Date BETWEEN r.Start_Date AND r.End_Date"
final_df <- sqlQuery(dbconnection, q)
Related Topics
How to Simulate Unpivot in Access
Select Statement to Find Duplicates on Certain Fields
Required to Join 2 Tables With Their Fks in a 3Rd Table
Fast Way to Discover the Row Count of a Table in Postgresql
How to Force Postgres to Use a Particular Index
How to Select from Subquery Using Laravel Query Builder
How to Find Gaps in Sequential Numbering in MySQL
Possible to Perform Cross-Database Queries With Postgresql
SQL - How to Store and Navigate Hierarchies
How to List the Primary Key of a SQL Server Table
Convert Varchar into Datetime in SQL Server
Performing SQL Queries on an Excel Table Within a Workbook With Vba Macro
Generate a Resultset of Incrementing Dates in Tsql
Can a Foreign Key Be Null And/Or Duplicate
Copy Tables from One Database to Another in SQL Server
Insert Data in 3 Tables At a Time Using Postgres
Querying Data by Joining Two Tables in Two Database on Different Servers
Physical Vs. Logical (Hard Vs. Soft) Delete of Database Record