RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)
OK, so it seems that I have found a workaround to this. After some more Googling, I found that:
One thing to consider with the SQL Native Client ODBC driver is that VARCHAR(MAX) does not have a fixed size, and the ODBC driver represents this by returning a max column size of 0. This can confuse your application if it doesn't check for 0 as a special case. See the bottom section of this article: http://msdn.microsoft.com/en-us/library/ms130896.aspx But in general I have not seen this happen with any of my .NET applications as it is handled properly in ADO.NET.
Source: http://bytes.com/topic/sql-server/answers/808461-cannot-read-varchar-max
So, in my case, the following did the trick:
job.text <- sqlQuery(ccweb5.prod,"
SELECT DISTINCT TOP 100
ja.JobTitle,
[JobText] = CAST(ja.JobText AS varchar(8000)), -- note the data-type re-cast
[JobTextLength] = LEN(ja.JobText)
FROM JobStore.dbo.JobAd as ja (NOLOCK)
")
Now nchar(as.character(job.text[1, 2])) returns 2742 (as it should).
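For what it's worth, you can see the driver advertising that zero-size column directly. A minimal sketch, assuming the same open channel ccweb5.prod and table as above:
# ask the ODBC driver how it describes the JobAd columns;
# per the quote above, a varchar(MAX) column is reported with COLUMN_SIZE 0
cols <- RODBC::sqlColumns(ccweb5.prod, "JobAd")
cols[cols$COLUMN_NAME == "JobText", c("COLUMN_NAME", "TYPE_NAME", "COLUMN_SIZE")]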
I didn't see any similar questions on StackOverflow so I'll leave this up. Hope this helps somebody!
RODBC::sqlSave() truncates column in data frame to varchar(255) when saving to SQL Server
I don't use RODBC, so I cannot test/reproduce your problem, but I'll try to reproduce your situation and show that in my environment it does not fail.
Sample data:
library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1] 4000 400000
library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
database = "mydb", server = "111.222.333.444,1433",
uid = "myuser", pwd = "mypassword")
(I'm not going to go into all of the options required for this.)
Manually-defined table
DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.02 1.28
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1] 4000 400000
str(dat2)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBB"| __truncated__
Justifying pre-CreateTable
I find two possible error scenarios with SQL Server and DBI.
DBI::dbExecute(con, "drop table if exists r2test")
### also fails with DBI::dbCreateTable(con, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(4e+05)
# )
# '
This is because SQL Server appears not to like scientific notation for field sizes. We can avoid this by changing scipen:
options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000). [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(400000)
# )
# '
And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max).
Pre-created table
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con, "r2test", fields = c(id = "INT", chr = "nvarchar(max)"))
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.01 1.34
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1] 4000 400000
str(dat3)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBB"| __truncated__
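A hedged aside that is not part of the original answer: odbc's dbWriteTable() also accepts a field.types argument, so the explicit create step can be folded into the write (same con and dat as above):
# create-and-write in one call, pinning the wide column to nvarchar(max)
DBI::dbWriteTable(con, "r2test", dat, overwrite = TRUE,
                  field.types = c(id = "INT", chr = "nvarchar(max)"))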
Note about "large fields"
When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,
DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) :
# nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index
This is a known issue with Microsoft's ODBC driver for SQL Server (driver "ODBC Driver 17 for SQL Server"). The problem has been around for years. No other driver (including FreeTDS, which also connects to SQL Server) is prone to this "feature". There is no indication that this will ever change (in fact, it is "formal" in the documentation, though "large" is never quantified).
I have no idea if RODBC has this issue as well; since it does not use nanodbc, it might use the SQLGetData function a little more intelligently, side-stepping the problem.
Ways to work around this problem:
- always put "large data" at the end of your list of selected columns;
- use FreeTDS instead of Microsoft's ODBC drivers (see the sketch after this list) ... allegedly it is slightly slower (10%? idk), but I've successfully installed it on Windows and Linux and selected fields in crazy orders without problem;
- use RStudio's professional drivers, if you have the right OS and one of RStudio's professional products;
- do all queries with "large data" using the bulk tools (bcp or sqlcmd); I believe they both deal better with it, though this is much less interactive than the R console;
- use RODBC (allegedly ... again, I don't know);
- don't use "large data" fields (...(max) or anything larger than ...(255) ... a not-well-defined number) ... perhaps not an option;
- RECENTLY, a PR (odbc!415) has finalized the ability to work around this large-field problem, so a github-install of the package (until it is released on CRAN) will work; or
- use a different DBMS than SQL Server ... perhaps not an option.
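A minimal sketch of the FreeTDS route from the list above; the driver name, port, and TDS_Version are assumptions that depend on how FreeTDS is installed and registered on your system:
# connect through FreeTDS instead of Microsoft's driver;
# with it, column order in the select no longer matters
con_tds <- DBI::dbConnect(odbc::odbc(),
                          driver = "FreeTDS",
                          server = "111.222.333.444", port = 1433,
                          database = "mydb",
                          uid = "myuser", pwd = "mypassword",
                          TDS_Version = "7.4")
DBI::dbGetQuery(con_tds, "select chr, id from r2test")  # no "Invalid Descriptor Index"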
References: in the odbc package, issue odbc#10 appears to be the first appearance, and is referenced in all others in that package; related closed issues: #82, #86, #112, #171, #256, #331; and some recent discussion on ways to work around this problem: #309, #358, #373. Ultimately, odbc uses the nanodbc C++ library, and while they recognize the problem, they feel it is not theirs to fix (nanodbc#149). The source document at Microsoft: https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data
RODBC string getting truncated
The PostgreSQL ODBC driver has a variable called MaxLongVarcharSize, which I have found set to 8190 by default (on both Windows and Ubuntu). It is possible that the MySQL ODBC driver has a similar variable set to 255.
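If you hit that limit, a hedged sketch of raising it in the connection string (the MaxLongVarcharSize keyword mirrors the variable named above; the driver name, server, and credentials are placeholders):
ch <- RODBC::odbcDriverConnect(
  paste0("Driver={PostgreSQL Unicode}; Server=localhost; Port=5432;",
         " Database=mydb; Uid=myuser; Pwd=mypassword;",
         " MaxLongVarcharSize=65536;")  # allow long varchar values up to 64 KB
)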
Data from ODBC blob not matching return from SQL query
The exact approach you take may vary depending on your ODBC driver. I'll demonstrate how I do this on MS SQL Server, and hopefully you can adapt it to your needs.
I'm going to use a table in my database called InsertFile with the following definition:
CREATE TABLE [dbo].[InsertFile](
[OID] [int] IDENTITY(1,1) NOT NULL,
[filename] [varchar](50) NULL,
[filedata] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Now let's create a file that we will push into the database.
file <- "hello_world.txt"
write("Hello world", file)
I need to do a little work to prep the byte code for this file to go into SQL. I use this function for that.
prep_file_for_sql <- function(filename){
  # read each file in full as a raw vector
  bytes <-
    mapply(FUN = readBin,
           con = filename,
           what = "raw",
           n = file.info(filename)[["size"]],
           SIMPLIFY = FALSE)
  # convert the raw bytes to two-digit hexadecimal codes
  chars <-
    lapply(X = bytes,
           FUN = as.character)
  # collapse the hex codes into one string per file
  vapply(X = chars,
         FUN = paste,
         collapse = "",
         FUN.VALUE = character(1))
}
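For example (the exact hex string shown is an assumption that depends on your platform's line endings):
prep_file_for_sql("hello_world.txt")
# [1] "48656c6c6f20776f726c640a"
# i.e., the bytes of "Hello world\n" in hexadecimal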
Now, this is a bit strange, but the SQL Server ODBC driver is pretty good at writing VARBINARY columns and terrible at reading them. Conversely, the SQL Server Native Client 11.0 ODBC driver is terrible at writing VARBINARY columns but okay-ish at reading them. So I'm going to have two RODBC objects, conn_write and conn_read.
conn_write <-
RODBC::odbcDriverConnect(
paste0("driver=SQL Server; server=[server_name]; database=[database_name];",
"uid=[user_name]; pwd=[password]")
)
conn_read <-
RODBC::odbcDriverConnect(
paste0("driver=SQL Server Native Client 11.0; server=[server_name]; database=[database_name];",
"uid=[user_name]; pwd=[password]")
)
Now I'm going to insert the text file into the database using a parameterized query (sqlExecute() comes from the RODBCext package, which extends RODBC).
sqlExecute(
channel = conn_write,
query = "INSERT INTO dbo.InsertFile (filename, filedata) VALUES (?, ?)",
data = list(file,
prep_file_for_sql(file)),
fetch = FALSE
)
And now to read it back out using a parameterized query. The unpleasant trick here is recasting your VARBINARY(MAX) column as a VARBINARY(8000) (don't ask me why, but it works).
X <- sqlExecute(
channel = conn_read,
query = paste0("SELECT OID, filename, ",
"CAST(filedata AS VARBINARY(8000)) AS filedata ",
"FROM dbo.InsertFile WHERE filename = ?"),
data = list("hello_world.txt"),
fetch = TRUE,
stringsAsFactors = FALSE
)
Now you can look at the contents with
unlist(X$filedata)
And write the file with
writeBin(unlist(X$filedata),
con = "hello_world2.txt")
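As a quick sanity check (assuming both files from above exist), the round-tripped file should be byte-identical to the original:
orig <- readBin("hello_world.txt", "raw", n = file.info("hello_world.txt")$size)
copy <- readBin("hello_world2.txt", "raw", n = file.info("hello_world2.txt")$size)
identical(orig, copy)  # should be TRUE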
BIG DANGEROUS CAVEAT
You need to be aware of the size of your files. I usually store files as VARBINARY(MAX), and SQL Server isn't very friendly about exporting those through ODBC (I'm not sure about other SQL engines; see "RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)" above for more details).
The only way I've found to get around this is to recast the VARBINARY(MAX) as a VARBINARY(8000). That is obviously a terrible solution if you have more than 8000 bytes in your file. When I need to get around this, I've had to loop over the VARBINARY(MAX) column, create multiple new columns each of length 8000, and then paste them all together in R (see "Reconstitute PNG file stored as RAW in SQL Database" below).
As of yet, I've not come up with a generalized solution to this problem. Perhaps that's something I should spend more time on, though.
ODBC query on MS SQL Server returning first 255 characters only in PHP PDO (FreeTDS)
According to the FreeTDS User Guide, the issue seems to be that FreeTDS can only handle varchar up to 255 characters when talking to SQL Server "due to limitations inherent in the protocol definition". Anything bigger than that needs to be data type text.
You can resolve the issue either by modifying your schema accordingly, or converting the data type during your query, like this:
SELECT CAST(mycol as TEXT) FROM mytable
Fetching Results from a Stored Procedure in R
As commented and since it may help future readers:
For SQL Server stored procedures, add SET NOCOUNT ON to suppress the rows-affected messages that may interfere with RODBC's sqlQuery(), which expects table/query data to be returned.
And generally, for most stored procedures, move any SELECT statement outside a transaction block that executes an action query (e.g., UPDATE, INSERT INTO) if you need to return actual data.
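A minimal sketch putting both points together; the channel ch, the procedure dbo.GetJobs, and the table/columns it touches are hypothetical names for illustration:
RODBC::sqlQuery(ch, "
  CREATE PROCEDURE dbo.GetJobs AS
  BEGIN
    SET NOCOUNT ON;                           -- suppress rows-affected messages
    BEGIN TRAN;
      UPDATE dbo.JobAd SET Viewed = 1;        -- action query inside the transaction
    COMMIT;
    SELECT JobTitle, JobText FROM dbo.JobAd;  -- data returned outside it, last
  END
")
jobs <- RODBC::sqlQuery(ch, "EXEC dbo.GetJobs")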
Reconstitute PNG file stored as RAW in SQL Database
Well, we've figured out a solution. The raw vector being returned through RODBC did not match what was in the SQL database. Somewhere in the pipeline, the varbinary object from SQL was getting distorted. I'm not sure why or how. But this answer to a different problem inspired us to recast the variables. As soon as we recast them, we could see the correct representation.
The next problem was that all of our images are more than 8000 bytes, and RODBC only allows 8000 characters at a time. So I had to fumble my way around that. The code below does the following:
1. Determine the largest number of bytes in an image file.
2. Create a set of variables (ImagePart1, ..., ImagePart[n]) breaking the image into as many parts as necessary, each with max length 8000.
3. Query the database for all of the images.
4. Combine the image parts into a single object.
5. Write the images to a local file.
The actual code
library(RODBC)
lims <- odbcConnect("DATABASE")
#* 1. Determine the largest number of bytes in the largest image file
ImageLength <- sqlQuery(lims,
paste0("SELECT MaxLength = MAX(LEN(u.Image)) ",
"FROM dbo.[User] u"))
#* 2. Create a query string to make a set of variables breaking
#*    the images into as many parts as necessary, each with
#*    max length 8000
n_img_vars <- ImageLength$MaxLength %/% 8000 + 1
start <- 1 + 8000 * (0:(n_img_vars - 1))
#* SUBSTRING takes (expression, start, length), so each part is 8000 bytes
img_parts <- paste0("ImagePart", 1:n_img_vars,
                    " = CAST(SUBSTRING(u.Image, ", start,
                    ", 8000) AS VARBINARY(8000))")
full_query <- paste0("SELECT u.OID, u.LastName, u.FirstName,\n",
paste0(img_parts, collapse =",\n"), "\n",
"FROM dbo.[User] u \n",
"WHERE LEN(u.Image) > 0")
#* 3. Query the database for all the images
Images <- sqlQuery(lims, full_query)
#* 4. Combine the image parts into a single object
Images$full_image <-
apply(Images[, grepl("ImagePart", names(Images))], 1,
function(x) do.call("c", x))
#* 5. Write the images to a local file
for(i in seq_len(nrow(Images))){
DIR <- "[FILE_DIR]"
FILENAME <- with(Images, paste0(OID[i], "-", LastName[i], ".png"))
writeBin(unlist(Images$full_image[i]),
file.path(DIR, FILENAME))
}
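A quick hedged sanity check after the loop (reusing DIR from the last iteration): the sizes of the files just written should match the LEN(u.Image) values SQL Server reports:
written <- file.path(DIR, with(Images, paste0(OID, "-", LastName, ".png")))
file.info(written)$size  # bytes per file, in the same row order as Images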