RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)
OK, so it seems that I have found a workaround to this. After some more Googling, I found that:
One thing to consider with the SQL Native Client ODBC driver is that VARCHAR(MAX) does not have a fixed size, and the ODBC driver represents this by returning a max column size of 0. This can confuse your application if it doesn't check for 0 as a special case. See the bottom section of this article: http://msdn.microsoft.com/en-us/library/ms130896.aspx But in general I have not seen this happen with any of my .NET applications as it is handled properly in ADO.NET.
Source: http://bytes.com/topic/sql-server/answers/808461-cannot-read-varchar-max
So, in my case, the following did the trick:
job.text <- sqlQuery(ccweb5.prod,"
SELECT DISTINCT TOP 100
ja.JobTitle,
[JobText] = CAST(ja.JobText AS varchar(8000)), -- note the data-type re-cast
[JobTextLength] = LEN(ja.JobText)
FROM JobStore.dbo.JobAd as ja (NOLOCK)
")
Now nchar(as.character(job.text[1, 2])) returns 2742 (as it should).
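For what it's worth, you can see the driver advertising that zero-size column directly. A minimal sketch, assuming the same open channel ccweb5.prod and table as above:
# ask the ODBC driver how it describes the JobAd columns;
# per the quote above, a varchar(MAX) column is reported with COLUMN_SIZE 0
cols <- RODBC::sqlColumns(ccweb5.prod, "JobAd")
cols[cols$COLUMN_NAME == "JobText", c("COLUMN_NAME", "TYPE_NAME", "COLUMN_SIZE")]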
I didn't see any similar questions on StackOverflow so I'll leave this up. Hope this helps somebody!
RODBC::sqlSave() truncates column in data frame to varchar(255) when saving to SQL Server
I don't use RODBC, so I cannot test/reproduce your problem, but I'll try to reproduce your situation and show that in my environment it does not fail.
Sample data:
library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1] 4000 400000
library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
database = "mydb", server = "111.222.333.444,1433",
uid = "myuser", pwd = "mypassword")
(I'm not going to go into all of the options required for this.)
Manually-defined table
DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.02 1.28
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1] 4000 400000
str(dat2)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBB"| __truncated__
Justifying pre-CreateTable
I find two possible error scenarios with SQL Server and DBI.
DBI::dbExecute(con, "drop table if exists r2test")
### also fails with DBI::dbCreateTable(con, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(4e+05)
# )
# '
This is because SQL Server appears not to like scientific notation for field sizes. We can avoid this by changing scipen:
options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000). [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(400000)
# )
# '
And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max).
Pre-created table
DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con, "r2test", fields = c(id = "INT", chr = "nvarchar(max)"))
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.01 1.34
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1] 4000 400000
str(dat3)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBB"| __truncated__
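A hedged aside that is not part of the original answer: odbc's dbWriteTable() also accepts a field.types argument, so the explicit create step can be folded into the write (same con and dat as above):
# create-and-write in one call, pinning the wide column to nvarchar(max)
DBI::dbWriteTable(con, "r2test", dat, overwrite = TRUE,
                  field.types = c(id = "INT", chr = "nvarchar(max)"))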
Note about "large fields"
When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,
DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) :
# nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index
This is a known issue with Microsoft's ODBC driver for SQL Server (driver "ODBC Driver 17 for SQL Server"). The problem has been around for years. No other driver (including FreeTDS, which also connects to SQL Server) is prone to this "feature". There is no indication that this will ever change (in fact, it is "formal" in the documentation, though "large" is never quantified).
I have no idea if RODBC has this issue as well; since it does not use nanodbc, it might use the SQLGetData function a little more intelligently, side-stepping the problem.
Ways to work around this problem:
- always put "large data" at the end of your list of selected columns;
- use FreeTDS instead of Microsoft's ODBC drivers (see the sketch after this list) ... allegedly it is slightly slower (10%? idk), but I've successfully installed it on Windows and Linux and selected fields in crazy orders without problem;
- use RStudio's professional drivers, if you have the right OS and one of RStudio's professional products;
- do all queries with "large data" using the bulk tools (bcp or sqlcmd); I believe they both deal better with it, though this is much less interactive than the R console;
- use RODBC (allegedly ... again, I don't know);
- don't use "large data" fields (...(max) or anything larger than ...(255) ... a not-well-defined number) ... perhaps not an option;
- RECENTLY, a PR (odbc!415) has finalized the ability to work around this large-field problem, so a github-install of the package (until it is released on CRAN) will work; or
- use a different DBMS than SQL Server ... perhaps not an option.
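A minimal sketch of the FreeTDS route from the list above; the driver name, port, and TDS_Version are assumptions that depend on how FreeTDS is installed and registered on your system:
# connect through FreeTDS instead of Microsoft's driver;
# with it, column order in the select no longer matters
con_tds <- DBI::dbConnect(odbc::odbc(),
                          driver = "FreeTDS",
                          server = "111.222.333.444", port = 1433,
                          database = "mydb",
                          uid = "myuser", pwd = "mypassword",
                          TDS_Version = "7.4")
DBI::dbGetQuery(con_tds, "select chr, id from r2test")  # no "Invalid Descriptor Index"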
References: in the odbc package, issue odbc#10 appears to be the first appearance, and is referenced in all others in that package; related closed issues: #82, #86, #112, #171, #256, #331; and some recent discussion on ways to work around this problem: #309, #358, #373. Ultimately, odbc uses the nanodbc C++ library, and while they recognize the problem, they feel it is not theirs to fix (nanodbc#149). The source document at Microsoft: https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data
RODBC string getting truncated
The PostgreSQL ODBC driver has a variable called MaxLongVarcharSize, which I have found set to 8190 by default (on both Windows and Ubuntu). It is possible that the MySQL ODBC driver has a similar variable set to 255.
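If you hit that limit, a hedged sketch of raising it in the connection string (the MaxLongVarcharSize keyword mirrors the variable named above; the driver name, server, and credentials are placeholders):
ch <- RODBC::odbcDriverConnect(
  paste0("Driver={PostgreSQL Unicode}; Server=localhost; Port=5432;",
         " Database=mydb; Uid=myuser; Pwd=mypassword;",
         " MaxLongVarcharSize=65536;")  # allow long varchar values up to 64 KB
)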
Data from ODBC blob not matching return from SQL query
The exact approach you take may vary depending on your ODBC driver. I'll demonstrate how I do this on MS SQL Server, and hopefully you can adapt it to your needs.
I'm going to use a table in my database called InsertFile with the following definition:
CREATE TABLE [dbo].[InsertFile](
[OID] [int] IDENTITY(1,1) NOT NULL,
[filename] [varchar](50) NULL,
[filedata] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Now let's create a file that we will push into the database.
file <- "hello_world.txt"
write("Hello world", file)
I need to do a little work to prep the byte code for this file to go into SQL. I use this function for that.
prep_file_for_sql <- function(filename){
  # read each file in full as a raw vector
  bytes <-
    mapply(FUN = readBin,
           con = filename,
           what = "raw",
           n = file.info(filename)[["size"]],
           SIMPLIFY = FALSE)
  # convert the raw bytes to two-digit hexadecimal codes
  chars <-
    lapply(X = bytes,
           FUN = as.character)
  # collapse the hex codes into one string per file
  vapply(X = chars,
         FUN = paste,
         collapse = "",
         FUN.VALUE = character(1))
}
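For example (the exact hex string shown is an assumption that depends on your platform's line endings):
prep_file_for_sql("hello_world.txt")
# [1] "48656c6c6f20776f726c640a"
# i.e., the bytes of "Hello world\n" in hexadecimal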
Now, this is a bit strange, but the SQL Server ODBC driver is pretty good at writing VARBINARY columns and terrible at reading them. Conversely, the SQL Server Native Client 11.0 ODBC driver is terrible at writing VARBINARY columns but okay-ish at reading them. So I'm going to have two RODBC objects, conn_write and conn_read.
conn_write <-
RODBC::odbcDriverConnect(
paste0("driver=SQL Server; server=[server_name]; database=[database_name];",
"uid=[user_name]; pwd=[password]")
)
conn_read <-
RODBC::odbcDriverConnect(
paste0("driver=SQL Server Native Client 11.0; server=[server_name]; database=[database_name];",
"uid=[user_name]; pwd=[password]")
)
Now I'm going to insert the text file into the database using a parameterized query (sqlExecute() comes from the RODBCext package, which extends RODBC).
sqlExecute(
channel = conn_write,
query = "INSERT INTO dbo.InsertFile (filename, filedata) VALUES (?, ?)",
data = list(file,
prep_file_for_sql(file)),
fetch = FALSE
)
And now to read it back out using a parameterized query. The unpleasant trick here is recasting your VARBINARY(MAX) column as a VARBINARY(8000) (don't ask me why, but it works).
X <- sqlExecute(
channel = conn_read,
query = paste0("SELECT OID, filename, ",
"CAST(filedata AS VARBINARY(8000)) AS filedata ",
"FROM dbo.InsertFile WHERE filename = ?"),
data = list("hello_world.txt"),
fetch = TRUE,
stringsAsFactors = FALSE
)
Now you can look at the contents with
unlist(X$filedata)
And write the file with
writeBin(unlist(X$filedata),
con = "hello_world2.txt")
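As a quick sanity check (assuming both files from above exist), the round-tripped file should be byte-identical to the original:
orig <- readBin("hello_world.txt", "raw", n = file.info("hello_world.txt")$size)
copy <- readBin("hello_world2.txt", "raw", n = file.info("hello_world2.txt")$size)
identical(orig, copy)  # should be TRUE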
BIG DANGEROUS CAVEAT
You need to be aware of the size of your files. I usually store files as VARBINARY(MAX), and SQL Server isn't very friendly about exporting those through ODBC (I'm not sure about other SQL engines; see "RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)" above for more details).
The only way I've found to get around this is to recast the VARBINARY(MAX) as a VARBINARY(8000). That is obviously a terrible solution if you have more than 8000 bytes in your file. When I need to get around this, I've had to loop over the VARBINARY(MAX) column, create multiple new columns each of length 8000, and then paste them all together in R (see "Reconstitute PNG file stored as RAW in SQL Database" below).
As of yet, I've not come up with a generalized solution to this problem. Perhaps that's something I should spend more time on, though.
ODBC query on MS SQL Server returning first 255 characters only in PHP PDO (FreeTDS)
According to the FreeTDS User Guide, the issue seems to be that FreeTDS can only handle varchar up to 255 characters when talking to SQL Server "due to limitations inherent in the protocol definition". Anything bigger than that needs to be data type text.
You can resolve the issue either by modifying your schema accordingly, or converting the data type during your query, like this:
SELECT CAST(mycol as TEXT) FROM mytable
Fetching Results from a Stored Procedure in R
As commented and since it may help future readers:
For SQL Server stored procedures, add SET NOCOUNT ON to suppress the rows-affected messages that may interfere with RODBC's sqlQuery(), which expects table/query data to be returned.
And generally, for most stored procedures, move any SELECT statement outside a transaction block that executes an action query (e.g., UPDATE, INSERT INTO) if you need to return actual data.
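A minimal sketch putting both points together; the channel ch, the procedure dbo.GetJobs, and the table/columns it touches are hypothetical names for illustration:
RODBC::sqlQuery(ch, "
  CREATE PROCEDURE dbo.GetJobs AS
  BEGIN
    SET NOCOUNT ON;                           -- suppress rows-affected messages
    BEGIN TRAN;
      UPDATE dbo.JobAd SET Viewed = 1;        -- action query inside the transaction
    COMMIT;
    SELECT JobTitle, JobText FROM dbo.JobAd;  -- data returned outside it, last
  END
")
jobs <- RODBC::sqlQuery(ch, "EXEC dbo.GetJobs")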
Reconstitute PNG file stored as RAW in SQL Database
Well, we've figured out a solution. The raw vector being returned through RODBC did not match what was in the SQL database. Somewhere in the pipeline, the varbinary object from SQL was getting distorted. I'm not sure why or how. But this answer to a different problem inspired us to recast the variables. As soon as we recast them, we could see the correct representation.
The next problem was that all of our images are more than 8000 bytes, and RODBC only allows 8000 characters at a time. So I had to fumble my way around that. The code below does the following:
1. Determine the largest number of bytes in an image file.
2. Create a set of variables (ImagePart1, ..., ImagePart[n]) breaking the image into as many parts as necessary, each with max length 8000.
3. Query the database for all of the images.
4. Combine the image parts into a single object.
5. Write the images to a local file.
The actual code
library(RODBC)
lims <- odbcConnect("DATABASE")
#* 1. Determine the largest number of bytes in the largest image file
ImageLength <- sqlQuery(lims,
paste0("SELECT MaxLength = MAX(LEN(u.Image)) ",
"FROM dbo.[User] u"))
#* 2. Create a query string to make a set of variables breaking
#*    the images into as many parts as necessary, each with
#*    max length 8000
n_img_vars <- ImageLength$MaxLength %/% 8000 + 1
start <- 1 + 8000 * (0:(n_img_vars - 1))
#* SUBSTRING takes (expression, start, length), so each part is 8000 bytes
img_parts <- paste0("ImagePart", 1:n_img_vars,
                    " = CAST(SUBSTRING(u.Image, ", start,
                    ", 8000) AS VARBINARY(8000))")
full_query <- paste0("SELECT u.OID, u.LastName, u.FirstName,\n",
paste0(img_parts, collapse =",\n"), "\n",
"FROM dbo.[User] u \n",
"WHERE LEN(u.Image) > 0")
#* 3. Query the database for all the images
Images <- sqlQuery(lims, full_query)
#* 4. Combine the image parts into a single object
Images$full_image <-
apply(Images[, grepl("ImagePart", names(Images))], 1,
function(x) do.call("c", x))
#* 5. Write the images to a local file
for(i in seq_len(nrow(Images))){
DIR <- "[FILE_DIR]"
FILENAME <- with(Images, paste0(OID[i], "-", LastName[i], ".png"))
writeBin(unlist(Images$full_image[i]),
file.path(DIR, FILENAME))
}
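A quick hedged sanity check after the loop (reusing DIR from the last iteration): the sizes of the files just written should match the LEN(u.Image) values SQL Server reports:
written <- file.path(DIR, with(Images, paste0(OID, "-", LastName, ".png")))
file.info(written)$size  # bytes per file, in the same row order as Images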