RODBC and Microsoft SQL Server: Truncating Long Character Strings

RODBC string getting truncated

The PostgreSQL ODBC driver has a setting called MaxLongVarcharSize, which I have found set to 8190 by default (I've used it on both Windows and Ubuntu). It is possible that the MySQL ODBC driver has a similar setting that defaults to 255.
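If that is the cause, the fix is to raise the limit in the driver configuration (odbc.ini or the DSN dialog). As a minimal sketch only, assuming the psqlODBC driver also accepts MaxLongVarcharSize as a DSN-less connection-string keyword (the driver name, server, database, and credentials below are placeholders):

library(RODBC)

# Placeholder connection details; MaxLongVarcharSize raises the buffer size the
# driver reports for long varchar/text columns (default 8190), which is what
# causes the truncation.
con <- odbcDriverConnect(paste0(
  "Driver={PostgreSQL Unicode};Server=localhost;Port=5432;",
  "Database=mydb;Uid=myuser;Pwd=mypassword;",
  "MaxLongVarcharSize=65536;"
))

txt <- sqlQuery(con, "SELECT long_text_col FROM my_table", stringsAsFactors = FALSE)
odbcClose(con)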

Truncation of database name and columns while connecting to SQL database from R

I had a look at this, and it turns out it was exactly what I was facing. I followed the steps mentioned there to resolve the issue.

RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)

OK, so it seems that I have found a workaround for this. After some more Googling, I found that:

One thing to consider with the SQL Native Client ODBC driver is that VARCHAR(MAX) does not have a fixed size, and the ODBC driver
represents this by returning a max column size of 0. This can confuse
your application if it doesn't check for 0 as a special case. See the
bottom section of this article:
http://msdn.microsoft.com/en-us/library/ms130896.aspx But in general I
have not seen this happen with any of my .NET applications, as it is
handled properly in ADO.NET.

Source: http://bytes.com/topic/sql-server/answers/808461-cannot-read-varchar-max

So, in my case, the following did the trick:

job.text <- sqlQuery(ccweb5.prod,"
SELECT DISTINCT TOP 100
ja.JobTitle,
[JobText] = CAST(ja.JobText AS varchar(8000)), -- note the data-type re-cast
[JobTextLength] = LEN(ja.JobText)
FROM JobStore.dbo.JobAd as ja (NOLOCK)
")

Such that nchar(as.character(job.text[1,2])) now returns 2742 (as it should).

I didn't see any similar questions on StackOverflow so I'll leave this up. Hope this helps somebody!

RODBC::sqlSave() truncates a data frame column to varchar(255) when saving to SQL Server

I don't use RODBC, so I cannot test your exact problem, but I'll reproduce your situation with DBI/odbc and show that in my environment it does not fail.

Sample data:

library(tibble)
dat <- tibble(id = 1:2, chr = c(strrep("A", 4000), strrep("B", 400000)))
nchar(dat$chr)
# [1] 4000 400000

library(DBI)
# library(odbc) # no need to load, but need it installed/available
con <- DBI::dbConnect(odbc::odbc(), driver = "ODBC Driver 17 for SQL Server",
                      database = "mydb", server = "111.222.333.444,1433",
                      uid = "myuser", pwd = "mypassword")

(I'm not going to go into all of the options required for this.)

Manually-defined table

DBI::dbExecute(con, "drop table if exists r2test")
# [1] 0
DBI::dbExecute(con, "create table r2test (id int, chr nvarchar(max))")
# [1] 0
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.02 1.28
dat2 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat2$chr)
# [1] 4000 400000
str(dat2)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBB"| __truncated__

Justifying pre-CreateTable

I find two possible error scenarios with SQL Server and DBI.

DBI::dbExecute(con, "drop table if exists r2test")
### also with DBI::dbCreateTable(con, "r2test", dat)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '4e+05'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(4e+05)
# )
# '

This is because SQL Server appears to not like scientific notation for field sizes. We can avoid this by changing scipen:

options(scipen=99)
DBI::dbWriteTable(con, "r2test", dat, create = TRUE)
# Error: nanodbc/nanodbc.cpp:1617: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The size (400000) given to the column 'chr' exceeds the maximum allowed for any data type (8000). [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'CREATE TABLE "r2test" (
# "id" INT,
# "chr" varchar(400000)
# )
# '

And now we see that SQL Server doesn't like explicit sizes that large, so we need to encourage it to use varchar(max).

Pre-created table

DBI::dbExecute(con, "drop table if exists r2test")
DBI::dbCreateTable(con, "r2test", fields = c(id = "INT", chr = "nvarchar(max)"))
system.time(
DBI::dbWriteTable(con, "r2test", dat, append = TRUE)
)
# user system elapsed
# 0.00 0.01 1.34
dat3 <- DBI::dbGetQuery(con, "select id, chr from r2test")
nchar(dat3$chr)
# [1] 4000 400000
str(dat3)
# 'data.frame': 2 obs. of 2 variables:
# $ id : int 1 2
# $ chr: chr "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"| __truncated__ "BBBBBBB"| __truncated__

Note about "large fields"

When using the Microsoft ODBC driver for SQL Server, one must always select "large" fields last in the query. For instance,

DBI::dbGetQuery(con, "select chr, id from r2test")
# Error in result_fetch(res@ptr, n) :
# nanodbc/nanodbc.cpp:2966: 07009: [Microsoft][ODBC Driver 17 for SQL Server]Invalid Descriptor Index

This is a known thing with MS's odbc driver for SQL Server (driver "ODBC Driver 17 for SQL Server"). The problem has been around for years. No other driver (including FreeTDS, which connects to SQL Server) is prone to this "feature". There is no indication that this will ever change (in fact, it's "formal" in the documentation, though "large" is not quantified).

I have no idea if RODBC has this issue as well; since it does not use nanodbc, it might work with the SQLGetData function a little more intelligently, side-stepping the problem.

Ways to work around this problem:

  • always put "large data" at the end of your list of selected columns (see the sketch after this list);
  • use FreeTDS instead of Microsoft's ODBC drivers ... allegedly it is slightly slower (10%? I don't know), but I've successfully installed it on Windows and Linux and selected fields in arbitrary orders without problems;
  • use RStudio's professional drivers, if you have the right OS and one of RStudio's professional products;
  • do all queries involving "large data" with the bulk tools (bcp or sqlcmd); I believe they both handle it better, though this is much less interactive than the R console;
  • use RODBC (allegedly ... again, I don't know);
  • don't use "large data" fields (...(max) or anything larger than ...(255) ... a not-well-defined threshold) ... perhaps not an option;
  • RECENTLY, a PR (odbc!415) has finalized the ability to work around this large-field problem, so a GitHub install of the package (until the fix is released on CRAN) will work; or
  • use a different DBMS than SQL Server ... perhaps not an option.
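As an illustration of the first workaround, here is a minimal sketch (not from any of the quoted answers) of reordering the select list so that "large" columns come last. It assumes the r2test table and con connection from above, and relies on SQL Server's information_schema.columns view, where (max) types report a character_maximum_length of -1:

library(DBI)

# Column metadata for the table; (max) types report -1 here, non-character
# columns report NA.
cols <- DBI::dbGetQuery(con, "
  select column_name, character_maximum_length
  from information_schema.columns
  where table_name = 'r2test'")

# Treat (max) types and anything over 255 as 'large', and push them to the end.
is_large <- !is.na(cols$character_maximum_length) &
  (cols$character_maximum_length == -1 | cols$character_maximum_length > 255)
ordered_cols <- c(cols$column_name[!is_large], cols$column_name[is_large])

qry <- sprintf("select %s from r2test", paste(ordered_cols, collapse = ", "))
dat4 <- DBI::dbGetQuery(con, qry)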

References:

  • In the odbc package, issue odbc/#10 appears to be the first appearance, and is referenced in all others in this package; related closed issues: #82, #86, #112, #171, #256, #331; and some recent discussion of possible workarounds: #309, #358, #373.

  • Ultimately, odbc uses the nanodbc C++ library, and while they recognize the problem, they feel it is not theirs to fix (nanodbc/#149).

  • The source document at Microsoft: https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/getting-long-data

SQL import via RODBC - wrong values after implicit? typecast

Use as.is = TRUE.

testid <- sqlQuery(database, "SELECT CAST(id AS CHAR) AS id FROM my_table", as.is = TRUE)

Even if the column id is numeric in the database, testid will be a data frame containing character values.
I think that as.is can be set for each column separately (using as.is = c(..)) or for all columns at once.

The CAST(.. AS CHAR) is probably not necessary when the column is already of type VARCHAR.
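As a hedged sketch of that per-column form (the table and the second column name are placeholders): a logical vector passed to as.is is matched against the columns of the result, so the casted id should stay character while the other column is converted as usual.

library(RODBC)

testdat <- sqlQuery(database,
                    "SELECT CAST(id AS CHAR) AS id, amount FROM my_table",
                    as.is = c(TRUE, FALSE))
str(testdat)
# 'id' remains character; 'amount' is type-converted as usual.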

RODBC - Query vector longer than 1

Thanks @r2evans for pointing out the solution:

query <- c("select 1 as a", "select 2 as b")

dbhandle <- odbcDriverConnect('driver=SQL Server;server=SOME_SERVER;database=csn_pricing;trusted_connection=true')
df <- sqlQuery(dbhandle, query, stringsAsFactors = FALSE)
odbcClose(dbhandle)

This results in

> df
  a
1 1

Hence, only the first element is used.
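If you do need the results of every statement, one possible approach (my addition, not part of the quoted answer) is to loop over the vector yourself, since sqlQuery() handles one statement at a time:

library(RODBC)

queries <- c("select 1 as a", "select 2 as b")

dbhandle <- odbcDriverConnect('driver=SQL Server;server=SOME_SERVER;database=csn_pricing;trusted_connection=true')

# Run each statement separately and collect the individual data frames in a list;
# combine afterwards (e.g. with cbind) if the shapes allow it.
results <- lapply(queries, function(q) sqlQuery(dbhandle, q, stringsAsFactors = FALSE))

odbcClose(dbhandle)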


