Special Characters Displaying Incorrectly After Bulk Insert

Special characters displaying incorrectly after BULK INSERT

You need to run BULK INSERT with CODEPAGE = 'ACP', which converts string data from the Windows ANSI code page (1252 on most Western systems) to the SQL Server code page.

BULK INSERT dbo.temp FROM 'C:\Temp\file.csv' 
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', CODEPAGE = 'ACP');

If you are bringing in UTF-8 data on SQL Server 2016 or later (earlier versions do not support code page 65001):

[...] , CODEPAGE = '65001');

You may also need to specify DATAFILETYPE = 'char|native|widechar|widenative'.
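Putting the pieces together, a complete UTF-8 import might look like this (a minimal sketch; assumes SQL Server 2016 or later and the same hypothetical table and file path as above):

BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n',
      CODEPAGE = '65001', DATAFILETYPE = 'char');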

Accented characters not correctly imported with BULK INSERT

This was answered in the comments. Did you try it?

http://msdn.microsoft.com/en-us/library/ms189941.aspx

Option DATAFILETYPE = 'widenative'

Based on the comment from Esailiga: did the text get truncated before or after the bulk import? I agree it sounds like the CSV file itself is single-byte. Unicode requires the option DATAFILETYPE = 'widenative'; if the CSV file is single-byte, there is no magic translation back.
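For illustration, the option in context (a sketch with a hypothetical table and data file; 'widenative' expects a file exported in SQL Server's native format with Unicode character columns, e.g. via bcp with the -N switch):

BULK INSERT dbo.temp FROM 'C:\Temp\file.dat'
WITH (DATAFILETYPE = 'widenative');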

What is unfortunate is that é is extended ASCII and supported by the SQL char type, so this is more evidence that the problem is in the CSV itself.

SELECT CAST('é' AS char(1))

Notice this works because é falls within extended ASCII (code point below 255).

Sounds like you need to go back to the source.

The ? in SQL Server output means an unknown character, the same as � in Notepad.
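To see that '?' substitution in isolation: a character outside the database code page degrades to '?' when forced into a non-Unicode type (a minimal sketch; assumes a database whose default collation uses code page 1252):

SELECT CAST(N'Ω' AS char(1)) AS outside_codepage,  -- returns '?'
       CAST(N'é' AS char(1)) AS inside_codepage;   -- returns 'é'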

How to write UTF-8 characters using bulk insert in SQL Server?

You can't on older versions. You should first use an N-type (Unicode) column, convert your file to UTF-16, and then import it; those versions of the database do not support UTF-8. (As noted above, SQL Server 2016 and later do accept CODEPAGE = '65001'.)
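In practice that means re-saving the file as UTF-16 (e.g. "Unicode" in Notepad's Save As dialog) and importing it as widechar (a sketch; the table and path are hypothetical):

CREATE TABLE dbo.utf_demo (val NVARCHAR(100));  -- N-type column

BULK INSERT dbo.utf_demo FROM 'C:\Temp\file_utf16.csv'
WITH (DATAFILETYPE = 'widechar', FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');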

Bulk insert csv - column value itself has commas

If I had this particular issue and the creation of the CSV files was not under my control, I would resort to a Perl script like this:

use strict;
use warnings;

open(my $fhin,  "<", "MyFile.csv")  or die "Cannot open input: $!";
open(my $fhout, ">", "MyQFile.csv") or die "Cannot open output: $!";

while (my $line = <$fhin>) {
    chomp($line);
    # Quote all four fields; the greedy (.*) in the third field
    # absorbs any embedded commas
    $line =~ s/^([^,]*),([^,]*),(.*),([^,]*)$/"$1","$2","$3","$4"/;
    print $fhout $line . "\n";
}

close($fhin);
close($fhout);

Note that the above regular expression can handle only one "problem" column of this kind. If there are any others, there is no possibility of programmatically assigning correct quotation to such columns (without more information...).
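As an aside, on SQL Server 2017 and later you may be able to skip the preprocessing entirely: BULK INSERT gained FORMAT = 'CSV' with quote handling, so a properly quoted file imports directly (a sketch against the same hypothetical table):

BULK INSERT dbo.temp FROM 'C:\Temp\MyQFile.csv'
WITH (FORMAT = 'CSV', FIELDQUOTE = '"',
      FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');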

BULK INSERT command is silently exiting without any error/exception after processing 315 batches in MSSQL

  • The issue is related to the database cursor object.
  • The root cause is that the cursor object fills up, so I had to traverse it with pagination-style fetching.
  • Because the cursor was full, I was not able to capture the latest output of my stored procedure. In my case the cursor filled up after 315 result sets returned by the stored procedure (for example, the output of an INSERT statement such as "10 rows inserted").

Sample Python code

import pyodbc

# Connection details here are placeholders
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=mydb;"
                      "UID=username;PWD=password")

query = "a stored procedure"  # e.g. "EXEC dbo.MyProc"

# Execute query and get a cursor object
cursor = conn.execute(query)

# Drain every remaining result set so the batch runs to completion
while cursor.nextset():
    result = cursor.fetchall() if cursor.description else None
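Alternatively, if the extra result sets come from row-count messages like "10 rows inserted", suppressing them inside the stored procedure avoids filling the cursor in the first place (a sketch; the procedure name and body are hypothetical):

CREATE OR ALTER PROCEDURE dbo.MyProc  -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the "N rows affected" message per statement
    INSERT INTO dbo.temp (val) VALUES (N'example');
END;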

How to properly convert accented characters using FOR XML

I don't know for sure, but I'd assume that this is not possible without a hack.

SQL Server stores XML as two-byte UTF-16 internally, so any character representable in that range does not need escaping.

It is a different matter for unprintable characters or characters with markup meaning, such as <, > and & (and quotes, in some cases).

I'd probably cast to NVARCHAR(MAX) and then do the replacement at the string level at the point where you export this data. You won't be able to cast the result back to XML:

--Push your test XML into a variable

DECLARE @xml XML=
(
    SELECT 'é' AS Accent
    FOR XML PATH('')
);
SELECT @xml; --<Accent>é</Accent>

--Cast it to string and do the replacement on string-level

SELECT REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', 'é') --<Accent>é</Accent>

--Trying to cast this result back to XML fails

SELECT CAST(REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', 'é') AS XML); 

XML parsing: line 1, character 16, well formed check: undeclared entity

Obviously, SQL Server's XML engine doesn't even know this entity...
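For comparison, a numeric character reference is well-formed XML, so the same round trip succeeds; the parser simply normalizes it back to the literal character (a sketch; XML predeclares only lt, gt, amp, quot, and apos, which is why &eacute; fails):

DECLARE @xml XML =
(
    SELECT 'é' AS Accent
    FOR XML PATH('')
);

SELECT CAST(REPLACE(CAST(@xml AS NVARCHAR(MAX)), N'é', N'&#233;') AS XML);
--<Accent>é</Accent>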


