SQL Bulk Insert with Firstrow Parameter Skips the Following Line

SQL Bulk Insert with FIRSTROW parameter skips the following line

I don't think you can skip rows in a different format with BULK INSERT/BCP.

When I run this:

TRUNCATE TABLE so1029384

BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)

SELECT * FROM so1029384

I get:

col1                                               col2                                               col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234 SSNV 00013893-03JUN09
0000005678 ABCD 00013893-03JUN09
0000009112 0000 00013893-03JUN09
0000009112 0000 00013893-03JUN09

It looks like it requires the '|' even in the header row, because it reads everything up to the first '|' into the first column - swallowing the newline along the way. If you include a field terminator parameter, BULK INSERT expects that every row, including the header, MUST contain it.

You could strip the row with a pre-processing step. Another possibility is to select only complete rows, then process them (excluding the header). Or use a tool which can handle this, like SSIS.
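As a concrete sketch of that pre-processing step (the file paths and function name here are hypothetical), a few lines of Python can copy the data file while dropping the header line, so BULK INSERT never sees it:

```python
def strip_header(src_path, dst_path):
    """Copy src_path to dst_path, dropping the first line (the header)."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        next(src, None)      # skip the header row (no error if file is empty)
        for line in src:     # stream the rest through unchanged
            dst.write(line)
```

Run over the file before the BULK INSERT, this would let you leave FIRSTROW out entirely.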

T SQL Bulk Insert skipping first row with or without header

I think you need to look at the formatting of your data again. Maybe some invisible white space snuck in somewhere?

IF OBJECT_ID('tempdb.dbo.#tempTable', 'U') IS NOT NULL
DROP TABLE #tempTable;

CREATE TABLE #tempTable
(
StartDate datetime,
EndDate datetime,
TransactionItemsMigrated bigint,
TransactionSizeBytes bigint,
CurrentItemsFailed bigint
)

BULK INSERT #tempTable
FROM 'D:\User\Documents\test.csv' --change to CSV file location
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
)

SELECT * FROM #tempTable

This code works with this sample CSV file:

StartDate,EndDate,TransactionItemsMigrated,TransactionSizeBytes,CurrentItemsFailed
2016-03-17,2016-03-17,0,1,2
2016-03-18,2016-03-17,1,1,2
2016-03-19,2016-03-17,2,1,2
2016-03-20,2016-03-17,3,1,2

With this output

StartDate   EndDate TransactionItemsMigrated    TransactionSizeBytes    CurrentItemsFailed
2016-03-17 00:00:00.000 2016-03-17 00:00:00.000 0 1 2
2016-03-18 00:00:00.000 2016-03-17 00:00:00.000 1 1 2
2016-03-19 00:00:00.000 2016-03-17 00:00:00.000 2 1 2
2016-03-20 00:00:00.000 2016-03-17 00:00:00.000 3 1 2

SQL Bulk insert ignores first data row

"UTF-8 with unix style row terminator" I assume you're using a version of SQL Server that doesn't support UTF-8. From BULK INSERT (Transact-SQL)

** Important ** Versions prior to SQL Server 2016 (13.x) do not support code page 65001 (UTF-8 encoding).

If you are using 2016+, then specify the code page for UTF-8:

BULK INSERT #mycsv
FROM 'D:\TestData\Test.dat'
WITH (ROWTERMINATOR = '0x0A',
FIELDTERMINATOR = '|',
FIRSTROW = 1,
CODEPAGE = '65001',
ERRORFILE = 'D:\TestData\Import.log');

If you aren't using SQL Server 2016+, then you cannot use BULK INSERT to import a UTF-8 file; you will have to use a different code page or use a different tool.


Note also that the document above states the following:

The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.

In other words, if you are skipping rows you still need to ensure those rows are valid, but FIRSTROW is not meant for skipping headers. This means you should be using FIRSTROW = 1 and fixing your header row, as @sarlacii points out.

Of course, that does not fix the code page problem if you are using an older version of SQL Server; and my point stands that you'll have to use a different technology on 2014 and prior.

SQL Server BULK INSERT: why The FIRSTROW attribute is not intended to skip column headers?

the SQL Server Database Engine looks only at the field terminators

That's your answer. If your column headers have names that include field terminators, the system will find the wrong number of fields in the first line, and great hilarity will thus ensue. You can imagine that in all the world's files, some column had

Years, since 1970

as a header. Clear enough to the human, but the machine has only rules.

In case you're not aware, your bulk insert will fail in the general case for CSV files. CSV format is quite variable, and more complex than what bulk insert can interpret correctly. Specifically, there are quoting rules that allow commas to be included among the data. SQL Server won't honor the quotes. It will interpret the comma as a separator regardless. Like the man says,
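The quoting problem is easy to demonstrate: a naive split on ',' (which is effectively what BULK INSERT does with a ',' field terminator) breaks a quoted field apart, while an RFC 4180-aware parser keeps it together. A small Python illustration:

```python
import csv
import io

line = '"Smith, John",42,"New York"'

# Naive split: every comma is a separator, quotes or not.
naive = line.split(",")

# A real CSV parser honors the quoting rules.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['"Smith', ' John"', '42', '"New York"']
print(parsed)  # ['Smith, John', '42', 'New York']
```

The naive split yields four fields where there are really three - exactly the wrong-field-count failure described above.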

the SQL Server Database Engine looks only at the field terminators

first data row skipped in import

There are a couple of problems here:

  1. I suspect there isn't a valid \n on the first line. Otherwise SQL Server wouldn't munge the first two rows when you change to FIRSTROW = 1.

  2. Using "," as the column delimiter works for every column except the first and the last: it leaves a leading " on the first column and a trailing " on the last. You can deal with the latter by changing your ROWTERMINATOR to "\n (a quote followed by a newline), but that only works if you can also add a trailing " to the header row (while you are ensuring there is a valid \n there). At that point you may as well make the header row match the data rows in every respect, so:

    "COLUMN1","COLUMN2"
    -------------------^ this character has to be \n

In all honesty, I think you could spend a week fighting with all of these nitty-gritty BCP and BULK INSERT issues and still not have a perfect solution that doesn't require post-processing (such as trimming leading/trailing " characters from certain columns). My recommendation: spend 20 minutes writing a parser in C# that automatically corrects these files - removing the header row, ensuring the right delimiters are in place, stripping the stray " characters, etc. - before SQL Server ever sees the file. Cleaning the file up will be a lot less hassle than the hoops you're jumping through now. I'm sure there are solutions to this, but IIRC you've been wrestling with it for quite some time...
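The answer suggests C#, but the clean-up pass can be sketched in a few lines of any language. Here is a hypothetical Python version that drops the header, undoes the quoting, and rejoins the fields with '|' (assuming '|' never occurs inside the data):

```python
import csv

def clean_csv(src_path, dst_path):
    """Drop the header row, undo the CSV quoting, and rewrite the file
    with '|' as the delimiter - a character assumed absent from the data."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        rows = csv.reader(src)   # a real CSV parser honors the quotes
        next(rows, None)         # discard the header row
        for row in rows:
            dst.write("|".join(row) + "\n")
```

After this pass, a plain BULK INSERT with FIELDTERMINATOR = '|' and no FIRSTROW gymnastics should load the file cleanly.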

Bulk Insert how to skip column using format file in SQL Server 2017

Can you try this format file? I think the new FORMAT = 'CSV' option handles some of the parsing that the format file otherwise has to describe...

14.0
2
1 SQLCHAR 0 1 "," 2 Id SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50 "\r\n" 3 Name SQL_Latin1_General_CP1_CI_AS

