SQL Bulk Insert with FIRSTROW parameter skips the following line
I don't think you can skip rows in a different format with BULK INSERT/BCP.
When I run this:
TRUNCATE TABLE so1029384
BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
SELECT * FROM so1029384
I get:
col1 col2 col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234 SSNV 00013893-03JUN09
0000005678 ABCD 00013893-03JUN09
0000009112 0000 00013893-03JUN09
0000009112 0000 00013893-03JUN09
It looks like BULK INSERT requires the '|' even in the header row: it reads everything up to the first '|' into the first column, swallowing the newline along the way. If you specify a field terminator, it expects every row to have one.
You could strip the header row with a pre-processing step. Another possibility is to select only complete rows and then process them (excluding the header). Or use a tool that can handle this, like SSIS.
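As an illustration of that pre-processing step, here is a minimal sketch in Python (the function name and file paths are assumptions, not part of the original answer) that copies the file without its header line, so BULK INSERT never sees the row that lacks '|' field terminators:

```python
# Minimal pre-processing sketch (hypothetical file names): copy the
# file while dropping its first line(s), so BULK INSERT only ever
# sees rows that contain the '|' field terminators.

def strip_header(src_path, dst_path, skip=1):
    """Copy src_path to dst_path, dropping the first `skip` lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for i, line in enumerate(src):
            if i >= skip:
                dst.write(line)

# e.g. strip_header(r"C:\Data\test\so1029384.txt",
#                   r"C:\Data\test\so1029384_clean.txt")
```

Point BULK INSERT at the cleaned copy and the FIRSTROW workaround becomes unnecessary.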
T SQL Bulk Insert skipping first row with or without header
I think you need to look at the formatting of your data again. Maybe some invisible white space snuck in somewhere?
IF OBJECT_ID('tempdb.dbo.#tempTable', 'U') IS NOT NULL
DROP TABLE #tempTable;
CREATE TABLE #tempTable
(
StartDate datetime,
EndDate datetime,
TransactionItemsMigrated bigint,
TransactionSizeBytes bigint,
CurrentItemsFailed bigint
)
BULK INSERT #tempTable
FROM 'D:\User\Documents\test.csv' --change to CSV file location
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
)
SELECT * FROM #tempTable
This code works with this sample CSV file:
StartDate,EndDate,TransactionItemsMigrated,TransactionSizeBytes,CurrentItemsFailed
2016-03-17,2016-03-17,0,1,2
2016-03-18,2016-03-17,1,1,2
2016-03-19,2016-03-17,2,1,2
2016-03-20,2016-03-17,3,1,2
With this output
StartDate EndDate TransactionItemsMigrated TransactionSizeBytes CurrentItemsFailed
2016-03-17 00:00:00.000 2016-03-17 00:00:00.000 0 1 2
2016-03-18 00:00:00.000 2016-03-17 00:00:00.000 1 1 2
2016-03-19 00:00:00.000 2016-03-17 00:00:00.000 2 1 2
2016-03-20 00:00:00.000 2016-03-17 00:00:00.000 3 1 2
SQL Bulk insert ignores first data row
"UTF-8 with unix style row terminator" I assume you're using a version of SQL Server that doesn't support UTF-8. From BULK INSERT (Transact-SQL)
** Important ** Versions prior to SQL Server 2016 (13.x) do not support code page 65001 (UTF-8 encoding).
If you are using 2016+, then specify the code page for UTF-8:
BULK INSERT #mycsv
FROM 'D:\TestData\Test.dat'
WITH (ROWTERMINATOR = '0x0A',
FIELDTERMINATOR = '|',
FIRSTROW = 1,
CODEPAGE = '65001',
ERRORFILE = 'D:\TestData\Import.log');
If you aren't using SQL Server 2016+, then you cannot use BULK INSERT to import a UTF-8 file; you will have to use a different code page or use a different tool.
Note, also, that the above document states the below:
The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
In other words, if you are skipping rows, those rows still need valid field terminators; FIRSTROW is not a header-skipping mechanism. This means you should be using FIRSTROW = 1 and fixing your header row, as @sarlacii points out.
Of course, that does not fix the code page problem if you are using an older version of SQL Server; and my point stands that you'll have to use a different technology on 2014 and prior.
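One way to honor that "skipped rows must still be valid" requirement is to check, before the load, that every line of the file has the same number of field terminators. A small Python sketch (the function name and delimiter default are assumptions for illustration):

```python
# Sketch of a pre-load validity check: the docs say skipped rows are
# still scanned for field terminators, so verify that every line in
# the file - header included - contains the same number of them.

def rows_have_uniform_fields(path, delimiter="|"):
    """Return True if every line has the same field-terminator count."""
    counts = set()
    with open(path) as f:
        for line in f:
            counts.add(line.rstrip("\r\n").count(delimiter))
    return len(counts) <= 1
```

If this returns False, the header row (or a truncated data row) will trip up BULK INSERT regardless of the FIRSTROW setting.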
SQL Server BULK INSERT: why The FIRSTROW attribute is not intended to skip column headers?
the SQL Server Database Engine looks only at the field terminators
That's your answer. If a column header contains a field terminator in its name, the system will find the wrong number of fields in the first line, and great hilarity will thus ensue. You can imagine that, somewhere in all the world's files, some column had
Years, since 1970
as a header. Clear enough to the human, but the machine has only rules.
In case you're not aware, your BULK INSERT will fail in the general case for CSV files. The CSV format is quite variable, and more complex than what BULK INSERT can interpret correctly. Specifically, there are quoting rules that allow commas to be included in the data. SQL Server won't honor the quotes; it will interpret the comma as a separator regardless. Like the man says,
the SQL Server Database Engine looks only at the field terminators
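To see the difference concretely, Python's csv module (used here only as an illustration of a quote-aware parser) honors the quoting rules that BULK INSERT ignores:

```python
import csv
import io

line = '"Smith, John",42,"New York"'

# A quote-aware CSV parser keeps the embedded comma inside the field.
fields = next(csv.reader(io.StringIO(line)))
print(fields)            # ['Smith, John', '42', 'New York']

# A naive split on the field terminator - effectively what BULK INSERT
# does - breaks the quoted field apart.
print(line.split(","))   # ['"Smith', ' John"', '42', '"New York"']
```

Three fields versus four: that mismatch is exactly how a quoted CSV derails a BULK INSERT load.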
first data row skipped in import
There are a couple of problems here:

1. I suspect there isn't a valid \n on the first line. Otherwise SQL Server wouldn't munge the first two rows when you change to FIRSTROW = 1.

2. Using "," as the column delimiter works great for all of the columns except the first and the last. This leaves a leading " on the first column, and a trailing " on the last column. You can deal with the latter by changing your ROWTERMINATOR to "\n, but that will only work if you can also add a trailing " to the header row (during the process of ensuring that there is a valid \n there). At that point you may as well make sure that the header row matches the data rows in all aspects, so:

"COLUMN1","COLUMN2"
-------------------^ this character has to be \n
In all honesty, I think you could spend a week fighting with all of these nitty-gritty BCP and BULK INSERT issues, and still not have a perfect solution that doesn't require post-op actions (such as trimming leading/trailing " characters from certain columns). My recommendation: spend 20 minutes and write a parser in C# that will automatically correct these files - removing the header row, ensuring the right delimiters are in place, removing all the stupid " characters, etc. - before SQL Server ever sees the file. Cleaning the file up will be a lot less hassle than the hoops you're jumping through now. I'm sure there are solutions to this but IIRC you've been wrestling with it for quite some time...
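The answer suggests C#; the same cleaner can be sketched in a few lines of Python (the function name and the assumption that no field still contains the delimiter after unquoting are mine, not the answer's): drop the header row and resolve the quoting, so SQL Server only ever sees plain delimited data:

```python
import csv

# Sketch of the suggested file cleaner, in Python rather than C#:
# drop the header row and resolve the CSV quoting so the output is
# plain delimited text that BULK INSERT can load directly.
# Assumes no field still contains the delimiter after unquoting.

def clean_for_bulk_insert(src_path, dst_path, delimiter=","):
    with open(src_path, newline="") as src, open(dst_path, "w") as dst:
        reader = csv.reader(src)
        next(reader, None)                 # remove the header row
        for row in reader:
            dst.write(delimiter.join(row) + "\n")
```

Run this before the load and the FIRSTROW, ROWTERMINATOR, and stray-quote workarounds all become unnecessary.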
Bulk Insert how to skip column using format file in SQL Server 2017
Can you try this format file? I think the new FORMAT = 'CSV' option does some of the work that the format file no longer needs to do...
14.0
2
1 SQLCHAR 0 1 "," 2 Id SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50 "\r\n" 3 Name SQL_Latin1_General_CP1_CI_AS