SSIS Task for Inconsistent Column Count Import

SSIS Task for inconsistent column count import?

Off the top of my head, I have a 50% solution for you.

The problem

SSIS really cares about metadata, so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent metadata makes the Flat File Source troublesome to use.

Query based solution

If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table: the order of columns does not matter, nor does the presence of extra columns.
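To see why name-based access sidesteps the column-order problem, here is a minimal sketch outside SSIS, using Python's `csv.DictReader` as a stand-in for the Jet text driver (the file contents are toy data, not the samples below):

```python
import csv
import io

# Two toy files with the same logical columns in different physical order
file_a = "col1,col3,col2\n1,3,2\n"
file_b = "col2,col1,col3\n2,1,3\n"

def read_cols(text, wanted):
    # DictReader keys each row by header name, so physical order is irrelevant
    return [[row[c] for c in wanted] for row in csv.DictReader(io.StringIO(text))]

wanted = ["col1", "col2", "col3"]
print(read_cols(file_a, wanted))  # [['1', '2', '3']]
print(read_cols(file_b, wanted))  # same values despite the different layout
```

Extra columns in a file are simply never asked for, which is exactly the behavior the query-based approach relies on.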

Variables

I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.

  • InputFolder is hard wired to the source folder. In my example, it's C:\ssisdata\Kipreal
  • CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
  • Query is an expression: "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]

variables window

Connection manager

Set up a connection to the input file using the Jet OLE DB provider. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on the connection manager's ConnectionString of "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";"
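Evaluated, that expression yields an ordinary Jet connection string; a sketch of the substitution (the folder value mirrors the example above):

```python
# Mirrors @[User::InputFolder] from the package
input_folder = r"C:\ssisdata\Kipreal"

# Mirrors the expression set on the FileOLEDB connection manager
conn_str = ("Data Source=" + input_folder +
            ";Provider=Microsoft.Jet.OLEDB.4.0;"
            'Extended Properties="text;HDR=Yes;FMT=CSVDelimited;";')
print(conn_str)
```

Note that the Data Source for the Jet text driver is the folder, not a file; the file name is supplied later by the query itself.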

Control Flow

My Control Flow looks like a Data Flow Task nested in a Foreach File Enumerator.

control flow

Foreach File Enumerator

My Foreach File Enumerator is configured to operate on files. I put an expression on the Directory property of @[User::InputFolder]. Notice that at this point, if the value of that folder needs to change, it'll correctly be updated in both the Connection Manager and the file enumerator. In "Retrieve file name", instead of the default "Fully qualified", choose "Name and extension".

Foreach File Enumerator - Collection tab

In the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable

Foreach File Enumerator - Variable Mappings tab

At this point, each iteration of the loop will change the value of @[User::Query] to reflect the current file name.
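Stripped of the SSIS machinery, the per-iteration wiring amounts to string substitution; a sketch of what the loop effectively does (the file names match the samples used later):

```python
# Mirrors the expression on @[User::Query]; CurrentFileName changes per iteration
def build_query(current_file_name):
    return "SELECT col1, col2, col3, col4, col5 FROM " + current_file_name

# Stand-in for the Foreach File Enumerator feeding @[User::CurrentFileName]
for name in ["input5columns.csv", "input7columns.csv"]:
    print(build_query(name))
```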

Data Flow

This is actually the easiest piece. Use an OLE DB source and wire it as indicated.

Data flow

Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the @[User::Query] variable in there, click OK and you're ready to work.
oledb file source

Sample data

I created two sample files, input5columns.csv and input7columns.csv. All of the columns of 5 are in 7, but 7 has them in a different order (col2 sits at ordinal position 2 in one and 6 in the other, counting from zero). I negated all the values in 7 to make it readily apparent which file is being operated on.

col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444

and

col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222

Running the package results in these two screenshots

5 column file 7 column file

What's missing

I don't know of a way to tell the query based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query to have only the columns that must be there and then perform lookups against the file to try and obtain the columns that ought to be there and not fail the lookup if the column doesn't exist. Pretty kludgey though.
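For what it's worth, the "column might not exist" case is easy to express outside the Jet driver; a hedged Python sketch of the fallback idea, where required columns must be present and optional ones default to None:

```python
import csv
import io

def read_with_optional(text, required, optional):
    # Required columns raise KeyError if absent; optional ones default to None
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        combined = {c: row[c] for c in required}
        combined.update({c: row.get(c) for c in optional})
        rows.append(combined)
    return rows

data = "col1,col2\n1,2\n"
print(read_with_optional(data, ["col1", "col2"], ["col9"]))
# [{'col1': '1', 'col2': '2', 'col9': None}]
```

Within SSIS itself you would still need the lookup-style kludge described above, or a Script Component playing the role of this function.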

SSIS Bulk Import for files with different column names

The simplest way to insert the contents of a file into a SQL Server database is to use the Bulk Insert process.

The following is an example from a popular article

BULK INSERT EmployeeDB.dbo.Employees
FROM 'C:\Data\EmployeeData_c.dat'
WITH
(
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\r\n'
);

Since your file formats are different, one of the simplest approaches is the following:

(1) For each of your file formats, have a table. You can call it a staging table.

(2) Do a bulk insert of your file contents into this staging table.

(3) After doing a bulk insert into this staging table, insert into your main table.

You can achieve the above by using command-line tools like SQLCMD. You can implement it via SSIS also. Generally, you would go to SSIS if you want more complex processing for your files. In your case, my preference would be to use the command-line tools themselves.
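The coordination for steps (1)-(3) can be as small as generating one BULK INSERT per known format; a sketch in Python, where the format names and staging table names are entirely hypothetical:

```python
# Hypothetical mapping of known file formats to their staging tables
STAGING = {
    "pipe_format": "EmployeeDB.dbo.Staging_Pipe",
    "comma_format": "EmployeeDB.dbo.Staging_Comma",
}

def bulk_insert_sql(fmt, path, field_terminator):
    # Builds the step-(2) statement; \\r\\n emits a literal \r\n for T-SQL
    return ("BULK INSERT " + STAGING[fmt] + "\n"
            "FROM '" + path + "'\n"
            "WITH (DATAFILETYPE = 'char', "
            "FIELDTERMINATOR = '" + field_terminator + "', "
            "ROWTERMINATOR = '\\r\\n');")

print(bulk_insert_sql("pipe_format", r"C:\Data\file_a.dat", "|"))
```

The generated statements could then be fed to SQLCMD per file; step (3) remains a plain INSERT ... SELECT from the staging table.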

SSIS pkg with flat-file connection with fewer columns will fail

It reads like you have two problems here. The first is understanding how to use Connection Managers. For flat file inputs, you are generally going to be better served by creating a connection manager per file layout. File 1 looks like (Column1) and File 2 looks like (Column1, Column2)? That means 2 different Flat File Connection managers need to be defined.

If you have 2 versions of File 2, one where Column1 has numbers and another with Column1 containing character data, those would require 2 unique connection managers (3 in total).

The good news relative to the above is that file name changes are trivial and do not require a unique Connection Manager to be created. F1.txt, F1_20120501.txt, F1.good.txt, etc. would all be served by the Connection Manager you have defined for that layout. You would simply need to use an expression on the ConnectionString property of a given Connection Manager to update it at run time.

So now that you have all these Flat File Connection Managers, you need to use them. That magic happens in the Data Flow Tasks. A data flow is real persnickety about the metadata used in it. When you are designing a data flow, you are making a contract with SSIS, and if you try to violate it by making a character field into a date field or not providing all the columns, the package will fail validation checks as you aren't holding up your end of the bargain. The resolution is that you're again going to need to define multiple data flows around the various Connection Managers your packages need.

With all that defined, you would simply need a coordinator to look at source files and determine which data flow should be executed. I provided an example on the question "Create SSIS package to import from one of many data sources".

There was also a similar question where I proposed a solution that may be of interest, "SSIS Task for inconsistent column count import?" It really depends on what your rules are for processing.

If you are trying to consolidate/reuse business logic in your SSIS packages, then I would look at using the various data flows to stage the discrete sources into a single common store (raw file, staging table with lots of null columns, etc.).
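The coordinator's job reduces to sniffing the header and dispatching to the right data flow; a hypothetical sketch, where the layouts and data flow names are made up for illustration:

```python
# Hypothetical layouts, keyed by their exact header tuple
LAYOUTS = {
    ("Column1",): "DataFlow_File1",
    ("Column1", "Column2"): "DataFlow_File2",
}

def pick_data_flow(first_line):
    # Sniff the header row and dispatch to the matching data flow (or a default)
    header = tuple(h.strip() for h in first_line.split(","))
    return LAYOUTS.get(header, "DataFlow_Unknown")

print(pick_data_flow("Column1,Column2"))  # DataFlow_File2
print(pick_data_flow("Mystery"))          # DataFlow_Unknown
```

In a real package, the equivalent logic would live in a Script Task setting a variable that precedence constraints then test.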

Handling Inconsistent Delimiters in Flat File Source on ForeachLoop Container

I managed to handle it with a script task, thanks!

Basically added a script task to the Foreach Loop Container that executes before my DataFlow task.

I send the file name through as a variable:
Script Task Variable

I added the following namespaces to the script:

using System.IO;
using RuntimeWrapper = Microsoft.SqlServer.Dts.Runtime.Wrapper;

And my script looks like this:

public void Main()
{
    if (!string.IsNullOrEmpty(Dts.Variables["sFileName"].Value.ToString()))
    {
        string HeadRowDelimiter = "";
        string ColDelimiter = "";

        // Read a sample from the start of the file and sniff the delimiters from it
        using (StreamReader file = new StreamReader(Dts.Variables["sFileName"].Value.ToString()))
        {
            char[] c = new char[500];
            file.Read(c, 0, c.Length);
            string data = string.Join("", c);

            if (!string.IsNullOrEmpty(data))
            {
                // set row delimiters (first match wins)
                if (data.Contains("\r\n"))
                {
                    HeadRowDelimiter = "\r\n";
                }
                else if (data.Contains("\r"))
                {
                    HeadRowDelimiter = "\r";
                }
                else if (data.Contains("\n"))
                {
                    HeadRowDelimiter = "\n";
                }
                else if (data.Contains("\0"))
                {
                    HeadRowDelimiter = "\0";
                }

                // set column delimiters (first match wins)
                if (data.Contains("\t"))
                {
                    ColDelimiter = "\t";
                }
                else if (data.Contains(";"))
                {
                    ColDelimiter = ";";
                }
                else if (data.Contains(","))
                {
                    ColDelimiter = ",";
                }
                else if (data.Contains(":"))
                {
                    ColDelimiter = ":";
                }
                else if (data.Contains("|"))
                {
                    ColDelimiter = "|";
                }
                else if (data.Contains("\0"))
                {
                    ColDelimiter = "\0";
                }
            }
        }

        // Push the sniffed delimiters onto the Flat File connection manager
        RuntimeWrapper.IDTSConnectionManagerFlatFile100 flatFileConnection =
            Dts.Connections["FlatFileConnection"].InnerObject as RuntimeWrapper.IDTSConnectionManagerFlatFile100;

        if (flatFileConnection != null)
        {
            flatFileConnection.HeaderRowDelimiter = HeadRowDelimiter;
            flatFileConnection.RowDelimiter = HeadRowDelimiter;
            flatFileConnection.HeaderRowsToSkip = 0;
            flatFileConnection.Columns[0].ColumnDelimiter = ColDelimiter;
        }

        Dts.TaskResult = (int)ScriptResults.Success;
    }
}

SSIS: importing files some with column names, some without

Yes there is!

In the "Error Output" page in the editor, change the Error response for each row to "Redirect row". Then you can trap the problem rows (the headers, in your case) by taking them as a single column through the error output of your source.
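The same "redirect the misfits" idea can be sketched outside SSIS: rows that honor the contract flow through, and anything else (like a stray header) lands in an error bucket. The numeric check here is an assumption standing in for the source's type conversion:

```python
def split_rows(lines, expected_cols, delim=","):
    # Rows matching the contract go to good; headers/misfits are redirected
    good, redirected = [], []
    for line in lines:
        parts = line.split(delim)
        if len(parts) == expected_cols and all(p.lstrip("-").isdigit() for p in parts):
            good.append(parts)
        else:
            redirected.append(line)
    return good, redirected

rows = ["col1,col2", "1,2", "-3,4"]
print(split_rows(rows, 2))  # ([['1', '2'], ['-3', '4']], ['col1,col2'])
```

In the package, SSIS does the splitting for you; you only decide what to do with the redirected rows (often: discard them).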


