How to Check Column Structure in Ssis

how to check column structure in ssis?

Solution

Based on your comment, you are handling with flat files. To solve this problem, you have to read all columns as one column and retrieve the structure on the go.

Details

  1. First add a Flat file connection manager.
  2. In the flat file connection manager, go to the Advanced Tab, remove all columns and keep only one column (Column0).
  3. Change the column type to DT_WSTR and the length to 4000.
  4. Add a Dataflow task
  5. Inside the Dataflow task add a Flat File source, a script component and an OLEDB destination.
  6. Open the script component, go to Input/Output Tab and and add 8 output columns (Distributer_Code,Cust_code,cust_name,cust_add,zip,tel,dl_number,gstin)
  7. Change the script language to Visual Basic.
  8. Inside the script write the following code.

    Dim Distributer_Code as integer = -1
    Dim Cust_code as integer = -1
    Dim cust_name as integer = -1
    Dim cust_add as integer = -1
    Dim zip as integer = -1
    Dim tel as integer = -1
    Dim dl_number as integer = -1
    Dim gstin as integer = -1
    Dim intRowIndex as integer = 0

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    If intRowIndex = 0 then

    Dim strfields() as string = Row.Column0.split(CChar("|"))
    Dim idx as integer = 0

    For idx = 0 To strFields.length - 1

    Select case str

    Case "Distributer_Code"
    Distributer_Code = idx
    Case "Cust_code"
    Cust_code = idx
    Case "cust_name"
    cust_name = idx
    Case "cust_add"
    cust_add = idx
    Case "zip"
    zip = idx
    Case "tel"
    tel = idx
    Case "dl_number"
    dl_number = idx
    Case "gstin"
    gstin = idx

    End Select

    Next

    Else

    Dim strfields() as string = Row.Column0.split(CChar("|"))

    If Distributer_Code > -1 Then Row.DistributerCode = strfields(Distributer_Code)
    If Cust_code > -1 Then Row.Custcode = strfields(Cust_code)
    If cust_name > -1 Then Row.custname = strfields(cust_name)
    If cust_add > -1 Then Row.custadd = strfields(cust_add)
    If zip > -1 Then Row.zip = strfields(zip)
    If tel > -1 Then Row.tel = strfields(tel)
    If dl_number > -1 Then Row.dlnumber = strfields(dl_number)
    If gstin > -1 Then Row.gstin = strfields(gstin)

    End If

    intRowIndex += 1

    End Sub
  9. Map the output columns to the OLEDB Destination

Get list of columns of source flat file in SSIS

I agree with the answer provided by @TabAlleman. SSIS can't natively handle dynamic columns (and niether can your SQL destination).

May I propose an alternative? You can detect a change in headers without using a C# Script Tasks. One way to do this would be to create a flafile connection that reads the entire row as a single column. Use a Conditional Split to discard anything other than the header row. Save that row to a RecordSet object. Any change? Send Email.

The "Get Header Row" DataFlow would look like this. Row Number if needed.

Sample Image

The Control Flow level would look like this. Use a ForEach ADO RecordSet object to assign the header row value to an SSIS variable CurrentHeader..

Sample Image

Above, the precedent constraints (fx icons ) of

[@ExpectedHeader] == [@CurrentHeader]
[@ExpectedHeader] != [@CurrentHeader]

determine whether you load data or send email.

Hope this helps!

SSIS reading flatfile header column

In the flat file connection manager, uncheck First row contains header option. Then go to Advanced Tab, delete all column and leave one and change its length to 4000.

In the data flow task, add a script component that split each row and:

  1. Read the columns headers from the first row
  2. Generate the desired output columns in all remaining rows

The following answers (different situations but they are helpful) will give you some insights:

  • SSIS ragged file not recognized CRLF
  • how to check column structure in ssis?
  • SSIS reading LF as terminator when its set as CRLF

How does SSIS determine table structure when importing a table from external source?

I suggest to create script transformation that will handle dynamic source changes and correct DDL (try to pass it like variable to "Preparation SQL Task")

Identify the Column changed in SCD Type 2 in SSIS SQL server

Honestly I don't really know why you need this functionality as you can very easily just look at the two rows to see any changes, on the off chance that you do actually need to see them. I've never needed a ColumnUpdated type value and I don't think the processing required to generate one and the storage to hold the data is worth having it.


That said, here is one way you can calculate the desired output from your given test data. Ideally you would do this in a more efficient way as part of your ETL process that is updating the rows as they come in rather than all at once. Though this obviously required info about your ETL that you haven't included in your question:

Query

declare @SCDtest table(id int,empid int,Deptid varchar(10),Ename varchar(50),DeptName varchar(50),city varchar(50),startdate datetime,Enddate datetime);
Insert into @SCDtest values(1, 1, 'D1', 'Mike', 'Account', 'Atlanta', '7/31/2020', '8/3/2020'),(2, 2, 'D2', 'Roy', 'IT', 'New York', '7/31/2020', '8/5/2020'),(3, 1, 'D1', 'Ross', 'Account', 'Atlanta', '8/4/2020', '8/7/2020'),(4, 2, 'D2', 'Roy', 'IT', 'Los angeles', '8/5/2020',NULL),(5, 1, 'D1', 'John', 'Marketing', 'Boston', '8/8/2020', NULL);

with l as
(
select *
,lag(id,1) over (partition by empid order by id) as l
from @SCDtest
)
select l.id
,l.empid
,l.Deptid
,l.Ename
,l.DeptName
,l.city
,l.startdate
,l.Enddate
,stuff(concat(case when l.Deptid <> t.Deptid then ', Deptid' end
,case when l.Ename <> t.Ename then ', Ename' end
,case when l.DeptName <> t.DeptName then ', DeptName' end
,case when l.city <> t.city then ', city' end
)
,1,2,''
) as ColumnUpdated
from l
left join @SCDtest as t
on l.l = t.id
order by l.empid
,l.startdate;

Output

+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
| id | empid | Deptid | Ename | DeptName | city | startdate | Enddate | ColumnUpdated |
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+
| 1 | 1 | D1 | Mike | Account | Atlanta | 2020-07-31 00:00:00.000 | 2020-08-03 00:00:00.000 | NULL |
| 3 | 1 | D1 | Ross | Account | Atlanta | 2020-08-04 00:00:00.000 | 2020-08-07 00:00:00.000 | Ename |
| 5 | 1 | D1 | John | Marketing | Boston | 2020-08-08 00:00:00.000 | NULL | Ename, DeptName, city |
| 2 | 2 | D2 | Roy | IT | New York | 2020-07-31 00:00:00.000 | 2020-08-05 00:00:00.000 | NULL |
| 4 | 2 | D2 | Roy | IT | Los angeles | 2020-08-05 00:00:00.000 | NULL | city |
+----+-------+--------+-------+-----------+-------------+-------------------------+-------------------------+-----------------------+

SSIS Flat File data column compare against table's column data range

At the core, you will need to use a Lookup Component. Write a query, SELECT T.id, T.start, T.end FROM dbo.MyTable AS T and use that as your source. Map the input column to the start column and select the id so that it will be added to the data flow.

If you hit run, it will perform an exact lookup and only find values of 110000 and 119200. To convert it to a range query, you will need to go into the Advanced tab. There should be 3 things you can check: amount of memory, rows and customize the query. When you check the last one, you should get a query like

SELECT * FROM 
(SELECT T.id, T.start, T.end FROM dbo.MyTable AS T`) AS ref
WHERE ref.start = ?

You will need to modify that to become

SELECT * FROM 
(SELECT T.id, T.start, T.end FROM dbo.MyTable AS T`) AS ref
WHERE ? BETWEEN ref.start AND ref.end

It's been my experience that the range queries can become rather inefficient as it seems to cache what's been seen already so if the source file had 110001, 110002, 110003 you would see 3 unique queries sent to the database. For small data sets, that may not be so bad but it led to some ugly load times for my DW.

An alternative to this is to explode the ranges. For me, I had a source system that only kept date ranges and I needed to know by day what certain counts were. The range lookups were not performing well so I crafted a query to convert the single row with a range of 2010-01-01 to 2013-07-07 to many rows, each with a single date 2013-01-01, 2013-01-02... While this approach lead to a longer pre-execute phase (it took a few minutes as the query had to generate ~30k rows per day for the past 5 years), once cached locally it was a simple seek to find a given transaction by day.

Preferably, I'd create a numbers table, fill it to the max of int and be done with it but you might get by with just using an inline table valued function to generate numbers. Your query would then look something like

SELECT
T.id
, GN.number
FROM
dbo.MyTable AS T
INNER JOIN
-- Make this big enough to satisfy your theoretical ranges
dbo.GenerateNumbers(1000000) AS GN
ON GN.number BETWEEN T.start and T.end;

That would get used in a "straight" lookup without the need for any of the advanced features. The lookup is going to get very memory hungry though so make the query as tight as possible. For example, cast the GN.number from a bigint to an int in the source query if you know your values will fit in an int.



Related Topics



Leave a reply



Submit