Importing and Validating Xml File Using Ssis or Just Plain T-Sql

Importing and validating XML file using SSIS or just plain T-SQL?

Based on the requirements that you have mentioned, I would say that you can use best of both the worlds (T-SQL & SSIS).

I feel that T-SQL gives more flexibility in loading the XML data that you have described in the question.

There are lot of different ways you can achieve this. Here is one possible option:

  1. Create a Stored Procedure that would take the path of the XML file as input parameter.

  2. Perform your XML data load operation using the T-SQL way which you feel is easier.

  3. Use SSIS package to perform error handling, file processing, archiving and send email.

  4. Use logging feature available in SSIS. It just requires simple configuration. Here is a samples that show how to configure logging in SSIS How to track status of rows successfully processed or failed in SSIS data flow task?

  5. A sample mock up of your flow would be as shown below in the screenshot. Loop the files using Foreach Loop container. Pass the file path as parameter to Execute SQL Task, which in turn would call the T-SQL that you had mentioned. After processing the file, using the File System Task to move the file to an archive folder.

  6. Sample used in SSIS reading multiple xml files from folder
    shows how to loop through files using Foreach loop container. It loops through xml files but uses Data Flow Task because the xml files are in simpler format.

  7. Sample used in How to send the records from a table in an e-mail body using SSIS package? shows how to send e-mail using Send Mail Task.

  8. Sample used in How do I move files to an archive folder after the files have been processed? shows how to move files to an Archive folder.

  9. Sample used in Branching after a file system task in SSIS without failing the package shows how to continue package execution even after a particular task fails. This will help you to proceed with package execution even if Foreach Loop fails so you can send email. Blue arrow in the screenshot indicates on completion of previous task.

  10. Sample used in How do I pick the most recently created folder using Foreach loop container in SSIS package? shows how to perform pattern matching.

Hope that gives you an idea.

Flow

Import XML files with different structures using SSIS

So what I did as a solution to this particular case was:
1- add a C# script that reads the xml file and keep the common nodes only and save their values to my dts variables.
2- insert into my SQL Server table the variables I just populated.
Both tasks were in a for each loop to go through all the xml files in a specific directory.
Hope it helps!

Problem validating in SSIS XML Task

I used some very simple examples to run through SSIS and it does not appear to support multiple schema documents.

SSIS reading multiple xml files from folder

Here is a possible option which demonstrates how to load multiple XML files having same definition into an SQL Server table. The example uses SQL Server 2008 R2 and SSIS 2008 R2. The example shown here loads three XML files into an SQL table using SSIS Data Flow Task with the help of XML Source component.

Step-by-step process:

  1. Create a table named dbo.Items using the script given under SQL Scripts section.
  2. Create an XSD file named Items.xsd in the folder path C:\temp\xsd using the content provided under XSD File section.
  3. Create three XML files namely Items_1.xml, Items_2.xml and Items_3.xml in the folder path C:\temp\xml using the content provided under XML Files section.
  4. On the package, create 3 variables namely FileExtension, FilePath and FolderPath as shown in screenshot #1.
  5. On the package's Connection Managers, create an OLE DB Connection named SQLServer to connect to the SQL Server Instance as shown in screenshot #2.
  6. On the Control Flow tab, place a Foreach loop container and a Data Flow Task within the Foreach loop container as shown in screenshot #3.
  7. Configure the Foreach Loop container as shown in screenshots #4 and #5.
  8. Double-click on the Data Flow Task to navigate to the Data Flow tab. Place an XML Source component and an OLE DB Destination as shown in screenshot #6.
  9. Configure the XML Source as shown in screenshot #7 and #8. The XML file path will be retrieved from the variable FilePath. This variable will be populated by the Foreach Loop container. Note: In later versions of Visual Studio, XML Source will error out due to the ValidateExternalMetadata property. To fix this, Single click "XML Source" then Properties and set the ValidateExternalMetadata to False.
  10. Configure the OLE DB Destination as shown in screenshots #9 and #10.
  11. Screenshots #11 and #12 show the package execution.
  12. Screenshot #13 shows the table data before the package execution. Screenshot #14 shows the table data after the package execution. The data in the table dbo.Items now contains the data present in three XML files.

Hope that helps.

SQL Scripts:

CREATE TABLE [dbo].[Items](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ItemNumber] [nvarchar](50) NOT NULL,
[ItemName] [nvarchar](60) NOT NULL,
[Price] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

XSD File

<xsd:schema xmlns:schema="ItemsXSDSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" targetNamespace="ItemsXSDSchema" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
<xsd:element name="Items">
<xsd:complexType>
<xsd:sequence>
<xsd:element minOccurs="0" maxOccurs="unbounded" name="Item">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Id" type="sqltypes:int" />
<xsd:element name="ItemNumber">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="20" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="ItemName">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="60" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="Price">
<xsd:simpleType>
<xsd:restriction base="sqltypes:numeric">
<xsd:totalDigits value="18" />
<xsd:fractionDigits value="2" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

XML Files

Items_1.xml

<?xml version="1.0"?>
<Items xmlns="ItemsXSDSchema">
<Item>
<Id>1</Id>
<ItemNumber>I2345343</ItemNumber>
<ItemName>Monitor</ItemName>
<Price>299.99</Price>
</Item>
</Items>

Items_2.xml

<?xml version="1.0"?>
<Items xmlns="ItemsXSDSchema">
<Item>
<Id>1</Id>
<ItemNumber>J1231231</ItemNumber>
<ItemName>Mouse</ItemName>
<Price>29.99</Price>
</Item>
</Items>

Items_3.xml

<?xml version="1.0"?>
<Items xmlns="ItemsXSDSchema">
<Item>
<Id>1</Id>
<ItemNumber>K0456212</ItemNumber>
<ItemName>Keyboard</ItemName>
<Price>49.99</Price>
</Item>
</Items>

Screenshot #1:

1

Screenshot #2:

2

Screenshot #3:

3

Screenshot #4:

4

Screenshot #5:

5

Screenshot #6:

6

Screenshot #7:

7

Screenshot #8:

8

Screenshot #9:

9

Screenshot #10:

10

Screenshot #11:

11

Screenshot #12:

12

Screenshot #13:

13

Screenshot #14:

13

Importing and parsing a large XML file in SQL Server (when normal methods are rather slow)

OK. I created an XML Index on the XML data column. (Just a primary one for now).

A query that took ~4:30 minutes before takes ~9 seconds now! Seems that a table that stores the XML with a proper XML Index and the parsing the data with the xml.nodes() function are a feasible solution.

Thank you all.

What is the largest XML file SSIS can extract data from?

Xml Source does not load the whole document in memory, but streams data out as it reads it from XML file. So if you are reading the XML and writing it to e.g. text files without complex transformations, you need relatively little memory. Also the amount of memory you need (after some threshold) stops growing when XML file grows - so you may handle potentially unlimited XML files.

E.g. this guy exported the whole Wikipedia content (20Gb XML file):
http://www.ideaexcursion.com/2009/01/26/import-wikipedia-articles-into-sql-server-with-ssis/

Of course, you will probably do something with that data, e.g. join multiple streams coming out of the XML Source. Depending on what you need, you might need a lot of memory, because some transforms do keep the whole dataset in memory, or perform much better if you have enough memory for the whole dataset.



Related Topics



Leave a reply



Submit