Parsing Nested Xml into SQL Table

Parse nested xml using sql server

There were 2 problems with your last line: (1) it should be an OUTER APPLY because a file may have no dependencies and (2) you need to go one level deeper, into DependentOnFileNames/FileGUID instead of DependentOnFileNames. It should be:

OUTER APPLY tbl1.Files.nodes('DependentOnFileNames/FileGUID') tbl2(DependentOnFileNames) 

And modify your SELECT clause accordingly:

tbl2.DependentOnFileNames.value('.', 'UNIQUEIDENTIFIER') DependentFileGUID

For future Googlers, a few tips for dealing with XML in SQL Server:

  • Use nodes() to expand the nested elements at the XML level
  • Use value() to get the value of an XML node. The selector inside value(...) always return an array of values, even if it's an array of 1. You must select only 1 element from that array. Element index starts at 1. The exception being value(.), which returns a singleton.
  • Use CROSS APPLY / OUTER APPLY to expand the nest elements at the SQL level. They act like INNER JOIN / LEFT JOIN, except that there's no need for join conditions.
  • If you get stuck, put something like T1.[FileGroup].query('.') in the SELECT clause and explore the XML structure.

The code:

SELECT          
T1.[FileGroup].value('FileName[1]', 'varchar(30)') AS FileName
, T2.[File].value('FilePath[1]', 'varchar(500)') AS FilePath
, T2.[File].value('FileGUID[1]', 'UNIQUEIDENTIFIER') AS FileGUID
, T3.DependentFileGUID.value('.', 'UNIQUEIDENTIFIER') AS DependentFileGUID
FROM
@xml.nodes('FileGroups/FileGroup') T1([FileGroup])
CROSS APPLY T1.[FileGroup].nodes('Files/File') T2([File])
OUTER APPLY T2.[File].nodes('DependentOnFileNames/FileGUID') T3(DependentFileGUID)

How to convert nested XML into corresponding tables?

Try it like this:

DECLARE @xml XML=
N'<Customers>
<Customer>
<Id>1</Id>
<AccountNumber>12345</AccountNumber>
<Addresses>
<Address>
<Id>100</Id>
<street>my street></street>
<city>London</city>
</Address>
<Address>
<Id>101</Id>
<street>my street></street>
<city>Berlin</city>
</Address>
</Addresses>
<BankDetails>
<BankDetail>
<Id>222</Id>
<Iban>DE8439834934939434333</Iban>
</BankDetail>
<BankDetail>
<Id>228</Id>
<Iban>UK1237921391239123213</Iban>
</BankDetail>
</BankDetails>
<Orders>
<Order>
<OrderLine />
</Order>
</Orders>
</Customer>
</Customers>';

--This query will create a table #tmpInsert with all the data

SELECT cust.value('Id[1]','int') AS CustomerID
,cust.value('AccountNumber[1]','int') AS CustomerAccountNumber
,addr.value('Id[1]','int') AS AddressId
,addr.value('street[1]','nvarchar(max)') AS AddressStreet
,addr.value('city[1]','nvarchar(max)') AS AddressCity
,bank.value('Id[1]','int') AS BankId
,bank.value('Iban[1]','nvarchar(max)') AS BankIban
,ord.value('OrderLine[1]','nvarchar(max)') AS OrderLine
INTO #tmpInsert
FROM @xml.nodes('/Customers/Customer') AS A(cust)
OUTER APPLY cust.nodes('Addresses/Address') AS B(addr)
OUTER APPLY cust.nodes('BankDetails/BankDetail') AS C(bank)
OUTER APPLY cust.nodes('Orders/Order') AS D(ord);

--Here you can check the content

SELECT * FROM #tmpInsert;

--Clean-Up

GO
DROP TABLE #tmpInsert

Once you've got all your data in the table, you can use simple DISTINCT, GROUP BY, if needed ROW_NUMBER() OVER(PARTITION BY ...) to select each set separately for the proper insert.

Parsing nested XML into SQL table

You can make use of a numbers table to pick the first, second, third etc row from the child elements. In this query I have limited the rows returned to the number if dates provided. If there are more values or descriptions than dates you have to modify the join to take that into account.

declare @XML xml = '
<ObjectData>
<Parameter1>some value</Parameter1>
<Parameter2>other value</Parameter2>
<Dates>
<dateTime>2011-02-01T00:00:00</dateTime>
<dateTime>2011-03-01T00:00:00</dateTime>
<dateTime>2011-04-01T00:00:00</dateTime>
</Dates>
<Values>
<double>0.019974</double>
<double>0.005395</double>
<double>0.004854</double>
</Values>
<Description>
<string>this is row 1</string>
<string>this is row 2</string>
<string>this is row 3</string>
</Description>
</ObjectData>'

;with Numbers as
(
select number
from master..spt_values
where type = 'P'
)
select T.N.value('Parameter1[1]', 'varchar(50)') as Parameter1,
T.N.value('Parameter2[1]', 'varchar(50)') as Parameter2,
T.N.value('(Dates/dateTime[position()=sql:column("N.Number")])[1]', 'datetime') as Dates,
T.N.value('(Values/double[position()=sql:column("N.Number")])[1]', 'float') as [Values],
T.N.value('(Description/string[position()=sql:column("N.Number")])[1]', 'varchar(max)') as [Description]
from @XML.nodes('/ObjectData') as T(N)
cross join Numbers as N
where N.number between 1 and (T.N.value('count(Dates/dateTime)', 'int'))

Parse nested XML in T-SQL need parent and childs in single row

The correct way to query nested nodes is a cascade of .nodes() called by APPLY:

Your sample XML:

DECLARE @XML XML=
N'<logs>
<Event>
<DriverId>51</DriverId>
<EventID>b31ade0f-1053-4df4-a9dd-ffc76060f3c5</EventID>
<Records>
<Record>
<RecordID>b31ade0f-1053-4df4-a9dd-ffc76060f3c5</RecordID>
</Record>
</Records>
</Event>
<Event>
<DriverId>45</DriverId>
<EventID>3b454377-74c7-4ea2-909e-3ea239b969b3</EventID>
<Records>
<Record>
<RecordID>3b454377-74c7-4ea2-909e-3ea239b969b3</RecordID>
</Record>
<Record>
<RecordID>3b454377-74c7-4ea2-909e-3ea239b969b4</RecordID>
</Record>
<Record>
<RecordID>3b454377-74c7-4ea2-909e-3ea239b969b5</RecordID>
</Record>
</Records>
</Event>
</logs>';

--The query

SELECT A.evnt.value('(DriverId/text())[1]','int') AS Event_DriverId
,A.evnt.value('(EventID/text())[1]','uniqueidentifier') AS Event_EventId
,B.rec.value('(RecordID/text())[1]','uniqueidentifier') AS Record_RecordId
FROM @XML.nodes('/logs/Event') A(evnt)
OUTER APPLY A.evnt.nodes('Records/Record') B(rec);

The result

Event_DriverId  Event_EventId                           Record_RecordId
51 B31ADE0F-1053-4DF4-A9DD-FFC76060F3C5 B31ADE0F-1053-4DF4-A9DD-FFC76060F3C5
45 3B454377-74C7-4EA2-909E-3EA239B969B3 3B454377-74C7-4EA2-909E-3EA239B969B3
45 3B454377-74C7-4EA2-909E-3EA239B969B3 3B454377-74C7-4EA2-909E-3EA239B969B4
45 3B454377-74C7-4EA2-909E-3EA239B969B3 3B454377-74C7-4EA2-909E-3EA239B969B5

The idea in short:

We use .nodes() to get each <Event> as a separate row.

Now we use .nodes() again, but we do this with the XML-fragment returned by the first call and we use a relative path (no slash at the beginning).

The second .nodes() returns each <Record> within each <Event> as a separate row.

Just if interested: This answer shows, why we should never use backward-navigation (using ../ in the XPath).

Extract nested XML data with inconsistent attributres and elements in SQL Server

The whole thing depends on what you know in advance.:

A mockup to simulate your issue:

DECLARE @YourTable TABLE(ID INT IDENTITY, [Data] XML);
INSERT INTO @YourTable VALUES
(N'<risk>
<Input_CCY style="Input">USD</Input_CCY>
<Input_LastYear style="Output" formula="=SUM(E65:E68)">500</Input_LastYear>
<table name="info" header_row="1">
<Row name="" iRow="2">
<Col style="Input" name="Ref" iCol="1" >AAA</Col>
<Col style="Input" name="Location" iCol="2">London</Col>
...
</Row>
<Row name="" iRow="3">
<Col style="Input" name="Ref" iCol="1" >BBB</Col>
<Col style="Input" name="Location" iCol="2">Edinburgh</Col>
...
</Row>
...
</table>
<table name="summary" header_row="1">
<Row name="" iRow="2">
<Col style="Output" name="Status" iCol="1" >Amber</Col>
<Col style="Output" name="Referral_Bonus" iCol="2">No</Col>
...
</Row>
<Row name="" iRow="3">
<Col style="Normal" name="Status" iCol="1" >Green</Col>
<Col style="Normal" name="Referral_Bonus" iCol="2">YES</Col>
...
</Row>
...
</table>
</risk>');

--This query will return the columns as EAV.

--This is good, if you do not know the tables and columns in advance.

--I would suggest to load the returned set into a staging table and proceed from there:

SELECT t.ID
,t.[Data].value('(/risk/Input_CCY/@style)[1]','nvarchar(100)') AS Input_CCY_Style
,t.[Data].value('(/risk/Input_CCY/text())[1]','nvarchar(100)') AS Input_CCY_Content
,t.[Data].value('(/risk/Input_LastYear/@style)[1]','nvarchar(100)') AS Input_LastYear_Style
,t.[Data].value('(/risk/Input_LastYear/@formula)[1]','nvarchar(100)') AS Input_LastYear_Formula
,t.[Data].value('(/risk/Input_LastYear/text())[1]','nvarchar(100)') AS Input_LastYear_Content
,tbl.value('@name','nvarchar(250)') AS Table_Name
,rw.value('@iRow','int') AS Row_iRow
,cl.value('@name','nvarchar(250)') AS [Col_Name]
,cl.value('@iCol','nvarchar(250)') AS Col_iCol
,cl.value('text()[1]','nvarchar(150)') AS Col_Content
FROM @YourTable t
CROSS APPLY t.[Data].nodes('/risk/table') A(tbl)
CROSS APPLY tbl.nodes('Row') B(rw)
CROSS APPLY rw.nodes('Col') C(cl);

--This query will pick the columns by known names.

--As I put it here, you will get everything, but all columns return as NULL, if they do not exist in the actual table.

SELECT t.ID
,t.[Data].value('(/risk/Input_CCY/@style)[1]','nvarchar(100)') AS Input_CCY_Style
,t.[Data].value('(/risk/Input_CCY/text())[1]','nvarchar(100)') AS Input_CCY_Content
,t.[Data].value('(/risk/Input_LastYear/@style)[1]','nvarchar(100)') AS Input_LastYear_Style
,t.[Data].value('(/risk/Input_LastYear/@formula)[1]','nvarchar(100)') AS Input_LastYear_Formula
,t.[Data].value('(/risk/Input_LastYear/text())[1]','nvarchar(100)') AS Input_LastYear_Content
,tbl.value('@name','nvarchar(250)') AS Table_Name
,rw.value('@iRow','int') AS Row_iRow

,rw.value('(Col[@name="Ref"]/@iCol)[1]','nvarchar(150)') AS Col_Ref_iCol
,rw.value('(Col[@name="Ref"]/@style)[1]','nvarchar(150)') AS Col_Ref_Style
,rw.value('(Col[@name="Ref"]/text())[1]','nvarchar(150)') AS Col_Ref_Content

,rw.value('(Col[@name="Location"]/@iCol)[1]','nvarchar(150)') AS Col_Location_iCol
,rw.value('(Col[@name="Location"]/@style)[1]','nvarchar(150)') AS Col_Location_Style
,rw.value('(Col[@name="Location"]/text())[1]','nvarchar(150)') AS Col_Location_Content

,rw.value('(Col[@name="Status"]/@iCol)[1]','nvarchar(150)') AS Col_Status_iCol
,rw.value('(Col[@name="Status"]/@style)[1]','nvarchar(150)') AS Col_Status_Style
,rw.value('(Col[@name="Status"]/text())[1]','nvarchar(150)') AS Col_Status_Content

,rw.value('(Col[@name="Referral_Bonus"]/@iCol)[1]','nvarchar(150)') AS Col_Referral_Bonus_iCol
,rw.value('(Col[@name="Referral_Bonus"]/@style)[1]','nvarchar(150)') AS Col_Referral_Bonus_Style
,rw.value('(Col[@name="Referral_Bonus"]/text())[1]','nvarchar(150)') AS Col_Referral_Bonus_Content
FROM @YourTable t
CROSS APPLY t.[Data].nodes('/risk/table') A(tbl)
CROSS APPLY tbl.nodes('Row') B(rw);

If you know all your tables and columns in advance, I'd suggest to take the second appraoch, but one statement per table separately.

Inserting into tables using nested XML

EDIT: With generated IDs

This solution relys on your XML being loaded into a variable @xml of type XML. If you need help with this, just call ;-)

WITH MyCompanies AS
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS CompID
,Comp.value('name[1]','varchar(max)') AS Comp_Name
,Comp.query('.') AS Comp_InnerXML
FROM @xml.nodes('/root/companies/company') AS A(Comp)
)
,TheAddresses AS
(
SELECT MyCompanies.*
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS AddrID
,Addr.value('street[1]','varchar(max)') AS Addr_Street
,Addr.query('.') AS Addr_InnerXML
FROM MyCompanies
OUTER APPLY Comp_InnerXML.nodes('company/addresses/address') AS A(Addr)
)
SELECT TheAddresses.*
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ContID
,Cont.value('fullName[1]','varchar(max)') AS Cont_FullName
INTO #temp
FROM TheAddresses
OUTER APPLY Addr_InnerXML.nodes('address/contacts/contact') AS C(Cont);

SELECT * FROM #temp;

The result

+--------+-----------+--------+-----------------+--------+---------------+
| CompID | Comp_Name | AddrID | Addr_Street | ContID | Cont_FullName |
+--------+-----------+--------+-----------------+--------+---------------+
| 1 | XYZ Co. | 1 | 12 Light Street | 1 | NULL |
+--------+-----------+--------+-----------------+--------+---------------+
| 1 | XYZ Co. | 2 | 44 King Street | 2 | Bob |
+--------+-----------+--------+-----------------+--------+---------------+
| 1 | XYZ Co. | 2 | 44 King Street | 3 | Alice |
+--------+-----------+--------+-----------------+--------+---------------+
| 2 | ABC Co. | 3 | 12 ABC Street | 4 | NULL |
+--------+-----------+--------+-----------------+--------+---------------+
| 2 | ABC Co. | 4 | 44 ABC Street | 5 | Bob ABC |
+--------+-----------+--------+-----------------+--------+---------------+
| 2 | ABC Co. | 4 | 44 ABC Street | 6 | Alice ABC |
+--------+-----------+--------+-----------------+--------+---------------+

Within #temp you have the data. Use DISTINCT to do your inserts from there...

--Your companies
SELECT DISTINCT CompID,Comp_Name FROM #temp;
--Your address
SELECT DISTINCT CompID,AddrID,Addr_Street FROM #temp;
--Your contacts
SELECT DISTINCT AddrID,ContID,Cont_FullName FROM #temp;

previous

This query will get the data into table data.

But: You should insert this into related tables (company,address,contacts) rather an all in one.

You can create dummy IDs using ROW_NUMBER with OVER(PARTITION BY ...) to generate dummy IDs but this will work only in the first run.

SELECT Comp.value('name[1]','varchar(max)') AS Comp_Name
,Addr.value('street[1]','varchar(max)') AS Addr_Street
,Cont.value('fullName[1]','varchar(max)') AS Cont_FullName
FROM @xml.nodes('/root/companies/company') AS A(Comp)
OUTER APPLY Comp.nodes('addresses/address') AS B(Addr)
OUTER APPLY Addr.nodes('contacts/contact') AS C(Cont)

The result

Comp_Name   Addr_Street     Cont_FullName
XYZ Co. 12 Light Street NULL
XYZ Co. 44 King Street Bob
XYZ Co. 44 King Street Alice
ABC Co. 12 ABC Street NULL
ABC Co. 44 ABC Street Bob ABC
ABC Co. 44 ABC Street Alice ABC

Working with nested XML nodes in SQL Server

These two lines from your FROM clause are cross-joining on each other:

OUTER APPLY PersonDetails.nodes('../HomeInformation/Address') HomeInformation([Address])
OUTER APPLY PersonDetails.nodes('../HomeInformation/Address/AddressLines') HomeInformationAddress(AddressLines)

You need to make the second one dependent on the first one to prevent this:

OUTER APPLY PersonDetails.nodes('../HomeInformation/Address') HomeInformation([Address])
OUTER APPLY HomeInformation.nodes('../AddressLines') HomeInformationAddress(AddressLines)

Inserting XML Multiple Nested Nodes into SQL Tables

You can use the following conceptual example for your needs.

You can INSERT data into a DB table by iterating through the parent XML elements. You would need two INSERT statements. Right after the first of them, you would need to preserve an IDENTITY value like you are already doing it today, to be used in the 2nd INSERT for child XML fragments and corresponding rows.

SQL

DECLARE @INPUTXML XML = '<Lineitems>
<Lineitem>
<Ticker>TER</Ticker>
<ID>0</ID>
<LineItem>Net Revenue</LineItem>
<XFundCode>TRIN</XFundCode>
<UserID>TDP</UserID>
</Lineitem>
<Lineitem>
<Ticker>TER</Ticker>
<ID>0</ID>
<LineItem>Cost of Revenue</LineItem>
<XFundCode>XXP</XFundCode>
<UserID>TDP</UserID>
</Lineitem>
</Lineitems>';

DECLARE @Ticker AS VARCHAR(MAX)
, @ID AS INT
, @Lineitem AS VARCHAR(MAX)
, @XFundCode AS VARCHAR(MAX)
, @UserID AS VARCHAR(MAX);

-- count total number of items
DECLARE @i INT
, @cnt INT = @INPUTXML.value('count(/Lineitems/Lineitem)', 'INT');
SELECT @cnt;

-- loop XML item by item
SET @i = 1;
WHILE @i <= @cnt BEGIN

SELECT @Ticker = col.value('(Ticker/text())[1]','VARCHAR(MAX)')
, @ID = col.value('(ID/text())[1]','INT')
, @Lineitem = col.value('(LineItem/text())[1]','VARCHAR(MAX)')
, @XFundCode = col.value('(XFundCode/text())[1]','VARCHAR(MAX)')
, @UserID = col.value('(UserID/text())[1]','VARCHAR(MAX)')
FROM @INPUTXML.nodes('/Lineitems/Lineitem[position() = sql:variable("@i")]') AS tab(col)

-- do whatever needed in the loop here

SET @i += 1;
END


Related Topics



Leave a reply



Submit