Import 'xml' into Sql Server
Try this:
DECLARE @XML XML = '<EventSchedule>
<Event Uid="2" Type="Main Event">
<IsFixed>True</IsFixed>
<EventKind>MainEvent</EventKind>
<Fields>
<Parameter Name="Type" Value="TV_Show"/>
<Parameter Name="Name" Value="The Muppets"/>
<Parameter Name="Duration" Value="00:30:00"/>
</Fields>
</Event>
<Event Uid="3" Type="Secondary Event">
<IsFixed>True</IsFixed>
<EventKind>SecondaryEvent</EventKind>
<Fields>
<Parameter Name="Type" Value="TV_Show"/>
<Parameter Name="Name" Value="The Muppets II"/>
<Parameter Name="Duration" Value="00:30:00"/>
</Fields>
</Event>
</EventSchedule>'
SELECT
EventUID = Events.value('@Uid', 'int'),
EventType = Events.value('@Type', 'varchar(20)'),
EventIsFixed =Events.value('(IsFixed)[1]', 'varchar(20)'),
EventKind =Events.value('(EventKind)[1]', 'varchar(20)')
FROM
@XML.nodes('/EventSchedule/Event') AS XTbl(Events)
Gives me an output of:
And of course, you can easily do an
INSERT INTO dbo.YourTable(EventUID, EventType, EventIsFixed, EventKind)
SELECT
......
to insert that data into a relational table.
Update: assuming you have your XML in files - you can use this code to load the XML file into an XML
variable in SQL Server:
DECLARE @XmlFile XML
SELECT @XmlFile = BulkColumn
FROM OPENROWSET(BULK 'path-to-your-XML-file', SINGLE_BLOB) x;
and then use the above code snippet to parse the XML.
Update #2: if you need the parameters, too - use this XQuery statement:
SELECT
EventUID = Events.value('@Uid', 'int'),
EventType = Events.value('@Type', 'varchar(20)'),
EventIsFixed = Events.value('(IsFixed)[1]', 'varchar(20)'),
EventKind = Events.value('(EventKind)[1]', 'varchar(20)'),
ParameterType = Events.value('(Fields/Parameter[@Name="Type"]/@Value)[1]', 'varchar(20)'),
ParameterName = Events.value('(Fields/Parameter[@Name="Name"]/@Value)[1]', 'varchar(20)'),
ParameterDuration = Events.value('(Fields/Parameter[@Name="Duration"]/@Value)[1]', 'varchar(20)')
FROM
@XML.nodes('/EventSchedule/Event') AS XTbl(Events)
Results in:
Fastest method to import large XML into SQL-Server table
Dealing with NULL values is something special in XML.
The definition of a NULL value in XML is not existing. So
<a>
<b>hi</b>
<c></c>
<d/>
</a>
<a>
is the root elelement.<b>
is an element with atext()
node.<c>
is an empty element<d>
is a self-closing element<e>
is - uhm - not there...
Important hint: <c>
and <d>
are the same, absolutely no difference!
You can query for the element with
.value('(/a/b)[1]','nvarchar(100)')
And you can query for the the text()
node specifically
.value('(/a/b/text())[1]','nvarchar(100)')
In this you find a possible answer (a bit hidden): You can do all your code without the NULL checking predicate, if you query for the text()
node specifically.
Change this
ref.value('record[1][not(@xs:nil = "true")]' ,'varchar(100)')
to this
ref.value('(record[1]/text())[1]' ,'varchar(100)')
What might break this: If a <record>
's content may be an empty string you would get a NULL
back and not ''
. But it should be much faster... Hope, this is okay for you...
About performance: Read this answer. It covers your issue quite well. Especially the part where the time is consumed (follow the links in this answer).
Import Multi-level (element) XML file into MS SQL Server Table
You can build an insert statement around the SELECT below. There may be a better way to do this, but this can at least move you forward a bit.
declare @x as xml;
set @x = '<?xml version="1.0" encoding="UTF-8"?>
<TagValidationList>
<TVLHeader>
<SubmissionType>STVL</SubmissionType>
<SubmissionDateTime>2000-01-00T00:00:01Z</SubmissionDateTime>
<SSIOPHubID>0001</SSIOPHubID>
<HomeAgencyID>1002</HomeAgencyID>
<BulkIndicator>B</BulkIndicator>
<BulkIdentifier>100</BulkIdentifier>
<RecordCount>3</RecordCount>
</TVLHeader>
<TVLDetail>
<TVLTagDetails>
<HomeAgencyID>1234</HomeAgencyID>
<TagAgencyID>1100</TagAgencyID>
<TagSerialNumber>00123456</TagSerialNumber>
<TagStatus>X</TagStatus>
<TagClass>1</TagClass>
<TVLAccountDetails/>
</TVLTagDetails>
<TVLTagDetails>
<HomeAgencyID>2234</HomeAgencyID>
<TagAgencyID>1200</TagAgencyID>
<TagSerialNumber>00223456</TagSerialNumber>
<TagStatus>Y</TagStatus>
<TagClass>2</TagClass>
<TVLPlateDetails>
<PlateCountry>US</PlateCountry>
<PlateState>TX</PlateState>
<PlateNumber>123ABC</PlateNumber>
<PlateEffectiveFrom>2008-03-12T06:00:00Z</PlateEffectiveFrom>
</TVLPlateDetails>
<TVLAccountDetails/>
</TVLTagDetails>
<TVLTagDetails>
<HomeAgencyID>3234</HomeAgencyID>
<TagAgencyID>1300</TagAgencyID>
<TagSerialNumber>12345678</TagSerialNumber>
<TagStatus>Z</TagStatus>
<TagClass>3</TagClass>
<TVLPlateDetails>
<PlateCountry>US</PlateCountry>
<PlateState>OK</PlateState>
<PlateNumber>ABC321</PlateNumber>
</TVLPlateDetails>
<TVLAccountDetails>
<AccountNumber>654321</AccountNumber>
</TVLAccountDetails>
</TVLTagDetails>
</TVLDetail>
</TagValidationList>';
SELECT
t.x.value('HomeAgencyID[1]', 'varchar(4)') as HomeAgencyID,
t.x.value('TagAgencyID[1]', 'varchar(4)') as TagAgencyID,
t.x.value('TagSerialNumber[1]', 'varchar(8)') as TagSerialNumber,
t.x.value('TagStatus[1]', 'varchar(4)') as TagStatus,
t.x.value('TagClass[1]', 'varchar(4)') as TagClass,
t.x.value('(TVLPlateDetails/PlateCountry)[1]', 'varchar(4)') as PlateCountry,
t.x.value('(TVLPlateDetails/PlateState)[1]', 'varchar(4)') as PlateState,
t.x.value('(TVLPlateDetails/PlateNumber)[1]', 'varchar(12)') as PlateNumber,
t.x.value('(TVLAccountDetails/AccountNumber)[1]', 'varchar(12)') as AccountNumber
FROM @x.nodes('/TagValidationList/TVLDetail/TVLTagDetails') t(x)
The output is close to what you're looking for. You can ISNULL-'' some of the outputs if you need to.
HomeAgencyID TagAgencyID TagSerialNumber TagStatus TagClass PlateCountry PlateState PlateNumber AccountNumber
------------ ----------- --------------- --------- -------- ------------ ---------- ------------ -------------
1234 1100 00123456 X 1 NULL NULL NULL NULL
2234 1200 00223456 Y 2 US TX 123ABC NULL
3234 1300 12345678 Z 3 US OK ABC321 654321
SQL Server: how import XML with node's fields
You can also do it with native XQuery support in SQL Server, without having to resort to the legacy OPENXML
calls which are known for memory leaks and other problem.
Just use this code instead:
DECLARE @XML AS XML
SELECT @XML = XMLData FROM XMLwithOpenXML
SELECT
id = xc.value('@id', 'int'),
lon = xc.value('@lon', 'decimal(20,8)'),
lat = xc.value('@lat', 'decimal(20,8)'),
name = xc.value('(tag[@k="name"]/@v)[1]', 'varchar(50)'),
website = xc.value('(tag[@k="website"]/@v)[1]', 'varchar(50)')
FROM
@XML.nodes('/osm/node') AS XT(XC)
And I also recommend to always use the most appropriate datatype - here, the id
looks like an INT
- so use it as such - and lat
and lon
are clearly DECIMAL
values ; don't just convert everything to strings because you're too lazy to figure out what to use!
Importing XML into SQL Server but trying to make multiple entries if multiple results exist for a child element
This should get you pretty far. It's completely untested, so please read the code, understand it, and make the appropriate changes to get it to work.
I've removed the function
and inlined all the code into the loop instead, the function was too bulky for my taste. Now you should be able to see more clearly what's going on.
Effectively it's the exact same code two times, with a small extra step that adds self-references so you can query every product via its primary ID and and its secondary IDs in the same way, as discussed in the comments.
$connectionString = "Data Source=Apps2\Apps2;Initial Catalog=ICECAT;Integrated Security=SSPI"
$batchSize = 50000
# set up [files_index] datatable & read schema from DB
$files_index_table = New-Object System.Data.DataTable
$files_index_adapter = New-Object System.Data.SqlClient.SqlDataAdapter("SELECT * FROM files_index WHERE 0 = 1", $connectionString)
$files_index_adapter.Fill($files_index_table) | Out-Null
$files_index_bcp = New-Object SqlBulkCopy($connectionString)
$files_index_bcp.DestinationTableName = "dbo.files_index"
$files_index_count = 0
# set up [product_ids] datatable & read schema from DB
$product_ids_table = New-Object System.Data.DataTable
$product_ids_adapter = New-Object System.Data.SqlClient.SqlDataAdapter("SELECT * FROM product_ids WHERE 0 = 1", $connectionString)
$product_ids_adapter.Fill($product_ids_table) | Out-Null
$product_ids_bcp = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$product_ids_bcp.DestinationTableName = "dbo.product_ids"
$product_ids_count = 0
# main import loop
$xmlReader = New-Object System.Xml.XmlTextReader("C:\Scripts\icecat\files.index.xml")
while ($xmlReader.Read()) {
# skip any XML nodes that aren't elements
if ($xmlReader.NodeType -ne [System.Xml.XmlNodeType]::Element) { continue }
# handle <file> elements
if ($xmlReader.Name -eq "file") {
$files_index_count++
# remember current product ID, we'll need it when we hit the next <M_Prod_ID> element
$curr_product_id = $xmlReader.GetAttribute("Product_ID")
$is_new_file = $true
$newRow = $files_index_table.NewRow()
$newRow["Product_ID"] = $xmlReader.GetAttribute("Product_ID")
$newRow["path"] = $xmlReader.GetAttribute("path")
$newRow["Updated"] = $xmlReader.GetAttribute("Updated")
$newRow["Quality"] = $xmlReader.GetAttribute("Quality")
$newRow["Supplier_id"] = $xmlReader.GetAttribute("Supplier_id")
$newRow["Prod_ID"] = $xmlReader.GetAttribute("Prod_ID")
$newRow["Catid"] = $xmlReader.GetAttribute("Catid")
$newRow["On_Market"] = $xmlReader.GetAttribute("On_Market")
$newRow["Model_Name"] = $xmlReader.GetAttribute("Model_Name")
$newRow["Product_View"] = $xmlReader.GetAttribute("Product_View")
$newRow["HighPic"] = $xmlReader.GetAttribute("HighPic")
$newRow["HighPicSize"] = $xmlReader.GetAttribute("HighPicSize")
$newRow["HighPicWifiles_index_tableh"] = $xmlReader.GetAttribute("HighPicWifiles_index_tableh")
$newRow["HighPicHeight"] = $xmlReader.GetAttribute("HighPicHeight")
$newRow["Date_Added"] = $xmlReader.GetAttribute("Date_Added")
$files_index_table.Rows.Add($newRow) | Out-Null
if ($files_index_table.Rows.Count -eq $batchSize) {
$files_index_bcp.WriteToServer($files_index_table)
$files_index_table.Rows.Clear()
Write-Host "$files_index_count <file> elements processed so far"
}
# handle <M_Prod_ID> elements
} elseif ($xmlReader.Name -eq "M_Prod_ID") {
$product_ids_count++
# add self-reference row to the [product_ids] table
# only for the first <M_Prod_ID> per <file> we need to do this
if ($is_new_file) {
$newRow = $product_ids_table.NewRow()
$newRow["Product_ID"] = $curr_product_id # from above
$newRow["Alternative_ID"] = $curr_product_id
$product_ids_table.Rows.Add($newRow) | Out-Null
$is_new_file = $false
}
$newRow = $product_ids_table.NewRow()
$newRow["Product_ID"] = $curr_product_id # from above
$newRow["Alternative_ID"] = $xmlReader.Value
$product_ids_table.Rows.Add($newRow) | Out-Null
if ($product_ids_table.Rows.Count -eq $batchSize) {
$product_ids_bcp.WriteToServer($files_index_table)
$product_ids_table.Rows.Clear()
Write-Host "$product_ids_count <M_Prod_ID> elements processed so far"
}
}
}
# write any remaining rows to the server
if ($files_index_table.Rows.Count -gt 0) {
$files_index_bcp.WriteToServer($files_index_table)
$files_index_table.Rows.Clear()
}
Write-Host "$files_index_count <file> elements processed overall"
if ($product_ids_table.Rows.Count -gt 0) {
$product_ids_bcp.WriteToServer($product_ids_table)
$product_ids_table.Rows.Clear()
}
Write-Host "$product_ids_count <M_Prod_ID> elements processed overall"
Method for importing System.Data.Dataset XML file into SQL Server using T-SQL
Please try the following solution.
It is using T-SQL and XQuery methods .nodes()
and .value()
.
I saved your XML as 'e:\Temp\NewDataSet.xml' file.
SQL Server XML data type can hold up to 2GB size wise.
If the performance of the suggested method is not that good, depending on the volume of the data, it is possible to load the entire XML file into a temporary table with one row and one column.
SQL
DECLARE @tbl1 TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(50), col2 VARCHAR(50));
DECLARE @tbl2 TABLE (ID INT IDENTITY PRIMARY KEY, col1 VARCHAR(50), col2 VARCHAR(50));
DECLARE @xml XML;
SELECT @xml = XmlDoc
FROM OPENROWSET (BULK N'e:\Temp\NewDataSet.xml', SINGLE_BLOB, CODEPAGE='65001') AS Tab(XmlDoc);
INSERT INTO @tbl1 (col1, col2)
SELECT c.value('(col1/text())[1]', 'VARCHAR(50)') AS col1
, c.value('(col2/text())[1]','VARCHAR(50)') AS col2
FROM @xml.nodes('/NewDataSet/table1') AS t(c);
INSERT INTO @tbl2 (col1, col2)
SELECT c.value('(col1/text())[1]', 'VARCHAR(50)') AS col1
, c.value('(col2/text())[1]','VARCHAR(50)') AS col2
FROM @xml.nodes('/NewDataSet/table2') AS t(c);
-- test
SELECT * FROM @tbl1;
SELECT * FROM @tbl2;
Output
Table1
+----+--------+--------+
| ID | col1 | col2 |
+----+--------+--------+
| 1 | tkshrq | 6krrtq |
| 2 | k60stu | sqnhp9 |
+----+--------+--------+
Table2
+----+--------+--------+
| ID | col1 | col2 |
+----+--------+--------+
| 1 | 6k1thw | n2ocgz |
| 2 | 26kmw5 | ym3iwd |
+----+--------+--------+
Related Topics
How to Group by Week in Postgresql
SQL Return Only Duplicate Rows
"Order by ... Using" Clause in Postgresql
Move Data from One Table to Another, Postgresql Edition
Accessing JSON Array in SQL Server 2016 Using JSON_Value
SQL Server 2008: Delete Duplicate Rows
How to Flip a Bit in SQL Server
Cross Apply VS Outer Apply Speed Difference
Add an Incremental Number in a Field in Insert into Select Query in SQL Server
Select Query by Pair of Fields Using an in Clause
Selecting Entries by Date - >= Now(), MySQL
How to Return Second Newest Record in SQL
How to Further Optimize a Derived Table Query Which Performs Better Than the Joined Equivalent
How to Get Column Attributes Query from Table Name Using Postgresql
Make All Store Images the Base, Small and Thumbnail Images in Magento
Update or Insert (Multiple Rows and Columns) from Subquery in Postgresql