How to Parse Very Huge Xml Files in C#

How to parse very huge XML Files in C#?

Use XmlReader instead of the XML DOM. The DOM loads the whole file into memory, which is wasteful for a very large file:

http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx

C# and Reading Large XML Files

The answer to this question hasn't changed in .NET 4 - for best performance you should still be using XmlReader as it streams the document instead of loading the full thing into memory.

The code you refer to uses XmlReader for the actual querying so should be reasonably quick on large documents.

Reading large XML files with C#

"Before you downrate this, you might want to check google because I
did research, but found nothing"

You found nothing because you don't know what you are searching for. Also, your XML is invalid: you need to enclose it in a root element. Then the first thing you need to do is read this file from the desktop, if it exists.

You can check the size at that point and decide whether it is "too large", even though it doesn't really matter: I highly doubt your XML file will be 5+ GB in size. If it is, you need an alternative, because by default no single object in a .NET program may be over 2 GB. The best you could do for a single string is about 1,073,741,823 characters, even on a 64-bit machine.
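The existence and size check described above could be sketched as follows. This is only an illustration: the file name `XMLFile.xml`, the class `XmlFileCheck`, the helper `ShouldStream`, and the 1.0 GB threshold constant are assumptions made for the example, with the threshold borrowed from the rule of thumb in this answer.

```csharp
using System;
using System.IO;

public static class XmlFileCheck
{
    // Assumed threshold: the 1.0 GB rule of thumb used in this answer.
    private const long StreamingThresholdBytes = 1L * 1024 * 1024 * 1024;

    // Returns true when the file is big enough that streaming is the safer choice.
    public static bool ShouldStream(long fileSizeInBytes)
    {
        return fileSizeInBytes > StreamingThresholdBytes;
    }

    public static void Main()
    {
        // Assumed location: a file named XMLFile.xml on the current user's desktop.
        string desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        string path = Path.Combine(desktop, "XMLFile.xml");

        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        // FileInfo.Length reports the size in bytes without reading the file.
        long size = new FileInfo(path).Length;
        Console.WriteLine(ShouldStream(size)
            ? "Large file: stream it with XmlReader."
            : "Small file: loading it in one go is fine.");
    }
}
```

Checking the size through `FileInfo` costs nothing, because the file content is never read at that point.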

For very large XML files, anything above 1.0 GB, combine XmlReader and LINQ as stated by Jon Skeet here:

If your document is particularly huge, you can combine XmlReader and
LINQ to XML by creating an XElement from an XmlReader for each of your
"outer" elements in a streaming manner: this lets you do most of the
conversion work in LINQ to XML, but still only need a small portion of
the document in memory at any one time.
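A minimal sketch of that streaming pattern, assuming the "outer" elements are the `<user>` elements from this question. The class `StreamingUsers` and the method `StreamElements` are names made up for the example; `XNode.ReadFrom` is the LINQ to XML call that builds one element from the reader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingUsers
{
    // Lazily yields one XElement per matching element, so only that
    // element's subtree is materialized at any one time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on
                    // the node that follows it, so Read() must not be called here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        const string xml =
            @"<DataRoot><smallusers><user id=""1""><name>John</name></user>" +
            @"<user id=""2""><name>Peter</name></user></smallusers>" +
            @"<bigusers><user id=""3""><name>Barry</name></user></bigusers></DataRoot>";

        // For a real file, pass a StreamReader over the file instead of a StringReader.
        foreach (XElement user in StreamElements(new StringReader(xml), "user"))
        {
            Console.WriteLine((string)user.Attribute("id") + ": " + (string)user.Element("name"));
        }
    }
}
```

Because the method is an iterator, each `XElement` becomes eligible for garbage collection as soon as the caller moves on to the next one.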

For smaller XML files, anything 1.0 GB or lower, it is fine to load the whole document into memory, as shown below.

With that said, what you need is to learn what Serialization and Deserialization mean.

Serialize: convert an object instance to an XML document.

Deserialize: convert an XML document into an object instance.

Instead of XML you can also use JSON, binary, etc.

In your case, this is how you can deserialize the XML document back into an object that you can use in your code.

First fix up the XML and give it a Root.

<?xml version="1.0" encoding="UTF-8"?>
<DataRoot>
  <smallusers>
    <user id="1">
      <name>John</name>
      <motto>I am john, who are you?</motto>
    </user>
    <user id="2">
      <name>Peter</name>
      <motto>Hello everyone!</motto>
    </user>
  </smallusers>
  <bigusers>
    <user id="3">
      <name>Barry</name>
      <motto>Earth is awesome</motto>
    </user>
  </bigusers>
</DataRoot>

Then create the root class in C#. You can generate it directly in Visual Studio 2012+ by copying your XML and going to Edit - Paste Special, but I like to use: XML to C# Class Generator

Here is what your code would look like after you generate the C# root class for your XML. I hope it helps you understand it better.

using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

namespace ConsoleApplication1
{
    public class Program
    {
        [XmlRoot(ElementName = "user")]
        public class User
        {
            [XmlElement(ElementName = "name")]
            public string Name { get; set; }
            [XmlElement(ElementName = "motto")]
            public string Motto { get; set; }
            [XmlAttribute(AttributeName = "id")]
            public string Id { get; set; }
        }

        [XmlRoot(ElementName = "smallusers")]
        public class Smallusers
        {
            [XmlElement(ElementName = "user")]
            public List<User> User { get; set; }
        }

        [XmlRoot(ElementName = "bigusers")]
        public class Bigusers
        {
            // A list, so that more than one big user can be read.
            [XmlElement(ElementName = "user")]
            public List<User> User { get; set; }
        }

        [XmlRoot(ElementName = "DataRoot")]
        public class DataRoot
        {
            [XmlElement(ElementName = "smallusers")]
            public Smallusers Smallusers { get; set; }
            [XmlElement(ElementName = "bigusers")]
            public Bigusers Bigusers { get; set; }
        }

        static void Main(string[] args)
        {
            // Inline copy of the XML; pass it to DeserializeFromXML instead of
            // fileXmlData to test without a file on disk.
            string testXMLData = @"<DataRoot><smallusers><user id=""1""><name>John</name><motto>I am john, who are you?</motto></user><user id=""2""><name>Peter</name><motto>Hello everyone!</motto></user></smallusers><bigusers><user id=""3""><name>Barry</name><motto>Earth is awesome</motto></user></bigusers></DataRoot>";

            var fileXmlData = File.ReadAllText(@"C:\XMLFile.xml");
            var deserializedObject = DeserializeFromXML(fileXmlData);
            var serializedToXML = SerializeToXml(deserializedObject);

            // I want to store each user, but still detect if they're small or big. Is there a way to do this?
            foreach (var smallUser in deserializedObject.Smallusers.User)
            {
                // Iterating your collection of small users?
                // Do what you need here with `smallUser`.
                var name = smallUser.Name; // Example...
            }

            Console.WriteLine(serializedToXML);
            Console.ReadKey();
        }

        public static string SerializeToXml(DataRoot dataObject)
        {
            var xsSubmit = new XmlSerializer(typeof(DataRoot));

            using (var sw = new StringWriter())
            {
                using (var writer = XmlWriter.Create(sw))
                {
                    xsSubmit.Serialize(writer, dataObject);
                }

                // The XmlWriter has been disposed (and so flushed) before the text is read back.
                return sw.ToString();
            }
        }

        public static DataRoot DeserializeFromXML(string xml)
        {
            var xsExpirations = new XmlSerializer(typeof(DataRoot));

            // The using block closes the reader, so no explicit Close() is needed.
            using (TextReader reader = new StringReader(xml))
            {
                return (DataRoot)xsExpirations.Deserialize(reader);
            }
        }
    }
}

What is the best way to parse (big) XML in C# Code?

Use XmlReader to parse large XML documents. XmlReader provides fast, forward-only, non-cached access to XML data. (Forward-only means you can read the XML file from beginning to end but cannot move backwards in the file.) XmlReader uses small amounts of memory, and is equivalent to using a simple SAX reader.

using (XmlReader myReader = XmlReader.Create(@"c:\data\coords.xml"))
{
    while (myReader.Read())
    {
        // Process each node (myReader.Value) here
        // ...
    }
}

You can use XmlReader to process files that are up to 2 gigabytes (GB) in size.

Ref: How to read XML from a file by using Visual C#
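To make the "process each node" step more concrete, here is a small sketch that switches on the node type inside the same `Read()` loop. The `<user>`, `id` and `<name>` names are assumptions borrowed from the sample XML earlier on this page, and `ReaderDemo` and `ReadUserNames` are hypothetical names for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderDemo
{
    // Walks the document once, forward-only, and collects "id: name" pairs.
    public static List<string> ReadUserNames(TextReader input)
    {
        var result = new List<string>();
        string currentId = null;
        bool inName = false;

        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == "user")
                        {
                            // Attributes are read while positioned on the element.
                            currentId = reader.GetAttribute("id");
                        }
                        else if (reader.Name == "name")
                        {
                            // An empty <name/> has no text node and no end element.
                            inName = !reader.IsEmptyElement;
                        }
                        break;

                    case XmlNodeType.Text:
                        if (inName)
                        {
                            result.Add(currentId + ": " + reader.Value);
                        }
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.Name == "name")
                        {
                            inName = false;
                        }
                        break;
                }
            }
        }

        return result;
    }

    public static void Main()
    {
        const string xml =
            @"<DataRoot><smallusers><user id=""1""><name>John</name><motto>Hi</motto></user></smallusers>" +
            @"<bigusers><user id=""3""><name>Barry</name><motto>Earth is awesome</motto></user></bigusers></DataRoot>";

        foreach (string line in ReadUserNames(new StringReader(xml)))
        {
            Console.WriteLine(line);
        }
    }
}
```

The only state kept between nodes is the current id and a flag, which is why memory use stays flat no matter how long the file is.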

How to parse large XML files with xpaths?

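I'm using SelectNodes to cut the XML document into smaller elements:

XmlDocument doc = new XmlDocument();
// Load the document (for example with doc.Load) before querying it.
XmlNodeList nodeList = doc.SelectNodes(".....");
if (nodeList != null)
{
    foreach (XmlNode node in nodeList)
    {
        // Process each selected node here.
    }
}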
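One way to keep using XPath without loading the whole file is to stream the document with XmlReader and run the query on one small chunk at a time. The sketch below is only one possible approach, not the asker's code: the chunk element `user`, the XPath `/user/motto`, and the names `ChunkedXPath` and `SelectPerChunk` are all assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ChunkedXPath
{
    // Streams the input, loads each chunk element into its own small
    // XmlDocument, and evaluates the XPath expression against that chunk only.
    public static List<string> SelectPerChunk(TextReader input, string chunkName, string xpath)
    {
        var results = new List<string>();

        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == chunkName)
                {
                    var chunk = new XmlDocument();
                    // ReadNode consumes the element and positions the reader on
                    // the next node, so Read() must not be called in this branch.
                    XmlNode node = chunk.ReadNode(reader);
                    chunk.AppendChild(node);

                    foreach (XmlNode hit in chunk.SelectNodes(xpath))
                    {
                        results.Add(hit.InnerText);
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }

        return results;
    }

    public static void Main()
    {
        const string xml =
            @"<DataRoot><smallusers><user id=""1""><name>John</name><motto>Hello</motto></user></smallusers>" +
            @"<bigusers><user id=""3""><name>Barry</name><motto>Earth is awesome</motto></user></bigusers></DataRoot>";

        // The XPath is evaluated relative to each chunk, whose root is <user>.
        foreach (string motto in SelectPerChunk(new StringReader(xml), "user", "/user/motto"))
        {
            Console.WriteLine(motto);
        }
    }
}
```

The trade-off is that an expression can only see the chunk it runs on, so queries that span several chunks need a different approach.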

