Deciding on when to use XmlDocument vs XmlReader
I've generally looked at it not from a speed perspective, but rather from a memory utilization perspective. All of the implementations have been fast enough for the scenarios I've used them in (typical enterprise integration).
However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.
XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. The exact ratio is hard to quantify, but something like 10:1 is a reasonable rule of thumb: a 1MB file on disk can take 10MB or more in memory.
A process using any reader that loads the whole document into memory (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with available memory), resulting in an unavailable service/process.
Since objects of roughly 85K (85,000 bytes) or more end up on the large object heap, and you've got a 10:1 size explosion with a DOM reader, you can see it doesn't take much before your XML documents are being allocated from the large object heap.
XmlDocument is very easy to use, seductively so. Its only real drawback is that it loads the whole XML document into memory to process it.
XmlReader is a stream-based reader, so it will keep your process memory utilization generally flatter, but it is more difficult to use.
XPathDocument tends to be a faster, read-only version of XmlDocument, but still suffers from memory 'bloat'.
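To make the trade-off concrete, here is a minimal sketch that counts elements both ways. The class and method names (ItemCounter, CountWithXmlDocument, CountWithXmlReader) are made up for illustration; only the XmlDocument and XmlReader calls are framework APIs. The DOM version builds the whole tree before counting, while the reader version only ever looks at the current node.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ItemCounter
{
    // DOM approach: the whole document is materialised in memory first.
    public static int CountWithXmlDocument(string xml, string elementName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName(elementName).Count;
    }

    // Streaming approach: the reader moves forward node by node,
    // so memory use does not grow with the size of the document.
    public static int CountWithXmlReader(TextReader input, string elementName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

For a small document either version is fine; the streaming version starts to matter once files reach the sizes discussed above.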
XmlDocument VS XmlReader
The question shouldn't be which is faster, but which is the better fit for your case.
XmlDocument loads the entire document into memory and allows you to modify it and query its content. Afterwards you can save the modified document back to a file.
XmlReader provides read-only, forward-only access to the content of an XML document, one node at a time.
You have to choose which description fits your case.
You should also be aware that there is another way to handle XML documents in .NET, called LINQ to XML.
Performance: XmlSerializer vs XmlReader vs XmlDocument vs XDocument
As for the performance of XmlSerializer, see http://msdn.microsoft.com/en-us/library/182eeyhh.aspx which says:
"The XmlSerializer creates C# files and compiles them into .dll files to perform this serialization. In .NET Framework 2.0, the XML Serializer Generator Tool (Sgen.exe) is designed to generate these serialization assemblies in advance to be deployed with your application and improve startup performance."
So you can increase the performance of XmlSerializer by making use of the sgen tool (http://msdn.microsoft.com/en-us/library/bk3w6240.aspx). That way you avoid the performance hit you get when new XmlSerializer() creates and compiles C# files.
XmlTextReader vs. XDocument
If you're happy reading everything into memory, use XDocument. It'll make your life much easier. LINQ to XML is a lovely API.
Use an XmlReader (such as XmlTextReader) if you need to handle huge XML files in a streaming fashion, basically. It's a much more painful API, but it allows streaming (i.e. only dealing with data as you need it, so you can go through a huge document and only have a small amount in memory at a time).
There's a hybrid approach, however: if you have a huge document made up of small elements, you can create an XElement from an XmlReader positioned at the start of the element, deal with the element using LINQ to XML, then move the XmlReader onto the next element and start again.
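The hybrid approach could look roughly like the sketch below. ElementStreamer and StreamElements are hypothetical names; XNode.ReadFrom is the framework call that turns the element under the reader into an XElement. It assumes the elements of interest are not nested inside each other.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ElementStreamer
{
    // Lazily yields each element named elementName as an XElement,
    // so only one such element is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node after it, so no extra Read() call is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded XElement can then be queried with ordinary LINQ to XML, for example (int)order.Element("total"), before the loop moves on to the next one.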
XML parsing differences in .NET Framework 2.0, 3.0, 3.5 and 4.0
You may have a look at the specific answers here:
Deciding on when to use XmlDocument vs XmlReader
As you can see, it is not only a question of performance but of memory usage as well. See the experts' answers with runtimes in yellow.
What is the fastest/most efficient way to read this XML to Dictionary (Linq or something else?)
When you are using an event-based interface, similar to the one presented in your update, you will need to remember the name of the previous start element event. Often it is worthwhile holding a stack to keep track of the events. I would probably do something similar to the following:
using System;
using System.Collections.Generic;

public class PriceLevel
{
    public decimal? Bid { get; set; }
    public decimal? Offer { get; set; }
}

public delegate void OnPriceChange(long instrumentId, Dictionary<decimal, PriceLevel> prices);

// MessageEventArgs is the event-args type of the event-based API from the question;
// it is assumed here to expose the element name or text content as a string Value.
public class MainClass
{
    // Names of the elements that are currently open, innermost on top.
    private readonly Stack<string> xmlStack = new Stack<string>();
    private readonly Dictionary<decimal, PriceLevel> prices = new Dictionary<decimal, PriceLevel>();
    private bool isBids = false;
    private decimal? currentPrice = null;
    private long instrumentId;
    private readonly OnPriceChange _priceChangeCallback;

    public MainClass(OnPriceChange priceChangeCallback) {
        _priceChangeCallback = priceChangeCallback;
    }

    public void XmlStart(object source, MessageEventArgs args) {
        xmlStack.Push(args.Value);
        if (!isBids && "bids" == args.Value) {
            isBids = true;
        }
    }

    public void XmlEnd(object source, MessageEventArgs args) {
        xmlStack.Pop();
        if (isBids && "bids" == args.Value) {
            isBids = false;
        }
        // Finished parsing the orderBook element
        if ("orderBook" == args.Value) {
            _priceChangeCallback(instrumentId, prices);
        }
    }

    public void XmlContent(object source, MessageEventArgs args) {
        // The top of the stack tells us which element this text belongs to.
        switch (xmlStack.Peek()) {
            case "instrumentId":
                instrumentId = long.Parse(args.Value);
                break;
            case "price":
                currentPrice = decimal.Parse(args.Value);
                break;
            case "quantity":
                if (currentPrice != null) {
                    decimal quantity = decimal.Parse(args.Value);
                    // Create the level the first time this price is seen.
                    if (!prices.ContainsKey(currentPrice.Value)) {
                        prices[currentPrice.Value] = new PriceLevel();
                    }
                    PriceLevel priceLevel = prices[currentPrice.Value];
                    if (isBids) {
                        priceLevel.Bid = quantity;
                    } else {
                        priceLevel.Offer = quantity;
                    }
                }
                break;
        }
    }
}
XDocument or XmlDocument
If you're using .NET version 3.0 or lower, you have to use XmlDocument, aka the classic DOM API. Likewise you'll find there are some other APIs which will expect this.
If you get the choice, however, I would thoroughly recommend using XDocument, aka LINQ to XML. It's much simpler to create documents and process them. For example, it's the difference between:
XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("root");
root.SetAttribute("name", "value");
XmlElement child = doc.CreateElement("child");
child.InnerText = "text node";
root.AppendChild(child);
doc.AppendChild(root);
and
XDocument doc = new XDocument(
    new XElement("root",
        new XAttribute("name", "value"),
        new XElement("child", "text node")));
Namespaces are pretty easy to work with in LINQ to XML, unlike any other XML API I've ever seen:
XNamespace ns = "http://somewhere.com";
XElement element = new XElement(ns + "elementName");
// etc
LINQ to XML also works really well with LINQ - its construction model allows you to build elements with sequences of sub-elements really easily:
// customers is a List<Customer>
XElement customersElement = new XElement("customers",
    customers.Select(c => new XElement("customer",
        new XAttribute("name", c.Name),
        new XAttribute("lastSeen", c.LastOrder),
        new XElement("address",
            new XAttribute("town", c.Town),
            new XAttribute("firstline", c.Address1)
            // etc
        ))));
It's all a lot more declarative, which fits in with the general LINQ style.
Now as Brannon mentioned, these are in-memory APIs rather than streaming ones (although XStreamingElement supports lazy output). XmlReader and XmlWriter are the normal ways of streaming XML in .NET, but you can mix all the APIs to some extent. For example, you can stream a large document but use LINQ to XML by positioning an XmlReader at the start of an element, reading an XElement from it and processing it, then moving on to the next element etc. There are various blog posts about this technique, here's one I found with a quick search.
Nesting XmlReader
There's no problem opening two readers at the same time. However, you cannot reuse XmlDoc2 after disposing it (through the using block).
XmlReader is forward-only, so basically you'd be running through XmlDoc2 from the start for each iteration.
If speed is your concern, you could try letting XmlDoc1 be an XmlReader (as you're running through it from top to bottom, once) and using one of the suggested XmlDocument or XDocument classes for the inner XML.
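A rough sketch of that last suggestion, under the assumption that the outer document holds "item" elements and the inner one holds "entry" elements, both carrying a "key" attribute. Those element and attribute names, and NestedLookup.Resolve itself, are invented for illustration and do not come from the question. The inner document is loaded once into an XDocument and indexed, while the outer document is streamed a single time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class NestedLookup
{
    // "item", "entry" and "key" are hypothetical names used for this sketch.
    public static List<string> Resolve(TextReader outerXml, TextReader innerXml)
    {
        // Load the inner document once; unlike a forward-only reader
        // it can be queried as often as needed. Assumes every entry has a key.
        XDocument inner = XDocument.Load(innerXml);
        Dictionary<string, string> lookup = inner.Descendants("entry")
            .ToDictionary(e => (string)e.Attribute("key"), e => e.Value);

        var results = new List<string>();
        // Stream the outer document from top to bottom exactly once.
        using (XmlReader outer = XmlReader.Create(outerXml))
        {
            while (outer.Read())
            {
                if (outer.NodeType == XmlNodeType.Element && outer.Name == "item")
                {
                    string key = outer.GetAttribute("key");
                    string value;
                    if (key != null && lookup.TryGetValue(key, out value))
                    {
                        results.Add(value);
                    }
                }
            }
        }
        return results;
    }
}
```

Building the dictionary up front replaces a full pass over the inner document per outer element with a single lookup, at the cost of holding the inner document in memory.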