Deciding on when to use XmlDocument vs XmlReader

I've generally looked at it not from a speed perspective, but from a memory utilization perspective. All of the implementations have been fast enough for the scenarios I've used them in (typical enterprise integration).

However, where I've fallen down, and sometimes spectacularly, is not taking into account the general size of the XML I'm working with. If you think about it up front you can save yourself some grief.

XML tends to bloat when loaded into memory, at least with a DOM reader like XmlDocument or XPathDocument. A rough rule of thumb is something like 10:1. The exact amount is hard to quantify, but a 1MB file on disk might well occupy 10MB or more in memory.

A process using any reader that loads the whole document into memory (XmlDocument/XPathDocument) can suffer from large object heap fragmentation, which can ultimately lead to OutOfMemoryExceptions (even with memory still available), resulting in an unavailable service/process.

Since objects larger than about 85,000 bytes end up on the large object heap, and you've got a 10:1 size explosion with a DOM reader, you can see it doesn't take much before your XML documents are being allocated from the large object heap.

XmlDocument is very easy to use. Its only real drawback is that it loads the whole XML document into memory to process. It's seductively simple to use.

XmlReader is a stream-based reader, so it will generally keep your process memory utilization flatter, but it is more difficult to use.
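
To make the streaming style concrete, here is a minimal sketch that counts elements without ever building a DOM. The class and method names (StreamingExample, CountElements) and the element name used below are mine, purely for illustration.

```csharp
using System.IO;
using System.Xml;

public static class StreamingExample
{
    // Counts elements with the given local name without building a DOM.
    public static int CountElements(TextReader input, string localName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Read() advances one node at a time; only the current node is exposed.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Calling StreamingExample.CountElements(new StringReader("<orders><order/><order/></orders>"), "order") returns 2. Memory use stays roughly flat however long the input is, but you pay for it by tracking node types and position yourself.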

XPathDocument tends to be a faster, read-only version of XmlDocument, but still suffers from memory 'bloat'.

XmlDocument vs XmlReader

The question shouldn't be which is faster, but which is the right fit for your case.

XmlDocument loads the entire document into memory and allows you to query and modify its content. Afterwards you can save the modified document back to a file.
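
As a rough illustration of that load, query, modify and save cycle, here is a small sketch. The names DomExample and AddCurrency, the price element and the currency attribute are made up for the example.

```csharp
using System.Xml;

public static class DomExample
{
    // Loads XML into a DOM, tags every <price> element, and returns the modified XML.
    public static string AddCurrency(string xml, string currency)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // the whole document is now held in memory

        // Random access via XPath; "//price" matches elements only, so the cast is safe.
        foreach (XmlElement price in doc.SelectNodes("//price"))
        {
            price.SetAttribute("currency", currency);
        }

        // doc.Save(path) would write the result to a file instead.
        return doc.OuterXml;
    }
}
```

For example, DomExample.AddCurrency("<items><price>1</price></items>", "USD") returns markup in which the price element carries currency="USD".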

XmlReader provides read-only, forward-only access to the content of an XML document, one node at a time.

You have to choose which description fits your case.

You should also be aware that there is another way to handle XML documents in .NET, called LINQ to XML.

Performance: XmlSerializer vs XmlReader vs XmlDocument vs XDocument

As for the performance of XmlSerializer, see http://msdn.microsoft.com/en-us/library/182eeyhh.aspx which says:

"The XmlSerializer creates C# files and compiles them into .dll files to perform this serialization. In .NET Framework 2.0, the XML Serializer Generator Tool (Sgen.exe) is designed to generate these serialization assemblies in advance to be deployed with your application and improve startup performance."

So you can improve the performance of XmlSerializer by using the sgen tool (http://msdn.microsoft.com/en-us/library/bk3w6240.aspx). That way you avoid the performance hit you get when new XmlSerializer() creates and compiles C# files at run time.
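
Sgen itself is a build-time tool, so there is nothing to show for it in code. A complementary habit, sketched below on the assumption that you serialize the same type repeatedly, is to create the serializer once and reuse it rather than constructing a new one for every call. The Customer type and the SerializerExample class are hypothetical.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type, used only for this illustration.
public class Customer
{
    public string Name { get; set; }
}

public static class SerializerExample
{
    // Created once and reused, so construction is not repeated on every call.
    private static readonly XmlSerializer CustomerSerializer =
        new XmlSerializer(typeof(Customer));

    public static string ToXml(Customer customer)
    {
        using (StringWriter writer = new StringWriter())
        {
            CustomerSerializer.Serialize(writer, customer);
            return writer.ToString();
        }
    }

    public static Customer FromXml(string xml)
    {
        using (StringReader reader = new StringReader(xml))
        {
            return (Customer)CustomerSerializer.Deserialize(reader);
        }
    }
}
```

This does not replace sgen, which targets the first-use cost; it only avoids paying for construction more often than necessary.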

XmlTextReader vs. XDocument

If you're happy reading everything into memory, use XDocument. It'll make your life much easier. LINQ to XML is a lovely API.
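
As a small taste of the API, here is a sketch that reads a whole document and turns it into a dictionary. The item element and name attribute are assumptions made for the example, as are the class and method names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlExample
{
    // Parses the whole document, then maps each <item name="..."> to its text value.
    // Assumes every <item> has a unique name attribute; ToDictionary throws otherwise.
    public static Dictionary<string, string> ToDictionary(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("item")
                  .ToDictionary(e => (string)e.Attribute("name"), e => e.Value);
    }
}
```

Given "<items><item name=\"a\">1</item><item name=\"b\">2</item></items>", the result maps "a" to "1" and "b" to "2".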

Use an XmlReader (such as XmlTextReader) if you need to handle huge XML files in a streaming fashion, basically. It's a much more painful API, but it allows streaming (i.e. only dealing with data as you need it, so you can go through a huge document and only have a small amount in memory at a time).

There's a hybrid approach, however - if you have a huge document made up of small elements, you can create an XElement from an XmlReader positioned at the start of the element, deal with the element using LINQ to XML, then move the XmlReader onto the next element and start again.
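
A minimal sketch of that hybrid approach, using XNode.ReadFrom to materialise one element at a time. The helper names (HybridExample, StreamElements) are mine, and it assumes the elements of interest are not nested inside one another.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class HybridExample
{
    // Lazily yields each element with the given name as a small XElement,
    // so only one such element is in memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the element and leaves the reader on the node
                    // after it, so Read() must not be called again in this branch.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Each yielded XElement can then be handled with ordinary LINQ to XML, for example foreach (XElement item in HybridExample.StreamElements(File.OpenText(path), "item")) { ... }, where path is whatever file you are streaming.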

XML parsing differences in .NET Framework 2.0, 3.0, 3.5 and 4.0

You may want to look at the specific answers here:
Deciding on when to use XmlDocument vs XmlReader

As you can see, it is not only a question of performance but of memory usage as well. See the experts' answers, with runtimes in yellow.

What is the fastest/most efficient way to read this XML to Dictionary (Linq or something else?)

When you are using an event-based interface, similar to the one presented in your update, you will need to remember the name of the previous start element event. Often it is worthwhile to keep a stack to track the events. I would probably do something similar to the following:

using System.Collections.Generic;

public class PriceLevel
{
    public decimal? Bid { get; set; }
    public decimal? Offer { get; set; }
}

public delegate void OnPriceChange(long instrumentId, Dictionary<decimal, PriceLevel> prices);

// MessageEventArgs is the event-args type of the event-based parser from the question;
// args.Value is assumed to hold the element name (start/end events) or the text (content events).
public class MainClass
{
    // Names of the elements currently open, innermost on top.
    private readonly Stack<string> xmlStack = new Stack<string>();
    private readonly Dictionary<decimal, PriceLevel> prices = new Dictionary<decimal, PriceLevel>();
    private readonly OnPriceChange _priceChangeCallback;
    private bool isBids = false;
    private decimal? currentPrice = null;
    private long instrumentId;

    // A constructor has no return type.
    public MainClass(OnPriceChange priceChangeCallback) {
        _priceChangeCallback = priceChangeCallback;
    }

    public void XmlStart(object source, MessageEventArgs args) {
        xmlStack.Push(args.Value);

        if (!isBids && "bids" == args.Value) {
            isBids = true;
        }
    }

    public void XmlEnd(object source, MessageEventArgs args) {
        xmlStack.Pop();

        if (isBids && "bids" == args.Value) {
            isBids = false;
        }

        // Finished parsing the orderBookEvent
        if ("orderBook" == args.Value) {
            _priceChangeCallback(instrumentId, prices);
        }
    }

    public void XmlContent(object source, MessageEventArgs args) {
        switch (xmlStack.Peek()) {
            case "instrumentId":
                instrumentId = long.Parse(args.Value);
                break;

            case "price":
                currentPrice = decimal.Parse(args.Value);
                break;

            case "quantity":
                if (currentPrice.HasValue) {
                    decimal quantity = decimal.Parse(args.Value);

                    // Create the level the first time this price is seen.
                    PriceLevel priceLevel;
                    if (!prices.TryGetValue(currentPrice.Value, out priceLevel)) {
                        priceLevel = new PriceLevel();
                        prices[currentPrice.Value] = priceLevel;
                    }

                    if (isBids) {
                        priceLevel.Bid = quantity;
                    } else {
                        priceLevel.Offer = quantity;
                    }
                }
                break;
        }
    }
}

XDocument or XmlDocument

If you're using .NET version 3.0 or lower, you have to use XmlDocument aka the classic DOM API. Likewise you'll find there are some other APIs which will expect this.

If you get the choice, however, I would thoroughly recommend using XDocument aka LINQ to XML. It's much simpler to create documents and process them. For example, it's the difference between:

XmlDocument doc = new XmlDocument();
XmlElement root = doc.CreateElement("root");
root.SetAttribute("name", "value");
XmlElement child = doc.CreateElement("child");
child.InnerText = "text node";
root.AppendChild(child);
doc.AppendChild(root);

and

XDocument doc = new XDocument(
    new XElement("root",
        new XAttribute("name", "value"),
        new XElement("child", "text node")));

Namespaces are pretty easy to work with in LINQ to XML, unlike any other XML API I've ever seen:

XNamespace ns = "http://somewhere.com";
XElement element = new XElement(ns + "elementName");
// etc

LINQ to XML also works really well with LINQ - its construction model allows you to build elements with sequences of sub-elements really easily:

// customers is a List<Customer>
XElement customersElement = new XElement("customers",
    customers.Select(c => new XElement("customer",
        new XAttribute("name", c.Name),
        new XAttribute("lastSeen", c.LastOrder),
        new XElement("address",
            new XAttribute("town", c.Town),
            new XAttribute("firstline", c.Address1)
            // etc
        ))));

It's all a lot more declarative, which fits in with the general LINQ style.

Now, as Brannon mentioned, these are in-memory APIs rather than streaming ones (although XStreamingElement supports lazy output). XmlReader and XmlWriter are the normal ways of streaming XML in .NET, but you can mix all the APIs to some extent. For example, you can stream a large document but still use LINQ to XML by positioning an XmlReader at the start of an element, reading an XElement from it and processing it, then moving on to the next element, and so on. There are various blog posts about this technique.

Nesting XmlReader

There's no problem opening two readers at the same time. However you cannot reuse XmlDoc2 after disposing it (through the using block).

XmlReader is forward-only, so basically you'd be running through XmlDoc2 for each iteration.

If speed is your concern, you could try letting XmlDoc1 be an XmlReader (as you're running through it from top to bottom, once) and use one of the suggested XmlDocument or XDocument classes for the inner XML.
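
Here is a sketch of that arrangement: the outer document is streamed once, while the inner one is loaded a single time and indexed. The element and attribute names (ref, entry, id, name) and the class and method names are invented for the illustration, since the original documents aren't shown.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class NestedLookupExample
{
    // Streams the outer document once and looks up each id in the inner document,
    // which is loaded into memory a single time instead of being re-read per iteration.
    public static List<string> MatchNames(TextReader outerXml, TextReader innerXml)
    {
        // Inner document: load once and index by id (ids are assumed to be unique).
        Dictionary<string, string> namesById = XDocument.Load(innerXml)
            .Descendants("entry")
            .ToDictionary(e => (string)e.Attribute("id"), e => (string)e.Attribute("name"));

        List<string> matches = new List<string>();
        using (XmlReader reader = XmlReader.Create(outerXml))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "ref")
                {
                    string id = reader.GetAttribute("id");
                    string name;
                    if (id != null && namesById.TryGetValue(id, out name))
                    {
                        matches.Add(name);
                    }
                }
            }
        }
        return matches;
    }
}
```

The dictionary lookup replaces the inner loop entirely, so the cost is one pass over each document rather than one pass over the inner document per outer element.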


