How to Deal with XML in C#

In C# 2.0 (.NET 2.0), the primary way to read and write XML is the XmlDocument class. You can load XML into an XmlDocument from a string, from a file, or through an XmlReader, which lets you control parsing via XmlReaderSettings.

Loading XML Directly

XmlDocument document = new XmlDocument();
document.LoadXml("<People><Person Name='Nick' /><Person Name='Joe' /></People>");

Loading XML From a File

XmlDocument document = new XmlDocument();
document.Load(@"C:\Path\To\xmldoc.xml");

// Or load through an XmlReader; dispose it so the file handle is released
using (XmlReader reader = XmlReader.Create(@"C:\Path\To\xmldoc.xml"))
{
    document.Load(reader);
}

I find XPath the easiest and quickest way to query an XML document.

Reading an XML Document Using XPath (XmlDocument also lets us edit it)

XmlDocument document = new XmlDocument();
document.LoadXml("<People><Person Name='Nick' /><Person Name='Joe' /></People>");

// Select a single node
XmlNode node = document.SelectSingleNode("/People/Person[@Name = 'Nick']");

// Select a list of nodes
XmlNodeList nodes = document.SelectNodes("/People/Person");
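Once you have the nodes, you read attributes and edit the document through the same XmlDocument. Below is a minimal sketch. ReadNames and RenamePerson are hypothetical helper names. RenamePerson builds its XPath by string concatenation, so it assumes the name contains no single quotes.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathExamples
{
    // Collects the Name attribute of every /People/Person element
    public static List<string> ReadNames(XmlDocument document)
    {
        var names = new List<string>();
        foreach (XmlElement person in document.SelectNodes("/People/Person"))
        {
            names.Add(person.GetAttribute("Name"));
        }
        return names;
    }

    // Hypothetical helper: assumes oldName has no single quotes (they would break the XPath literal)
    public static void RenamePerson(XmlDocument document, string oldName, string newName)
    {
        XmlElement node = (XmlElement)document.SelectSingleNode(
            "/People/Person[@Name = '" + oldName + "']");
        if (node != null)
        {
            node.SetAttribute("Name", newName); // edits the in-memory DOM only
        }
    }
}
```

Suppose you load the People document above and call RenamePerson(document, "Joe", "Joseph"). ReadNames(document) then returns Nick and Joseph. To persist the change, call document.Save(path).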

If you need to validate an XML document against an XSD schema, you can use this:

Validating XML Documents against XSD Schemas

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add("", pathToXsd); // targetNamespace, pathToXsd (pass null to use the schema's own targetNamespace)

XmlDocument document = new XmlDocument();
using (XmlReader reader = XmlReader.Create(pathToXml, settings))
{
    try
    {
        // With no ValidationEventHandler attached, the first validation error is thrown
        document.Load(reader);
    }
    catch (XmlSchemaValidationException ex) { Trace.WriteLine(ex.Message); }
}
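To try validation without files on disk, you can feed the same settings from strings. The sketch below is self-contained and assumes a hypothetical no-namespace schema for the People document. FirstError returns null for a valid document. Otherwise it returns the first error message, because no ValidationEventHandler is attached and the error is thrown.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaValidation
{
    // Hypothetical schema: <People> holds any number of <Person> elements with a required Name
    public const string PeopleXsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='People'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Person' minOccurs='0' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:attribute name='Name' type='xs:string' use='required' />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns null if the document is valid, otherwise the first validation error message
    public static string FirstError(string xml, string xsd)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, schemaReader); // null = take targetNamespace from the schema
        }

        var document = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            try
            {
                document.Load(reader);
                return null;
            }
            catch (XmlSchemaValidationException ex)
            {
                return ex.Message;
            }
        }
    }
}
```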

Validating XML against XSD at each Node (UPDATE 1)

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add("", pathToXsd); // targetNamespace, pathToXsd
settings.ValidationEventHandler += new ValidationEventHandler(settings_ValidationEventHandler);

// With a handler attached, errors are reported to it instead of being thrown
using (XmlReader reader = XmlReader.Create(pathToXml, settings))
{
    while (reader.Read()) { }
}

private void settings_ValidationEventHandler(object sender, ValidationEventArgs e)
{
    // e.Message, e.Severity (XmlSeverityType.Warning or Error), e.Exception
    // e.Exception.LineNumber and e.Exception.LinePosition locate the problem
}
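Attaching the handler as a lambda makes it easy to collect every problem instead of stopping at the first one. This sketch assumes a small hypothetical schema: an <Ages> root holding integer <Age> elements. The format of each collected message is an arbitrary choice.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class CollectingValidator
{
    // Hypothetical schema: <Ages> contains any number of integer <Age> elements
    public const string AgesXsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Ages'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Age' type='xs:int' minOccurs='0' maxOccurs='unbounded' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Reads the whole document and returns every validation problem it reports
    public static List<string> Validate(string xml, string xsd)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, schemaReader);
        }
        settings.ValidationEventHandler += (sender, e) =>
        {
            // e.Exception carries the line and position of the offending node
            errors.Add(string.Format("{0} ({1}:{2}): {3}", e.Severity,
                e.Exception.LineNumber, e.Exception.LinePosition, e.Message));
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // the handler fires as each problem is encountered
        }
        return errors;
    }
}
```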

Writing an XML Document (manually)

using (XmlWriter writer = XmlWriter.Create(pathToOutput))
{
    writer.WriteStartDocument();
    writer.WriteStartElement("People");

    // Shorthand: write an attribute and its value in one call
    writer.WriteStartElement("Person");
    writer.WriteAttributeString("Name", "Nick");
    writer.WriteEndElement();

    // Long form: start the attribute, write a (typed) value, end it
    writer.WriteStartElement("Person");
    writer.WriteStartAttribute("Name");
    writer.WriteValue("Joe");
    writer.WriteEndAttribute();
    writer.WriteEndElement();

    writer.WriteEndElement();
    writer.WriteEndDocument();
} // disposing the writer flushes and closes the output
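Sometimes you want the output as a string rather than a file, for logging or tests. XmlWriter.Create also accepts a StringBuilder, and XmlWriterSettings controls indentation. Here is a minimal sketch; PeopleWriter.Write is a hypothetical helper.

```csharp
using System.Text;
using System.Xml;

public static class PeopleWriter
{
    // Writes a <People> document with one <Person> per name into a string
    public static string Write(params string[] names)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            OmitXmlDeclaration = true // skip <?xml ...?> for a compact string
        };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("People");
            foreach (string name in names)
            {
                writer.WriteStartElement("Person");
                writer.WriteAttributeString("Name", name);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        } // disposing the writer flushes it into the StringBuilder
        return sb.ToString();
    }
}
```

For example, PeopleWriter.Write("Nick", "Joe") returns an indented <People> element containing one <Person> per name.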

(UPDATE 1)

In .NET 3.5, you can use XDocument (LINQ to XML) to perform similar tasks. The advantage is that you can write LINQ queries to select exactly the data you need. Combined with object initializers, a query can even return objects of your own types directly.

XDocument doc = XDocument.Load(pathToXml);
List<Person> people = (from xnode in doc.Element("People").Elements("Person")
                       select new Person
                       {
                           Name = xnode.Attribute("Name").Value
                       }).ToList();
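The query above relies on a Person class that isn't shown. The self-contained sketch below assumes Person has just a Name property. It also uses the explicit (string) conversion instead of .Value. As a result, a Person element without a Name attribute yields a null Name rather than throwing a NullReferenceException.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumed shape of the Person type used in the query above
public class Person
{
    public string Name { get; set; }
}

public static class PeopleQuery
{
    public static List<Person> Load(XDocument doc)
    {
        return (from xnode in doc.Element("People").Elements("Person")
                select new Person
                {
                    // (string) cast returns null when the attribute is missing
                    Name = (string)xnode.Attribute("Name")
                }).ToList();
    }
}
```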

(UPDATE 2)

In .NET 3.5, a nice way to create XML is to build an XDocument from nested constructors, as below. The code then mirrors the structure of the desired output.

XDocument doc =
new XDocument(
new XDeclaration("1.0", Encoding.UTF8.HeaderName, String.Empty),
new XComment("Xml Document"),
new XElement("catalog",
new XElement("book", new XAttribute("id", "bk001"),
new XElement("title", "Book Title")
)
)
);

Calling doc.ToString() produces the following (ToString omits the XML declaration; doc.Save writes it):

<!--Xml Document-->
<catalog>
<book id="bk001">
<title>Book Title</title>
</book>
</catalog>

If all else fails, check out this MSDN article, which has many of the examples I've discussed here and more:
http://msdn.microsoft.com/en-us/library/aa468556.aspx

Read XML from C#

XmlSerializer is your friend:

using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class ExportJobs
{
public List<Job> JobList { get; } = new List<Job>();
}
public class Job
{
[XmlAttribute]
public int Id { get; set; }
public string Comments { get; set; }
public DateTime DueDate { get; set; }
public string FormattedDueDate { get; set; }
public DateTime TargetDueDate{ get; set; }
public int ServiceTypeId { get; set; }
public string ServiceType { get; set; }
public string TenantName { get; set; }
public string Uprn { get; set; }
public string HouseName { get; set; }
}
static class P
{

static void Main()
{
var ser = new XmlSerializer(typeof(ExportJobs));
ExportJobs jobs;
using (var sr = new StringReader(xml))
{
jobs = (ExportJobs) ser.Deserialize(sr);
}

foreach(var job in jobs.JobList)
{
Console.WriteLine($"{job.Id} / {job.Uprn}: {job.DueDate}");
}
}

const string xml = @"<?xml version=""1.0"" encoding=""utf-8""?>
<ExportJobs xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
<JobList>
<Job Id=""555555"">
<Comments></Comments>
<DueDate>2017-11-17</DueDate>
<FormattedDueDate>17-Nov-2017 12:00</FormattedDueDate>
<TargetDueDate>2017-11-17</TargetDueDate>
<ServiceTypeId>3</ServiceTypeId>
<ServiceType>Service</ServiceType>
<TenantName>Miss Ash</TenantName>
<Uprn>testUpr</Uprn>
<HouseName></HouseName>
</Job>
<Job Id=""666666"">
<Comments></Comments>
<DueDate>2018-03-15</DueDate>
<FormattedDueDate>15-Mar-2018 12:00</FormattedDueDate>
<TargetDueDate>2018-03-15</TargetDueDate>
<ServiceTypeId>3</ServiceTypeId>
<ServiceType>Service</ServiceType>
<TenantName>Mr Howard</TenantName>
<Uprn>testUpr2</Uprn>
</Job>
</JobList>
</ExportJobs>";
}

How to deal with an XML-based protocol where the response may conform to one of two XSDs?

Is the union of the two schemas a valid schema? (This will typically be the case if, for example, they use different namespaces, but it is unlikely to be the case if they are both no-namespace schemas or two versions of the same schema.)

If the union is a valid schema, then you could consider validating against that.

Otherwise, peeking at the start of the file (typically the root element's name and namespace) will often be enough to tell you which vocabulary is in use.
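Peeking at the root element is cheap with XmlReader, because you can stop after the first element. This sketch assumes the two vocabularies can be told apart by the root element's namespace or local name. ReadRoot is a hypothetical helper, and the caller still decides which XSD to apply.

```csharp
using System.IO;
using System.Xml;

public static class RootPeek
{
    // Returns the root element's namespace URI and local name without reading the rest
    public static (string NamespaceUri, string LocalName) ReadRoot(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent(); // skips the XML declaration, comments and whitespace
            return (reader.NamespaceURI, reader.LocalName);
        }
    }
}
```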

It's possible to parse an XML document without validation, inspect it, and then validate the already parsed document. It's even possible to do this in a single pipeline without putting the whole document in memory. But the details depend on the toolkit you are using. You've tagged the question C#; I'm not sure if this is possible using the Microsoft tools, but I think it should be possible using Saxon-CS. (Disclaimer: my product.)
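For the in-memory case (not the single streaming pipeline), System.Xml does let you validate an already-parsed document. XmlDocument.Validate checks the loaded tree against the schemas in document.Schemas. This is a minimal sketch that assumes the two XSDs use different target namespaces, so the root namespace identifies which one applies. The namespace-to-XSD dictionary is a hypothetical convention.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TwoSchemaValidator
{
    // schemasByNamespace: hypothetical lookup from root namespace URI to XSD text
    public static List<string> Validate(string xml, IDictionary<string, string> schemasByNamespace)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // well-formedness check only, no schema yet

        // Inspect the parsed document to decide which schema applies
        string ns = document.DocumentElement.NamespaceURI;
        var errors = new List<string>();
        string xsd;
        if (!schemasByNamespace.TryGetValue(ns, out xsd))
        {
            errors.Add("No schema registered for namespace '" + ns + "'");
            return errors;
        }

        using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsd)))
        {
            document.Schemas.Add(null, schemaReader); // null = use the schema's targetNamespace
        }
        // Validate the tree that was already parsed, collecting errors instead of throwing
        document.Validate((sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```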

Reading the XML file in C#

nodeList is a list of <InvoiceHeader> nodes. One way to solve this problem is to iterate through the ChildNodes of each node in nodeList. You can then use each child's Name property to fill in one Details object per <InvoiceHeader>.

Your code was almost there; I've just updated it slightly so it correctly adds the <Number> elements to the list:

List<Details> detailsList = new List<Details>();
XmlDocument doc = new XmlDocument();
doc.Load(path);
XmlNodeList nodeList = doc.SelectNodes("/Invoice/InvoiceHeader");
foreach (XmlNode node in nodeList)
{
// create details class for each InvoiceHeader
Details detail = new Details();
detail.Number = new List<string>();

// loop over child nodes to get Name and all Number elements
foreach (XmlNode child in node.ChildNodes)
{
// check node name to decide how to handle the values
if (child.Name == "Name")
{
detail.Name = child.InnerText;
}
else if (child.Name == "Number")
{
detail.Number.Add(child.InnerText);
}
}
detailsList.Add(detail);
}

Then you can display the results like this:

foreach (var details in detailsList)
{
Console.WriteLine($"{details.Name}: {string.Join(",", details.Number)}");
}

// output
cust1: 5689
cust1: 5689,5459
cust1: 5689,5645,5879

Another method you could consider is LINQ to XML. The code below produces the same output as the code above:

XDocument doc = XDocument.Load(path);
var details = doc.Descendants("Invoice")
    .Elements("InvoiceHeader") // same selection as the /Invoice/InvoiceHeader XPath above
    .Select(node => new Details()
    {
        Name = (string)node.Element("Name"), // null if missing, like the XmlDocument version
        Number = node.Elements("Number").Select(child => child.Value).ToList()
    })
    .ToList();

How does one parse XML files?

I'd use LINQ to XML if you're in .NET 3.5 or higher.

How to read this XML in C#?

You could do something like this: get the <ROWDATA> node, then its first (and only) descendant, and from that node, extract the info from the attributes with this code:

var field = xml.Root.Descendants("ROWDATA").Descendants().FirstOrDefault();

if (field != null)
{
string dateJoined = field.Attribute("dateJoined").Value;
decimal total = decimal.Parse(field.Attribute("totals").Value);
decimal partials = decimal.Parse(field.Attribute("partials").Value);

int status = int.Parse(field.Attribute("status").Value);
int counter = int.Parse(field.Attribute("counter").Value);
}

UPDATE: as @KlausGütter has noted in comments, if the .NET culture you're using has something other than a dot (.) as its decimal separator, then this code won't properly convert the strings representing the decimal values.

To check what the decimal separator is, use this:

CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator

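To properly convert the XML data to decimal in such a case, use the explicit conversion operators on XAttribute. They parse with XmlConvert and are therefore culture-invariant:

decimal total = (decimal)field.Attribute("totals");
decimal partials = (decimal)field.Attribute("partials");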

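If you only have the raw string (for example, from a different XML API), decimal.Parse with CultureInfo.InvariantCulture gives the same culture-independent result. This small sketch shows both approaches side by side; the method names are hypothetical.

```csharp
using System.Globalization;
using System.Xml.Linq;

public static class InvariantParsing
{
    // XAttribute's explicit decimal operator uses XmlConvert, which ignores the current culture
    public static decimal ViaCast(XAttribute attribute)
    {
        return (decimal)attribute;
    }

    // Equivalent for a plain string: always treat '.' as the decimal separator
    public static decimal ViaParse(string text)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
    }
}
```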
Best way to parse XML in C# with regard to performance

Without seeing your code, it is not easy to optimise it. However, there is one general point you should consider:

LINQ to XML is a DOM-style API, in that it reads the entire XML document into a tree that resides in memory. All queries are executed against that tree. For large documents, constructing it can be memory- and CPU-intensive. Also, LINQ to XML queries, if written inefficiently, can navigate the same tree nodes multiple times.

As an alternative, consider a streaming, forward-only parser such as XmlReader. It is a pull parser rather than a SAX-style push parser. Like SAX, though, it does not build an in-memory model of your document, and it moves forward through the input, so each element is read just once.
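Here is a minimal streaming sketch that assumes the People/Person layout used earlier on this page. It yields each Name attribute as the reader passes it, so memory use stays flat regardless of file size. StreamingReader.PersonNames is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingReader
{
    // Forward-only scan: yields each Person's Name without building an in-memory tree
    public static IEnumerable<string> PersonNames(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Person")
                {
                    yield return reader.GetAttribute("Name");
                }
            }
        }
    }
}
```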

Dealing with awkward XML layout in C# using XmlTextReader

I think you will find LINQ to XML easier to use:

var xDoc = XDocument.Parse(xmlstring); //or XDocument.Load(filename);

int sku = (int)xDoc.Root.Element("sku");
string name = (string)xDoc.Root.Element("product_name");
string supplier = (string)xDoc.Root.Element("supplier_number");

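You can also convert your XML to a dictionary (this assumes each child element name occurs only once, since ToDictionary throws on duplicate keys):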

var dict = xDoc.Root.Elements()
.ToDictionary(e => e.Name.LocalName, e => (string)e);

Console.WriteLine(dict["sku"]);

Reading XML with an & into C# XMLDocument Object

The problem is that the XML is not well-formed. Properly generated XML would escape the ampersand like this:

Prepaid &amp; Charge

I've fixed the same problem before, and I did it with this regex:

Regex badAmpersand = new Regex("&(?![a-zA-Z]{2,6};|#[0-9]{2,4};)");

Combine that with a string constant defined like this:

const string goodAmpersand = "&amp;";

Now you can say badAmpersand.Replace(<your input>, goodAmpersand);

Note that a simple String.Replace("&", "&amp;") isn't good enough. You can't know in advance whether the & characters in a given document are encoded correctly, incorrectly, or even both in the same document.

There are catches. First, you have to do this to your XML document before loading it into your parser, which likely means an extra pass through the document. Second, it does not account for ampersands inside a CDATA section. Finally, it only catches ampersands, not other illegal characters like <. Update: based on the comment, I need to extend the expression to cover hex-coded (&#x...;) entities as well.
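Below is a sketch of the updated expression. It assumes that named, decimal, and hexadecimal references should all be treated as already correct. This is a broadened variant of the regex above: it drops the {2,6} and {2,4} length limits. Like the original, it still rewrites ampersands inside CDATA sections.

```csharp
using System.Text.RegularExpressions;

public static class AmpersandFixer
{
    // Matches an '&' that does not already start a reference:
    // named (&amp;), decimal (&#38;) or hexadecimal (&#x26;)
    private static readonly Regex BadAmpersand =
        new Regex("&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)");

    public static string Fix(string xml)
    {
        // Naive: also rewrites '&' inside CDATA sections
        return BadAmpersand.Replace(xml, "&amp;");
    }
}
```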

Regarding which characters can cause problems, the actual rules are a little complex. For example, certain characters are allowed in data, but not as the first letter of an element name. And there's no simple list of illegal characters. Instead, large (non-contiguous) swaths of Unicode are defined as legal, and anything outside that is illegal.

When it comes down to it, you have to trust your document source to have at least a certain amount of compliance and consistency. For example, I've found people are often smart enough to make sure the tags work properly and escape <, even if they don't know that & isn't allowed, hence your problem today. However, the best thing would be to get this fixed at the source.

Oh, and a note about the CDATA suggestion: I use that to make sure xml I'm creating is well-formed, but when dealing with existing xml from outside, I find the regex method easier.


