Remove the Escape Sequence '\' from a String to Convert It to an XmlDocument

Escape invalid XML characters in C#

To remove invalid XML characters, I suggest using the XmlConvert.IsXmlChar method. It was added in .NET Framework 4 and is also available in Silverlight. Here is a small sample:

// Requires: using System; using System.Linq; using System.Xml;
void Main() {
    string content = "\v\f\0";
    Console.WriteLine(IsValidXmlString(content)); // False

    content = RemoveInvalidXmlChars(content);
    Console.WriteLine(IsValidXmlString(content)); // True
}

static string RemoveInvalidXmlChars(string text) {
    // Keep only the characters that XmlConvert accepts as XML characters.
    var validXmlChars = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();
    return new string(validXmlChars);
}

static bool IsValidXmlString(string text) {
    try {
        // VerifyXmlChars throws if the string contains an invalid XML character.
        XmlConvert.VerifyXmlChars(text);
        return true;
    } catch {
        return false;
    }
}
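
For readers outside .NET, the same filtering idea can be sketched in Python. This is only an analogue under stated assumptions, not the XmlConvert implementation: the helper names remove_invalid_xml_chars and is_valid_xml_string are hypothetical, and the allowed ranges are taken from the Char production of the XML 1.0 specification.

```python
import re

# Everything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def remove_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


def is_valid_xml_string(text: str) -> bool:
    """Return True when the string contains only allowed XML 1.0 characters."""
    return _INVALID_XML_CHARS.search(text) is None
```

Because the pattern works on code points, characters outside the Basic Multilingual Plane are kept intact.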

To escape invalid XML characters instead of removing them, I suggest using the XmlConvert.EncodeName method. Here is a small sample:

void Main() {
    const string content = "\v\f\0";
    Console.WriteLine(IsValidXmlString(content)); // False

    string encoded = XmlConvert.EncodeName(content);
    Console.WriteLine(IsValidXmlString(encoded)); // True

    string decoded = XmlConvert.DecodeName(encoded);
    Console.WriteLine(content == decoded); // True
}

static bool IsValidXmlString(string text) {
    try {
        XmlConvert.VerifyXmlChars(text);
        return true;
    } catch {
        return false;
    }
}

Update:
Note that the encoding operation produces a string whose length is greater than or equal to the length of the source string. This matters when you store the encoded string in a database column with a length limit but validate only the source string's length in your app.
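
The length concern applies to any escaping scheme, not just EncodeName. A minimal sketch in Python, using xml.sax.saxutils.escape as a stand-in (it is not the same encoding as XmlConvert.EncodeName), with a hypothetical fits_column helper that checks the escaped length rather than the source length:

```python
from xml.sax.saxutils import escape


def fits_column(text: str, max_length: int) -> bool:
    """Validate the escaped length, since that is what gets stored (hypothetical helper)."""
    return len(escape(text)) <= max_length


source = "R&D <draft>"
encoded = escape(source)  # each special character grows into a longer entity
```

Here the source string would pass a limit equal to its own length, while the escaped string would not.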

What characters do I need to escape in XML documents?

If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
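
As an illustration of letting a library do the escaping, here is a small sketch with Python's xml.etree.ElementTree. The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build the element from plain strings; no manual escaping is done here.
note = ET.Element("note", {"title": 'Tom & "Jerry"'})
note.text = "1 < 2 & 3 > 2"

# The serializer escapes the special characters as needed.
serialized = ET.tostring(note, encoding="unicode")

# Parsing the result gives back the original strings unchanged.
parsed = ET.fromstring(serialized)
```

Building the same markup with string concatenation would require escaping each value by hand, which is where many of these bugs come from.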

XML escape characters

There are only five:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
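
A short sketch of applying all five replacements in Python, assuming xml.sax.saxutils is acceptable for the job. Its escape() handles &, < and > by default, so the two quote entities are passed in explicitly; the helper names are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quotes need an explicit mapping.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_all_five(text: str) -> str:
    """Escape all five predefined XML entities (hypothetical helper)."""
    return escape(text, QUOTE_ENTITIES)


def unescape_all_five(text: str) -> str:
    """Reverse escape_all_five (hypothetical helper)."""
    return unescape(text, {entity: char for char, entity in QUOTE_ENTITIES.items()})
```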

Escaping characters depends on where the special character is used.

The examples can be validated at the W3C Markup Validation Service.

Text

The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text:

<?xml version="1.0"?>
<valid>"'></valid>

Attributes

The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:

<?xml version="1.0"?>
<valid attribute=">"/>

The ' character needn't be escaped in attributes if the quotes are ":

<?xml version="1.0"?>
<valid attribute="'"/>

Likewise, the " needn't be escaped in attributes if the quotes are ':

<?xml version="1.0"?>
<valid attribute='"'/>
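
The quoting rules above are what Python's xml.sax.saxutils.quoteattr applies when it picks a delimiter for an attribute value. A small sketch (the render helper and the valid/attribute names are just for illustration):

```python
from xml.sax.saxutils import quoteattr


def render(value: str) -> str:
    """Emit an element whose attribute value is quoted by quoteattr (hypothetical helper)."""
    return f"<valid attribute={quoteattr(value)}/>"


print(render("it's"))            # only ' present: double quotes, nothing escaped
print(render('say "hi"'))        # only " present: single quotes, nothing escaped
print(render("both ' and \""))   # both present: double quotes, " becomes &quot;
```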

Comments

None of the five special characters should be escaped in comments, because entity references are not expanded there:

<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>

CDATA

None of the five special characters should be escaped in CDATA sections, because entity references are not expanded there:

<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
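
To see why escaping inside a CDATA section is a mistake, compare how a parser treats the same entity reference in ordinary text and in CDATA. A minimal sketch using Python's xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# In ordinary text the parser expands the entity reference to a single "&".
text_value = ET.fromstring("<valid>&amp;</valid>").text

# Inside a CDATA section the same five characters are taken literally,
# so an "escaped" ampersand stays as the five characters "&amp;".
cdata_value = ET.fromstring("<valid><![CDATA[&amp;]]></valid>").text

# Unescaped special characters are fine inside CDATA.
raw_value = ET.fromstring("<valid><![CDATA[\"'<>&]]></valid>").text
```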

Processing instructions

None of the five special characters should be escaped in XML processing instructions, because entity references are not expanded there:

<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>

XML vs. HTML

HTML has its own set of escape codes which cover a lot more characters.

String escape into XML

// Requires: using System.Xml;
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    // Assign as plain text, then read back the serialized (escaped) markup.
    node.InnerText = unescaped;
    return node.InnerXml;
}

public static string XmlUnescape(string escaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    // Assign as markup, then read back the decoded text.
    node.InnerXml = escaped;
    return node.InnerText;
}

Escaping strings for use in XML

Do you mean you do something like this:

from xml.dom.minidom import Text, Element

t = Text()
e = Element('p')

t.data = '<bar><a/><baz spam="eggs"> & blabla &entity;</>'
e.appendChild(t)

Then you will get a nicely escaped XML string:

>>> e.toxml()
'<p>&lt;bar&gt;&lt;a/&gt;&lt;baz spam=&quot;eggs&quot;&gt; &amp; blabla &amp;entity;&lt;/&gt;</p>'

Load XML String having escape characters

You should try this:

XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.ConformanceLevel = ConformanceLevel.Document;
settings.CloseOutput = false;

MemoryStream strm = new MemoryStream();

using (XmlWriter writer = XmlWriter.Create(strm, settings))
{
    writer.WriteStartElement("media");
    writer.WriteStartElement("cd");
    writer.WriteStartElement("burned");
    writer.WriteAttributeString("value", "true");
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteStartElement("vinyl");
    writer.WriteStartElement("pressed");
    writer.WriteAttributeString("value", "true");
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndElement();
}

string sMediaXML = Encoding.UTF8.GetString(strm.ToArray());

// Strip the UTF-8 byte order mark if the writer emitted one.
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (sMediaXML.StartsWith(_byteOrderMarkUtf8))
{
    sMediaXML = sMediaXML.Remove(0, _byteOrderMarkUtf8.Length);
}

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(sMediaXML);

// SelectSingleNode returns null when nothing matches, so test the node itself
// rather than dereferencing .Value on a possibly null result.
Boolean bNodeExists = xmlDoc.SelectSingleNode("/media/cd/burned/@value") != null;

  1. If you have an XML string that you want to load into an XmlDocument, use the LoadXml method, not Load. The Load method is for loading directly from disk or from streams.
  2. The XmlDocument cannot parse the XML string because it starts with the UTF-8 byte order mark (BOM). There is another way to make this work, which is covered in a related SO question.
  3. The XPath query you have won't work anyway because you don't have any "digital" elements defined.
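
The byte order mark problem from point 2 is not specific to .NET. A minimal Python sketch of the same idea, assuming the XML arrives as UTF-8 bytes with a leading BOM; the sample document mirrors the media/cd/burned structure written above.

```python
import codecs
import xml.etree.ElementTree as ET

# Simulated input: UTF-8 bytes that start with a byte order mark.
raw = codecs.BOM_UTF8 + b'<media><cd><burned value="true"/></cd></media>'

# The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if one is present.
xml_text = raw.decode("utf-8-sig")

root = ET.fromstring(xml_text)

# find() returns None when nothing matches, so test the node before using it.
burned = root.find("cd/burned")
node_exists = burned is not None and burned.get("value") is not None
```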

Dealing with invalid XML hexadecimal characters

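// Note: ASCII encoding replaces non-ASCII characters with '?';
// use System.Text.Encoding.UTF8 instead if they must be preserved.
byte[] toEncodeAsBytes = System.Text.ASCIIEncoding.ASCII.GetBytes(toEncode);
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);

Base64-encoding the string like this is one way of doing it.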
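
The same Base64 approach, sketched in Python with UTF-8 instead of ASCII so that no characters are lost. The helper names are hypothetical; the receiving side has to decode the value again before using it.

```python
import base64


def to_base64(text: str) -> str:
    """Encode arbitrary text as Base64 so it is safe to embed in XML (hypothetical helper)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Reverse to_base64 (hypothetical helper)."""
    return base64.b64decode(encoded).decode("utf-8")
```

The standard Base64 alphabet contains only letters, digits, +, / and =, so the encoded value needs no further XML escaping.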

What are invalid characters in XML

The only illegal characters are &, < and > (as well as " or ' in attributes, depending on which character is used to delimit the attribute value: attr="must use &quot; here, ' is allowed" and attr='must use &apos; here, " is allowed').

They're escaped using XML entities, in this case you want &amp; for &.

Really, though, you should use a tool or library that writes XML for you and abstracts this kind of thing away for you so you don't have to worry about it.

string escape into XML-Attribute

Modifying the solution you referenced, how about

public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    var node = doc.CreateAttribute("foo");
    node.InnerText = unescaped;
    return node.InnerXml;
}

All I did was change CreateElement() to CreateAttribute().
The attribute node type does have InnerText and InnerXml properties.

I don't have the environment to test this in, but I'd be curious to know if it works.

Update: Or more simply, use SecurityElement.Escape() as suggested in another answer to the question you linked to. This will escape quotation marks, so it's suitable for using for attribute text.

Update 2: Please note that carriage returns and line feeds do not need to be escaped in an attribute value for the XML to be well-formed. If you want them escaped for other reasons, you can do it using String.Replace(), e.g.

SecurityElement.Escape(unescaped).Replace("\r", "&#xD;").Replace("\n", "&#xA;");

or

return node.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");

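One such reason is attribute-value normalization: a conforming parser turns a literal line break inside an attribute value into a space, while a character reference is kept as a real line break. A small Python sketch of the difference, using xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# A literal line feed inside the attribute value is normalized to a space.
literal = ET.fromstring('<valid attribute="line1\nline2"/>').get("attribute")

# A character reference survives parsing as a real line feed.
escaped = ET.fromstring('<valid attribute="line1&#xA;line2"/>').get("attribute")
```

So if line breaks in the attribute value have to round-trip, escaping them is required even though both documents are well-formed.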
