Escape invalid XML characters in C#
To remove invalid XML characters, I suggest using the XmlConvert.IsXmlChar method. It has been available since .NET Framework 4 and is also present in Silverlight. Here is a small sample:
// Requires: using System; using System.Linq; using System.Xml;
void Main() {
    string content = "\v\f\0";
    Console.WriteLine(IsValidXmlString(content)); // False
    content = RemoveInvalidXmlChars(content);
    Console.WriteLine(IsValidXmlString(content)); // True
}

// Keeps only the characters that XmlConvert.IsXmlChar accepts.
static string RemoveInvalidXmlChars(string text) {
    var validXmlChars = text.Where(ch => XmlConvert.IsXmlChar(ch)).ToArray();
    return new string(validXmlChars);
}

// VerifyXmlChars throws when the text contains an invalid XML character.
static bool IsValidXmlString(string text) {
    try {
        XmlConvert.VerifyXmlChars(text);
        return true;
    } catch {
        return false;
    }
}
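One caveat with filtering one char at a time: as far as I know, XmlConvert.IsXmlChar(char) rejects individual surrogate code units, so characters outside the Basic Multilingual Plane (emoji, for example) may be dropped along with the invalid ones. A minimal sketch of a variant that keeps valid surrogate pairs follows; the class and method names are hypothetical, and it relies on XmlConvert.IsXmlSurrogatePair, which takes the low surrogate first.

```csharp
using System;
using System.Text;
using System.Xml;

static class XmlSanitizer
{
    // Hypothetical helper: removes characters that are not valid in XML,
    // but keeps well-formed surrogate pairs together.
    public static string RemoveInvalidXmlCharsKeepingSurrogates(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (XmlConvert.IsXmlChar(ch))
            {
                sb.Append(ch);
            }
            else if (i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], ch))
            {
                // Argument order is (lowChar, highChar): ch is the high
                // surrogate and text[i + 1] is the low surrogate.
                sb.Append(ch).Append(text[i + 1]);
                i++; // skip the low surrogate that was just appended
            }
            // Anything else (control characters, lone surrogates) is dropped.
        }
        return sb.ToString();
    }
}
```

Calling XmlSanitizer.RemoveInvalidXmlCharsKeepingSurrogates("a\v\0b") should yield "ab", while a string containing an emoji should come back unchanged.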
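To escape invalid XML characters instead of removing them, I suggest using the XmlConvert.EncodeName method. Keep in mind that EncodeName is designed for XML names, so it also encodes characters that are valid in text but not in names (a space, for example); XmlConvert.DecodeName reverses the encoding. Here is a small sample: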
void Main() {
    const string content = "\v\f\0";
    Console.WriteLine(IsValidXmlString(content)); // False
    string encoded = XmlConvert.EncodeName(content);
    Console.WriteLine(IsValidXmlString(encoded)); // True
    string decoded = XmlConvert.DecodeName(encoded);
    Console.WriteLine(content == decoded); // True
}

static bool IsValidXmlString(string text) {
    try {
        XmlConvert.VerifyXmlChars(text);
        return true;
    } catch {
        return false;
    }
}
Update:
Note that the encoding operation produces a string whose length is greater than or equal to the length of the source string. This matters when you store the encoded string in a database column with a length limit but validate only the length of the source string in your app.
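To make the length concern concrete, here is a minimal sketch that validates the encoded length rather than the source length. The class name, method name, and the maxColumnLength parameter are hypothetical; only XmlConvert.EncodeName comes from the answer above.

```csharp
using System;
using System.Xml;

static class EncodedLengthCheck
{
    // Hypothetical helper: checks the length of the *encoded* string,
    // because that is what ends up in the length-limited column.
    public static bool FitsColumn(string source, int maxColumnLength)
    {
        string encoded = XmlConvert.EncodeName(source);
        return encoded.Length <= maxColumnLength;
    }
}
```

For example, "\v\f\0" is only three characters long, but each control character is written as a seven-character _xHHHH_ escape, so EncodedLengthCheck.FitsColumn("\v\f\0", 3) returns false even though the source string itself would fit.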
What characters do I need to escape in XML documents?
If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
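As an illustration of letting a library do the escaping, here is a small sketch using LINQ to XML (System.Xml.Linq). The element name "note", the attribute name "title", and the helper itself are made up for the example; the point is only that the values are passed in as plain strings and never concatenated into markup.

```csharp
using System;
using System.Xml.Linq;

static class LibraryEscapingDemo
{
    // Hypothetical helper: builds <note title="...">...</note> and lets
    // XElement/XAttribute escape the special characters on serialization.
    public static string BuildNote(string title, string body)
    {
        var element = new XElement("note",
            new XAttribute("title", title),
            body);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

Passing a body such as "1 < 2 & 3" produces markup in which the raw < and & no longer appear, and XElement.Parse on the result gives back the original strings.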
XML escape characters
There are only five:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
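The mapping above can be sketched as a hand-written helper. This is only an illustration (the class and method names are hypothetical), and a library call is usually the better choice; the one detail worth showing is that & has to be replaced first, otherwise the ampersands introduced by the other replacements would be escaped a second time.

```csharp
using System;

static class FiveEntities
{
    // Hypothetical helper: escapes the five predefined XML entities.
    public static string EscapeXml(string raw)
    {
        return raw
            .Replace("&", "&amp;")   // must run first
            .Replace("<", "&lt;")
            .Replace(">", "&gt;")
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;");
    }
}
```

FiveEntities.EscapeXml("\"'<>&") returns &quot;&apos;&lt;&gt;&amp;.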
Escaping characters depends on where the special character is used.
The examples can be validated at the W3C Markup Validation Service.
Text
The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text:
<?xml version="1.0"?>
<valid>"'></valid>
Attributes
The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:
<?xml version="1.0"?>
<valid attribute=">"/>
The ' character needn't be escaped in attributes if the quotes are ":
<?xml version="1.0"?>
<valid attribute="'"/>
Likewise, the " character needn't be escaped in attributes if the quotes are ':
<?xml version="1.0"?>
<valid attribute='"'/>
Comments
All five special characters must not be escaped in comments:
<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>
CDATA
All five special characters must not be escaped in CDATA sections:
<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
Processing instructions
All five special characters must not be escaped in XML processing instructions:
<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>
XML vs. HTML
HTML has its own set of escape codes which cover a lot more characters.
String escape into XML
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    node.InnerText = unescaped;
    return node.InnerXml;
}

public static string XmlUnescape(string escaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    node.InnerXml = escaped;
    return node.InnerText;
}
Escaping strings for use in XML
Do you mean you do something like this:
from xml.dom.minidom import Text, Element
t = Text()
e = Element('p')
t.data = '<bar><a/><baz spam="eggs"> & blabla &entity;</>'
e.appendChild(t)
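Then you will get a nicely escaped XML string (newer Python versions may leave the double quotes in text nodes unescaped, which is equally well-formed):
>>> e.toxml()
'<p>&lt;bar&gt;&lt;a/&gt;&lt;baz spam=&quot;eggs&quot;&gt; &amp; blabla &amp;entity;&lt;/&gt;</p>'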
Load XML String having escape characters
You should try this:
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.ConformanceLevel = ConformanceLevel.Document;
settings.CloseOutput = false;

MemoryStream strm = new MemoryStream();
using (XmlWriter writer = XmlWriter.Create(strm, settings))
{
    writer.WriteStartElement("media");
    writer.WriteStartElement("cd");
    writer.WriteStartElement("burned");
    writer.WriteAttributeString("value", "true");
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteStartElement("vinyl");
    writer.WriteStartElement("pressed");
    writer.WriteAttributeString("value", "true");
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndElement();
}

string sMediaXML = Encoding.UTF8.GetString(strm.ToArray());

// Strip the UTF-8 byte order mark before parsing. The ordinal comparison
// matters: a culture-sensitive StartsWith ignores the BOM character.
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (sMediaXML.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))
{
    sMediaXML = sMediaXML.Remove(0, _byteOrderMarkUtf8.Length);
}

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(sMediaXML);

// SelectSingleNode returns null when nothing matches, so test the node
// itself instead of dereferencing .Value.
bool bNodeExists = xmlDoc.SelectSingleNode("/media/cd/burned/@value") != null;
- If you have an XML string that you want to load into an XmlDocument, you should use the LoadXml method, not Load. The Load method is used when loading directly from disk or streams.
- The XmlDocument cannot parse the XML string because it starts with the UTF-8 byte order mark (BOM). There is another option for making this work, covered in a separate SO question.
- The XPath query you have won't work anyway because you don't have any "digital" elements defined.
Dealing with invalid XML hexadecimal characters
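// UTF-8 keeps every character; the ASCII encoding would replace
// non-ASCII characters with '?'.
byte[] toEncodeAsBytes = System.Text.Encoding.UTF8.GetBytes(toEncode);
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
Base64-encoding the string like this is one way of doing it.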
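The reader of the XML then has to reverse the Base64 step. A minimal round-trip sketch follows; the class and method names are hypothetical, and it assumes UTF-8 as the text encoding so that non-ASCII characters survive the trip.

```csharp
using System;
using System.Text;

static class Base64Text
{
    // Encodes arbitrary text (including characters that are invalid in XML)
    // into Base64, which contains only XML-safe characters.
    public static string Encode(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    // Reverses Encode. Assumes the input was produced with UTF-8 bytes.
    public static string Decode(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

For instance, Base64Text.Encode("hi") returns "aGk=", and Base64Text.Decode(Base64Text.Encode("\v\f\0")) gives back the original three control characters.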
What are invalid characters in XML
The only illegal characters are &, < and > (as well as " or ' in attributes, depending on which character is used to delimit the attribute value: attr="must use &quot; here, ' is allowed" and attr='must use &apos; here, " is allowed').
They're escaped using XML entities; in this case you want &amp; for &.
Really, though, you should use a tool or library that writes XML for you and abstracts this kind of thing away, so you don't have to worry about it.
string escape into XML-Attribute
Modifying the solution you referenced, how about
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    var node = doc.CreateAttribute("foo");
    node.InnerText = unescaped;
    return node.InnerXml;
}
All I did was change CreateElement() to CreateAttribute().
The attribute node type does have InnerText and InnerXml properties.
I don't have the environment to test this in, but I'd be curious to know if it works.
Update: Or more simply, use SecurityElement.Escape() as suggested in another answer to the question you linked to. This will escape quotation marks, so it's suitable for using for attribute text.
Update 2: Please note that carriage returns and line feeds do not need to be escaped in an attribute value for the XML to be well-formed. If you want them escaped for other reasons, you can do it using String.Replace(), e.g.
SecurityElement.Escape(unescaped).Replace("\r", "&#xD;").Replace("\n", "&#xA;");
or
return node.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");