What characters do I need to escape in XML documents?
If you use an appropriate class or library, it will do the escaping for you. Many XML issues are caused by building documents through string concatenation.
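As a minimal sketch of letting a library do the escaping, the following uses Python's standard xml.etree.ElementTree module. The element name "note", the attribute name "title" and the helper build_note are made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_note(title: str, body: str) -> str:
    """Serialize a <note> element; ElementTree escapes the values itself."""
    note = ET.Element("note", {"title": title})  # hypothetical element/attribute names
    note.text = body
    return ET.tostring(note, encoding="unicode")

# Values containing special characters are passed in unescaped.
xml_text = build_note('Tom & "Jerry"', "1 < 2 && 3 > 2")

# Parsing the result gives back exactly what was put in.
parsed = ET.fromstring(xml_text)
```

No manual replacement of & or < is needed, which is the point of avoiding string concatenation.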
XML escape characters
There are only five:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
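If a string really has to be escaped by hand, Python's xml.sax.saxutils can cover all five characters. This is a sketch: escape() and unescape() are standard library functions, while the two dictionaries and the wrapper names are assumptions made for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > on its own; the two quote characters
# are passed in as extra replacements.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

def escape_all_five(text: str) -> str:
    return escape(text, QUOTES)

def unescape_all_five(text: str) -> str:
    return unescape(text, UNQUOTES)

escaped = escape_all_five("\"'<>&")
restored = unescape_all_five(escaped)
```

Here escaped is the string &quot;&apos;&lt;&gt;&amp; and restored is the original input again.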
Escaping characters depends on where the special character is used.
The examples can be validated at the W3C Markup Validation Service.
Text
The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text (except that > must be written as &gt; when it would otherwise complete the sequence ]]>):
<?xml version="1.0"?>
<valid>"'></valid>
Attributes
The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:
<?xml version="1.0"?>
<valid attribute=">"/>
The ' character needn't be escaped in attributes if the quotes are ":
<?xml version="1.0"?>
<valid attribute="'"/>
Likewise, the " needn't be escaped in attributes if the quotes are ':
<?xml version="1.0"?>
<valid attribute='"'/>
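The quoting rules above are what Python's xml.sax.saxutils.quoteattr applies: it picks the surrounding quote character based on the value and only falls back to &quot; when both kinds of quote occur. It also writes tabs and line breaks as character references, so they survive the whitespace normalization a parser applies to attribute values. The sketch below reuses the element and attribute names from the examples; tag_with_attribute is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def tag_with_attribute(value: str) -> str:
    # quoteattr() returns the value already wrapped in suitable quotes.
    return f"<valid attribute={quoteattr(value)}/>"

samples = ["it's", 'say "hi"', "both ' and \"", "a < b & c > d", "two\nlines"]

# Parse each generated tag and read the attribute back.
round_trips = [ET.fromstring(tag_with_attribute(s)).get("attribute") for s in samples]
```

Every value in round_trips equals the corresponding entry in samples.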
Comments
None of the five special characters should be escaped in comments:
<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>
CDATA
None of the five special characters should be escaped in CDATA sections:
<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
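The one sequence a CDATA section cannot contain is its own terminator, ]]>. A common workaround is to split it across two adjacent sections. The helper to_cdata below is a hypothetical sketch of that idea, checked by parsing the result with the standard library.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

raw = "\"'<>& and a tricky ]]> marker"
document = f"<valid>{to_cdata(raw)}</valid>"

# The parser joins the adjacent CDATA sections back into one text node.
recovered = ET.fromstring(document).text
```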
Processing instructions
None of the five special characters should be escaped in XML processing instructions:
<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>
XML vs. HTML
HTML has its own set of escape codes which cover a lot more characters.
How can I escape & in XML?
Use &amp; in place of &.
Change it to:
<string name="magazine">Newspaper &amp; Magazines</string>
String escape into XML
// Requires: using System.Xml;
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    // InnerText stores the raw string; InnerXml returns it escaped.
    node.InnerText = unescaped;
    return node.InnerXml;
}

public static string XmlUnescape(string escaped)
{
    XmlDocument doc = new XmlDocument();
    XmlNode node = doc.CreateElement("root");
    // InnerXml parses the markup; InnerText returns the unescaped text.
    node.InnerXml = escaped;
    return node.InnerText;
}
How to escape the < and ≤ symbols in XML?
You will have to escape those symbols as follows:
< will be &lt;
> will be &gt;
The ≤ sign doesn't need escaping.
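Whether ≤ appears literally or as a numeric character reference depends only on the output encoding, not on XML's escaping rules. A small sketch with Python's xml.etree.ElementTree illustrates the difference; the element name "p" is arbitrary.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("p")
elem.text = "a < b ≤ c"

# As a Unicode string, only < is escaped; ≤ is written as-is.
as_unicode = ET.tostring(elem, encoding="unicode")

# With the default us-ascii encoding, ≤ cannot be represented,
# so the serializer emits a character reference instead.
as_ascii = ET.tostring(elem)
```

Both forms parse back to the same text.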
Escaping strings for use in XML
Do you mean you do something like this:
from xml.dom.minidom import Text, Element
t = Text()
e = Element('p')
t.data = '<bar><a/><baz spam="eggs"> & blabla &entity;</>'
e.appendChild(t)
Then you will get a nicely escaped XML string:
>>> e.toxml()
'<p>&lt;bar&gt;&lt;a/&gt;&lt;baz spam=&quot;eggs&quot;&gt; &amp; blabla &amp;entity;&lt;/&gt;</p>'
string escape into XML-Attribute
Modifying the solution you referenced, how about
public static string XmlEscape(string unescaped)
{
    XmlDocument doc = new XmlDocument();
    var node = doc.CreateAttribute("foo");
    node.InnerText = unescaped;
    return node.InnerXml;
}
All I did was change CreateElement() to CreateAttribute().
The attribute node type does have InnerText and InnerXml properties.
I don't have the environment to test this in, but I'd be curious to know if it works.
Update: Or more simply, use SecurityElement.Escape() as suggested in another answer to the question you linked to. This will escape quotation marks, so it's suitable for using for attribute text.
Update 2: Please note that carriage returns and line feeds do not need to be escaped in an attribute value for the XML to be well-formed. A parser will, however, normalize literal line breaks in attribute values to spaces. If you want them to be escaped, you can do it using String.Replace(), e.g.
SecurityElement.Escape(unescaped).Replace("\r", "&#xD;").Replace("\n", "&#xA;");
or
return node.InnerXml.Replace("\r", "&#xD;").Replace("\n", "&#xA;");
Escape xml characters within nodes of string xml in java
You could use regular expression matching to find all the strings between angled brackets, and loop through/process each of those. In this example I've used the Apache Commons Lang to do the XML escaping.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringEscapeUtils;

public String sanitiseXml(String xml)
{
    // Match the pattern <something>text</something>
    Pattern xmlCleanerPattern = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");
    StringBuilder xmlStringBuilder = new StringBuilder();
    Matcher matcher = xmlCleanerPattern.matcher(xml);
    int lastEnd = 0;
    while (matcher.find())
    {
        // Include any non-matching text between this result and the previous result
        if (matcher.start() > lastEnd) {
            xmlStringBuilder.append(xml.substring(lastEnd, matcher.start()));
        }
        lastEnd = matcher.end();
        // Sanitise the characters inside the tags and append the sanitised version
        String cleanText = StringEscapeUtils.escapeXml10(matcher.group(2));
        xmlStringBuilder.append(matcher.group(1)).append(cleanText).append(matcher.group(3));
    }
    // Include any leftover text after the last result
    xmlStringBuilder.append(xml.substring(lastEnd));
    return xmlStringBuilder.toString();
}
This looks for matches of <something>text</something>, captures the tag names and contained text, sanitises the contained text, and then puts it back together.
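For comparison, the same approach can be sketched in Python with the same regular expression and xml.sax.saxutils.escape. The function name sanitise_xml simply mirrors the Java method. It shares the limitations of the original: only text that contains no < or > is matched, and text that is already escaped will be escaped a second time.

```python
import re
from xml.sax.saxutils import escape

# Same pattern as the Java version: <something>text</something>
_ELEMENT_TEXT = re.compile(r"(<[^/<>]*>)([^<>]*)(</[^<>]*>)")

def sanitise_xml(xml: str) -> str:
    """Escape the text between a start tag and the end tag that follows it."""
    def fix(match: re.Match) -> str:
        return match.group(1) + escape(match.group(2)) + match.group(3)
    # re.sub copies the unmatched stretches between matches unchanged.
    return _ELEMENT_TEXT.sub(fix, xml)

cleaned = sanitise_xml("<a><b>Tom & Jerry</b><c>fine</c></a>")
```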
Escaping string to use in xml tags
As you anticipated, there is no existing way in Python to map from the characters allowed in a filename (for whatever OS) to characters allowed in an XML element name. To be able to do so reversibly would be additionally challenging.
As you also acknowledge, the XML design is unconventional and problematic, for reasons that only begin with the trouble you're currently having regarding allowed characters.
Recommendations, best first:
- Fix the problematic design, even if this means fixing upstream and downstream dependencies.
- Pre- and/or post-process to map filenames to legal XML element names.
- Design and implement the sort of reversible name mapping scheme you have in mind. The level of effort here, combined with the regrettable perpetuation of previous design mistakes, makes this approach unattractive.
See also
- Allowed symbols in XML element name
Java escape XML token strings
There is no string escaping mechanism for the XML element tag. Some APIs will even reject the name for the new element when it doesn't match the specification for element names. There are at least two possible solutions to your problem:
- You can define your own escape mechanism which you use to encode and decode the element name. As an example you could use _ as the escape sequence. The sequence __ (two underscores) will be a literal _ and the sequence _XX or _uXXXX will be the ASCII/Unicode character you want to write.
- You save the column name in an attribute. This way you can save every value in it and even use the XML API of your choice to save the value with the proper encoding.
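The underscore scheme from the first option could look roughly like the sketch below. It is one possible reading of the description, not a standard: the function names, the set of characters left unescaped, and the decision to reject code points above U+FFFF are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Characters kept as-is; everything else (and "_" itself) is escaped.
_SAFE = re.compile(r"[A-Za-z0-9.\-]")
_TOKEN = re.compile(r"_(_|u[0-9A-Fa-f]{4}|[0-9A-Fa-f]{2})")

def encode_name(raw: str) -> str:
    """Map an arbitrary string to a legal XML element name (hypothetical scheme)."""
    if not raw:
        raise ValueError("an element name cannot be empty")
    out = []
    for i, ch in enumerate(raw):
        code = ord(ch)
        if ch == "_":
            out.append("__")
        elif _SAFE.fullmatch(ch) and (i > 0 or ch.isalpha()):
            out.append(ch)  # digits, "." and "-" may not start a name
        elif code <= 0xFF:
            out.append(f"_{code:02X}")
        elif code <= 0xFFFF:
            out.append(f"_u{code:04X}")
        else:
            raise ValueError(f"U+{code:X} is outside the range this sketch handles")
    return "".join(out)

def decode_name(encoded: str) -> str:
    """Reverse encode_name by expanding each escape token."""
    def restore(match: re.Match) -> str:
        token = match.group(1)
        return "_" if token == "_" else chr(int(token.lstrip("u"), 16))
    return _TOKEN.sub(restore, encoded)

original = "2024 report_v1 ≤ final.txt"
encoded = encode_name(original)
decoded = decode_name(encoded)
```

The encoded form can be used directly as a tag name, and decode_name recovers the original string.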
Escape double quote character in XML
Try this:
&quot;