Xml Error at Ampersand (&)

XML error at ampersand (&)

In XML, & starts an entity reference. Because no entity named &WhateverIsAfterThat has been defined, the parser throws an error. Escape the ampersand as &amp;.

$string = str_replace('&', '&amp;', $string);
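
For comparison, here is a minimal Python sketch of the same replacement. It assumes the input is plain text that has not been escaped yet; the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape

raw = "Fish & Chips"          # hypothetical, not-yet-escaped text
escaped = escape(raw)         # replaces &, < and > with their entities
print(escaped)                # Fish &amp; Chips
```

Running it twice on the same string would escape the ampersand of &amp; again, so it should only be applied to raw text.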

How do I escape ampersands in XML

To escape the other reserved characters:

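function xmlEscape($string) {
    // '&' must come first so the ampersands of the other entities are not escaped again
    return str_replace(array('&', '<', '>', '\'', '"'), array('&amp;', '&lt;', '&gt;', '&apos;', '&quot;'), $string);
}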
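
A rough Python equivalent of this helper, as a sketch: `xml_escape` is a hypothetical name, and the quote handling relies on the optional entities mapping accepted by `xml.sax.saxutils.escape`.

```python
from xml.sax.saxutils import escape

def xml_escape(text: str) -> str:
    # escape() handles &, < and > itself (ampersand first);
    # the extra mapping adds the two quote characters.
    return escape(text, {"'": "&apos;", '"': "&quot;"})

sample = """<a href="x">Tom & Jerry's</a>"""
print(xml_escape(sample))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```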

How do I escape ampersands in XML so they are rendered as entities in HTML?

When your XML contains &amp;amp;, this will result in the text &amp;.

When you use that in HTML, that will be rendered as &.
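
The two decoding steps can be sketched in Python. This assumes the XML is read with `xml.etree.ElementTree` and uses `html.unescape` as a stand-in for what an HTML renderer does with entities; the sample document is invented.

```python
import html
import xml.etree.ElementTree as ET

doc = "<p>Fish &amp;amp; Chips</p>"   # hypothetical XML with a double-escaped ampersand

text = ET.fromstring(doc).text        # the XML parser decodes one level
rendered = html.unescape(text)        # the HTML layer decodes the second level

print(text)      # Fish &amp; Chips
print(rendered)  # Fish & Chips
```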

How can I include an ampersand (&) character in an XML document?

Use the predefined entity reference &amp; (or a numeric character reference such as &#38;) to represent it.

See the specification:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.

If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

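The options the specification lists can be checked with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser; the element name `t` and the sample text are made up.

```python
import xml.etree.ElementTree as ET

samples = [
    "<t>R&amp;D</t>",           # predefined entity reference
    "<t>R&#38;D</t>",           # decimal character reference
    "<t>R&#x26;D</t>",          # hexadecimal character reference
    "<t><![CDATA[R&D]]></t>",   # literal ampersand inside a CDATA section
]
texts = [ET.fromstring(s).text for s in samples]
print(texts)  # every entry is 'R&D'

# A bare ampersand in content is not well-formed and is rejected.
try:
    ET.fromstring("<t>R&D</t>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```
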
XML ampersand error I have & in my xml file

EDIT This should do:

var xmlescape = (function() {

  var setup = {
    full : 0, // encode ascii as well, 0|1
    names : 0, // use named char. entities, 0|1
  };

  var rfulltag =

    // match opening tag, capture tagname and full tag
    '(<([a-z][a-z0-9]*)[^>]*>)' +

    // capture everything not followed by a tag
    // (anything between matching pair of tags)
    '(?!\\s*<[a-z][a-z0-9]*[^>]*>)([\\s\\S]+?)' +

    // capture closing tag corresponding to matched open tag
    '(<\\/\\2\\s*>)';

  rfulltag = new RegExp(rfulltag, 'gi'); // (g)lobal, case(i)nsensitive

  var runsafe = /["&\/<>']/g; // reserved xml characters
  var rascii = /[\x00-\xFF]/g; // single-byte chars (ASCII proper is \x00-\x7F)

  // hex character entities
  var unsafemap = {
    '"' : '&#x22;',
    '&' : '&#x26;',
    '/' : '&#x2F;',
    '<' : '&#x3C;',
    '>' : '&#x3E;',
    '\'': '&#x27;',
  };

  // named character entities ('/' has no named entity in XML, so it stays hex)
  var namedunsafemap = {
    '"' : '&quot;',
    '&' : '&amp;',
    '/' : '&#x2F;',
    '<' : '&lt;',
    '>' : '&gt;',
    '\'': '&apos;',
  };

  // use cache lookups to speed things a bit
  var cache = {};

  // left-pad a hex code to at least two digits
  var pad02 = function(chr) {
    return (Array(3).slice(chr.length).join('0') + chr).toUpperCase();
  };

  var hexesc = function(chr) {
    return '&#x' + pad02(chr.charCodeAt(0).toString(16)) + ';';
  };

  // encode and cache
  var _encodeunsafe = function(chr) {
    return cache.hasOwnProperty(chr) ? cache[chr] :
      (cache[chr] = (setup.full ? hexesc(chr) : (setup.names ? namedunsafemap[chr] : unsafemap[chr])));
  };

  // keep the tags (_1, _4) as they are, escape only the text between them (_3)
  var _encoder = function(all, _1, _2, _3, _4) {
    return _1 + _3.replace(setup.full ? rascii : runsafe, _encodeunsafe) + _4;
  };

  return function(sxml) {
    return ('' + sxml).replace(rfulltag, _encoder);
  };

})();

// xmlescape("<deal> <title> >>> \"<&> invalid b&d </&>' <<< </title> <title> ran & ban </title> </deal>");
// "<deal> <title> &#x3E;&#x3E;&#x3E; &#x22;&#x3C;&#x26;&#x3E; invalid b&#x26;d &#x3C;&#x2F;&#x26;&#x3E;&#x27; &#x3C;&#x3C;&#x3C; </title> <title> ran &#x26; ban </title> </deal>"

// from goog. closure
var parsexml = (function(g) {

  return ('function' == typeof g.DOMParser) ?
    // ffox, chrome, etc.
    function(xml) {
      return new g.DOMParser().parseFromString(xml, 'application/xml');
    } :
    (
      ('function' == typeof g.ActiveXObject) ? // ie

      (function(msaxo) {

        function _msdoc() {

          var doc = new msaxo('MSXML2.DOMDocument');

          if (doc) {

            // Prevent potential vulnerabilities exposed by MSXML2, see
            // http://b/1707300 and http://wiki/Main/ISETeamXMLAttacks for details.
            doc.resolveExternals = false;
            doc.validateOnParse = false;

            // Add a try catch block because accessing these properties will throw an
            // error on unsupported MSXML versions. This affects Windows machines
            // running IE6 or IE7 that are on XP SP2 or earlier without MSXML updates.
            // See http://msdn.microsoft.com/en-us/library/ms766391(VS.85).aspx for
            // specific details on which MSXML versions support these properties.
            try {

              doc.setProperty('ProhibitDTD', true);
              doc.setProperty('MaxXMLSize', 2 * 1024); // goog.dom.xml.MAX_XML_SIZE_KB
              doc.setProperty('MaxElementDepth', 256); // goog.dom.xml.MAX_ELEMENT_DEPTH

            } catch (e) {}
          }

          return doc;
        }

        return function(xml) {
          var doc;
          return doc = _msdoc(), doc.loadXML(xml), doc;
        };

      })(g.ActiveXObject) :

      // other
      function(xml) {
        throw Error(': poor xml support; upgrade');
      }
    );
})(window);

// use:

var xhttp = new XMLHttpRequest();
// send a request..

var xml = xhttp.responseXML || xhttp.responseText;

if ('string' == typeof xml)
  xml = parsexml(xmlescape(xml));

// eof
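
A narrower variant of the same repair-before-parsing idea, sketched in Python: instead of re-encoding all text between tags, it escapes only ampersands that do not already start an entity or character reference. `BARE_AMP` and `escape_bare_ampersands` are hypothetical names, and the sketch assumes the document uses only the five predefined entities (no DTD-defined ones) and has no stray `<` characters.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that is NOT followed by a predefined entity name or a numeric
# character reference terminated by ';'.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|apos|quot|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(xml_text: str) -> str:
    """Escape stray ampersands and leave existing references untouched."""
    return BARE_AMP.sub("&amp;", xml_text)

broken = "<deal><title>ran & ban &amp; b&d &#38; co</title></deal>"
fixed = escape_bare_ampersands(broken)
print(fixed)
# <deal><title>ran &amp; ban &amp; b&amp;d &#38; co</title></deal>

title = ET.fromstring(fixed).find("title").text
print(title)  # ran & ban & b&d & co
```

Because already-valid references are skipped, running the function a second time leaves the text unchanged.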

How can I escape & in XML?

Use &amp; in place of &.

Change it to:

<string name="magazine">Newspaper &amp; Magazines</string>
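
If such a resource line is generated by a program rather than typed by hand, an XML serializer performs this escaping itself. A minimal sketch, assuming Python's `xml.etree.ElementTree`; the element and attribute mirror the example above.

```python
import xml.etree.ElementTree as ET

el = ET.Element("string", {"name": "magazine"})
el.text = "Newspaper & Magazines"   # assign the raw, unescaped text

serialized = ET.tostring(el, encoding="unicode")
print(serialized)
# <string name="magazine">Newspaper &amp; Magazines</string>
```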

Escaping ampersand and other special characters in XML using Python

Try this:

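import os
from xml.etree import ElementTree
from xml.sax.saxutils import escape

fileNum = 0
# Create (or truncate) the output file up front.
saveFile = open('NewYork_1.txt', 'w')
saveFile.close()
for path, dirs, files in os.walk("NewYork_1"):
    for f in files:
        fileName = os.path.join(path, f)
        # Open for reading: mode 'a' is append-only, so read() would fail.
        with open(fileName, 'r', encoding='utf-8') as myFile:
            text = myFile.read()
        if "&" in text:
            # Caution: this also re-escapes ampersands that already start an entity.
            text = text.replace("&", "&amp;")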

Personally, I would build a list of files to read and iterate over that list rather than use os.walk. If the list comes from an earlier function or a separate script, you could write it to a text file with one path per line and iterate over the lines instead of holding the list in a variable, which saves RAM. But to each their own.

As I said, though, I'd discard the whole idea of replacing special characters and use bs4 to open the files, find the elements you're looking for, and grab the data from there.

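import bs4

list_of_USER_IDs = []
# fileName is one of the paths produced by the os.walk loop above
with open(fileName, 'r', encoding='utf-8') as myFile:
    a = myFile.read()
# html.parser tolerates stray ampersands; it lowercases tag names,
# so USERID has to be looked up as 'userid'
b = bs4.BeautifulSoup(a, 'html.parser')
for elem in b.find_all('userid'):
    list_of_USER_IDs.append(elem.get_text())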

That returns the data between the USERID tags, and it works for whatever tag you want data from. There's no need to parse the XML strictly. It's really not much different from HTML, and BeautifulSoup is made for that, so why reinvent the wheel?


