Can ElementTree be told to preserve the order of attributes?
With help from @bobince's answer and these two (setting attribute order, overriding module methods), I managed to get this monkey-patched. It's dirty, and I'd suggest using another module that handles this scenario better, but when that isn't a possibility (the code below is written against the Python 2.7 version of ElementTree):
# =======================================================================
# Monkey patch ElementTree
import xml.etree.ElementTree as ET

def _serialize_xml(write, elem, encoding, qnames, namespaces):
    tag = elem.tag
    text = elem.text
    if tag is ET.Comment:
        write("<!--%s-->" % ET._encode(text, encoding))
    elif tag is ET.ProcessingInstruction:
        write("<?%s?>" % ET._encode(text, encoding))
    else:
        tag = qnames[tag]
        if tag is None:
            if text:
                write(ET._escape_cdata(text, encoding))
            for e in elem:
                _serialize_xml(write, e, encoding, qnames, None)
        else:
            write("<" + tag)
            items = elem.items()
            if items or namespaces:
                if namespaces:
                    for v, k in sorted(namespaces.items(),
                                       key=lambda x: x[1]):  # sort on prefix
                        if k:
                            k = ":" + k
                        write(" xmlns%s=\"%s\"" % (
                            k.encode(encoding),
                            ET._escape_attrib(v, encoding)
                            ))
                #for k, v in sorted(items):  # lexical order
                for k, v in items:  # Monkey patch
                    if isinstance(k, ET.QName):
                        k = k.text
                    if isinstance(v, ET.QName):
                        v = qnames[v.text]
                    else:
                        v = ET._escape_attrib(v, encoding)
                    write(" %s=\"%s\"" % (qnames[k], v))
            if text or len(elem):
                write(">")
                if text:
                    write(ET._escape_cdata(text, encoding))
                for e in elem:
                    _serialize_xml(write, e, encoding, qnames, None)
                write("</" + tag + ">")
            else:
                write(" />")
    if elem.tail:
        write(ET._escape_cdata(elem.tail, encoding))

ET._serialize_xml = _serialize_xml

from collections import OrderedDict

class OrderedXMLTreeBuilder(ET.XMLTreeBuilder):
    def _start_list(self, tag, attrib_in):
        fixname = self._fixname
        tag = fixname(tag)
        attrib = OrderedDict()
        if attrib_in:
            for i in range(0, len(attrib_in), 2):
                attrib[fixname(attrib_in[i])] = self._fixtext(attrib_in[i+1])
        return self._target.start(tag, attrib)
# =======================================================================
Then in your code:
tree = ET.parse(pathToFile, OrderedXMLTreeBuilder())
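On newer interpreters the patch may not be needed at all: since Python 3.8, xml.etree.ElementTree writes attributes in the order they were parsed or set instead of sorting them. A minimal sketch, assuming Python 3.8 or later (the sample document is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the attributes are deliberately not alphabetical.
source = '<root zeta="1" alpha="2"><child m="x" b="y">text</child></root>'

root = ET.fromstring(source)

# An attribute set after parsing is appended behind the parsed ones.
root.set("beta", "3")

# On Python 3.8+ the serializer does not sort attributes.
result = ET.tostring(root, encoding="unicode")
print(result)
```

On older versions the same script runs, but the output falls back to lexical attribute order, which is the situation the monkey patch above works around.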
ElementTree setting attribute order
Apply the monkey patch described below. In the ElementTree.py file there is a function named _serialize_xml; in that function, apply the following patch:
##for k, v in sorted(items):  # remove the sorted here
for k, v in items:
    if isinstance(k, QName):
        k = k.text
    if isinstance(v, QName):
        v = qnames[v.text]
    else:
        v = _escape_attrib(v, encoding)
    write(" %s=\"%s\"" % (qnames[k], v))
Here, remove the sorted(items) call and iterate over plain items, as I have done above.
The patch above still sorts when a namespace is present on an XML attribute; without namespaces it already works. To disable that sorting as well, replace every {} in ElementTree.py with collections.OrderedDict().
All attributes are then written in the order in which you added them to the XML element.
Before doing any of the above, read the copyright message by Fredrik Lundh in ElementTree.py.
Preserve order of attributes when modifying with minidom
Is there a way I can preserve the original order of attributes when processing XML with minidom?
With minidom, no: the datatype used to store attributes is an unordered dictionary. pxdom can do it, though it is considerably slower.
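That answer predates Python 3.8. From 3.8 onward, minidom's toxml(), toprettyxml() and writexml() emit attributes in the order they were parsed or set. A small sketch, assuming Python 3.8+ and a made-up sample element:

```python
from xml.dom import minidom

# Hypothetical input whose attributes are not in alphabetical order.
source = '<tag second="2" first="1"/>'

doc = minidom.parseString(source)
element = doc.documentElement

# A newly set attribute is placed after the ones read from the source.
element.setAttribute("added", "3")

result = element.toxml()
print(result)
```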
python - lxml: enforcing a specific order for attributes
Attribute ordering and readability
As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:
<tag attr1="val1" attr2="val2"/>
<!-- means the same thing as: -->
<tag attr2="val2" attr1="val1"/>
There is an analogous characteristic in SQL, where column order doesn't change
the meaning of a table definition. XML attributes and SQL columns are a set
(not an ordered set), and so all that can "officially" be said about either
one of those is whether the attribute or column is present in the set.
That said, the order in which these things appear definitely makes a difference to human readability. Where constructs like this are authored in text (e.g. source code) and must be read and interpreted by people, a careful ordering makes a lot of sense to me.
Typical parser behavior
Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.
As far as I know, lxml has no mechanism for specifying the order attributes appear in serialized XML, and I would be surprised if it did.
In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:
id = 1
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'
template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
'MIMEType="%s">')
xml = template % (id, name, puid, version, mimetype)
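One caveat with a plain text template is that values containing quotes or ampersands would produce malformed XML. A hedged variation on the same idea, using xml.sax.saxutils.quoteattr from the standard library to escape each value; the field values here are invented to show the escaping, and the element is self-closed so the result is a complete document:

```python
from xml.sax.saxutils import quoteattr

# Hypothetical values; the name deliberately contains characters that need escaping.
fields = [
    ("ID", 1),
    ("Name", 'Development "Signature" & Co'),
    ("PUID", "dev/1"),
    ("Version", "1.0"),
    ("MIMEType", "text/x-test-signature"),
]

# quoteattr() escapes the value and wraps it in suitable quotes, so the
# order of the list above is exactly the order written to the output.
attributes = " ".join(
    "%s=%s" % (key, quoteattr(str(value))) for key, value in fields
)
xml_text = "<FileFormat %s/>" % attributes
print(xml_text)
```

Because the attributes are joined from an ordered list, the output order is fully under your control and independent of any XML library's serializer.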
Save XML in custom order of attribute in Python
For this specific XML string you could use xmltodict (install with pip install xmltodict). This converts the XML data to a Python dictionary. The following reverses the attribute order of the OKV entry:
import xmltodict
import collections
xml_data = '<OKV s="*****" r="*****" a="*****" g="****" m="*****" e="****" d="****" i="****" n="****" v="1"/>'
xml_data_parsed = xmltodict.parse(xml_data)
xml_data_parsed["OKV"] = collections.OrderedDict(reversed(list(xml_data_parsed["OKV"].items())))
xml_data = xmltodict.unparse(xml_data_parsed)
If the part you want to reorder is embedded in a larger XML document, you would have to select the more deeply nested key to get to, e.g., OKV.
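When xmltodict is not available, a similar reordering can be sketched with the standard library alone. This is not the approach from the answer above: it assumes Python 3.8 or later, where ElementTree serializes attributes in insertion order, and reorder_attributes is a hypothetical helper name chosen for this example.

```python
import xml.etree.ElementTree as ET

def reorder_attributes(element, order):
    """Rebuild element.attrib so that names listed in `order` come first."""
    remaining = dict(element.attrib)
    element.attrib.clear()
    for name in order:
        if name in remaining:
            element.set(name, remaining.pop(name))
    # Attributes not mentioned in `order` keep their relative order at the end.
    for name, value in remaining.items():
        element.set(name, value)

# Hypothetical document with the element of interest nested one level down.
root = ET.fromstring('<doc><OKV s="1" r="2" a="3" v="4"/></doc>')
for okv in root.iter("OKV"):
    reorder_attributes(okv, ["v", "a"])

result = ET.tostring(root, encoding="unicode")
print(result)
```

Using root.iter("OKV") finds the element at any depth, which sidesteps the need to walk nested dictionary keys by hand.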