Can Elementtree Be Told to Preserve the Order of Attributes

Can ElementTree be told to preserve the order of attributes?

With help from @bobince's answer and these two (setting attribute order, overriding module methods), I managed to get this monkey patched. It's dirty, and I'd suggest using another module that handles this scenario better, but when that isn't a possibility:

# =======================================================================
# Monkey patch ElementTree
import xml.etree.ElementTree as ET

def _serialize_xml(write, elem, encoding, qnames, namespaces):
    tag = elem.tag
    text = elem.text
    if tag is ET.Comment:
        write("<!--%s-->" % ET._encode(text, encoding))
    elif tag is ET.ProcessingInstruction:
        write("<?%s?>" % ET._encode(text, encoding))
    else:
        tag = qnames[tag]
        if tag is None:
            if text:
                write(ET._escape_cdata(text, encoding))
            for e in elem:
                _serialize_xml(write, e, encoding, qnames, None)
        else:
            write("<" + tag)
            items = elem.items()
            if items or namespaces:
                if namespaces:
                    for v, k in sorted(namespaces.items(),
                                       key=lambda x: x[1]):  # sort on prefix
                        if k:
                            k = ":" + k
                        write(" xmlns%s=\"%s\"" % (
                            k.encode(encoding),
                            ET._escape_attrib(v, encoding)
                            ))
                #for k, v in sorted(items):  # lexical order
                for k, v in items:  # Monkey patch
                    if isinstance(k, ET.QName):
                        k = k.text
                    if isinstance(v, ET.QName):
                        v = qnames[v.text]
                    else:
                        v = ET._escape_attrib(v, encoding)
                    write(" %s=\"%s\"" % (qnames[k], v))
            if text or len(elem):
                write(">")
                if text:
                    write(ET._escape_cdata(text, encoding))
                for e in elem:
                    _serialize_xml(write, e, encoding, qnames, None)
                write("</" + tag + ">")
            else:
                write(" />")
    if elem.tail:
        write(ET._escape_cdata(elem.tail, encoding))

ET._serialize_xml = _serialize_xml

from collections import OrderedDict

class OrderedXMLTreeBuilder(ET.XMLTreeBuilder):
    def _start_list(self, tag, attrib_in):
        fixname = self._fixname
        tag = fixname(tag)
        attrib = OrderedDict()
        if attrib_in:
            for i in range(0, len(attrib_in), 2):
                attrib[fixname(attrib_in[i])] = self._fixtext(attrib_in[i+1])
        return self._target.start(tag, attrib)

# =======================================================================

Then in your code:

tree = ET.parse(pathToFile, OrderedXMLTreeBuilder())
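
For what it's worth, the patch above relies on internals of the Python 2.7 ElementTree (ET._encode, ET.XMLTreeBuilder). On Python 3.8 and later, the standard library's ElementTree writes attributes in the order they were parsed or set, so the patch should not be needed there. A minimal sketch, assuming Python 3.8+ (the element and attribute names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Attribute names are deliberately not in alphabetical order.
source = '<tag zeta="1" alpha="2" mid="3"/>'
root = ET.fromstring(source)

# Attributes set later are appended after the parsed ones.
root.set("beta", "4")

# On Python 3.8+ both the attrib dict and the serialized output
# keep document/insertion order instead of sorting by name.
attribute_order = list(root.attrib)
serialized = ET.tostring(root, encoding="unicode")

print(attribute_order)
print(serialized)
```

If the printed attributes come out sorted alphabetically, the interpreter is older than 3.8 and the monkey patch (or another library) is still required.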

Elementtree setting attribute order

Apply the monkey patch as described below.

In ElementTree.py there is a function named _serialize_xml.

Apply the following patch inside that function:

        ##for k, v in sorted(items):  # remove the sorted here
        for k, v in items:
            if isinstance(k, QName):
                k = k.text
            if isinstance(v, QName):
                v = qnames[v.text]
            else:
                v = _escape_attrib(v, encoding)
            write(" %s=\"%s\"" % (qnames[k], v))

Here, remove the sorted(items) call and iterate over items directly, as I have done above.

With this patch alone, sorting still happens when an XML attribute has a namespace; without namespaces the patch works fine. To disable the namespace-based sorting as well, replace every {} in ElementTree.py with collections.OrderedDict().

Now all attributes appear in the order in which you added them to the XML element.

Before doing any of the above, read the copyright message by Fredrik Lundh in ElementTree.py.
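
If editing ElementTree.py is not an option and you are on Python 3.8 or later (where, as far as I know, attributes are serialized in insertion order), a specific order can be forced by rebuilding the attrib dict. This is only a sketch: reorder_attributes is a hypothetical helper, not part of ElementTree, and the element used here is invented for the example.

```python
import xml.etree.ElementTree as ET

def reorder_attributes(elem, preferred_order):
    """Rebuild elem.attrib so names in preferred_order come first.

    Hypothetical helper; assumes Python 3.8+, where ElementTree
    serializes attributes in dict insertion order.
    """
    old = dict(elem.attrib)
    elem.attrib.clear()
    for name in preferred_order:
        if name in old:
            elem.attrib[name] = old.pop(name)
    # Attributes not listed keep their original relative order.
    elem.attrib.update(old)

elem = ET.Element("item", {"c": "3", "a": "1", "b": "2"})
reorder_attributes(elem, ["a", "b"])
result = ET.tostring(elem, encoding="unicode")
print(result)
```

The helper mutates the existing attrib dict in place rather than replacing it, so any other references to elem.attrib stay valid.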

Preserve order of attributes when modifying with minidom

Is there a way I can preserve the original order of attributes when processing XML with minidom?

With minidom, no: the datatype used to store attributes is an unordered dictionary. pxdom can do it, though it is considerably slower.
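
That answer describes older Python versions. According to the standard library documentation, since Python 3.8 minidom's writexml() and toxml() preserve the attribute order specified by the user. A small sketch to check this on your own interpreter, assuming Python 3.8+ (the sample element is invented for the example):

```python
from xml.dom import minidom

# Attribute names are deliberately not in alphabetical order.
doc = minidom.parseString('<tag zeta="1" alpha="2"/>')
elem = doc.documentElement

# Modify the element; the new attribute is added after the parsed ones.
elem.setAttribute("beta", "3")

output = elem.toxml()
print(output)
```

On an older interpreter the attributes are expected to come out sorted by name instead.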

python - lxml: enforcing a specific order for attributes

Attribute ordering and readability
As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:

<tag attr1="val1" attr2="val2"/>

<!-- means the same thing as: -->

<tag attr2="val2" attr1="val1"/>

There is an analogous characteristic in SQL, where column order doesn't change
the meaning of a table definition. XML attributes and SQL columns are a set
(not an ordered set), and so all that can "officially" be said about either
one of those is whether the attribute or column is present in the set.

That said, the order in which these things appear definitely makes a difference to human readability. In situations where constructs like this are authored by hand, appear in text (e.g. source code), and must be read and interpreted by people, a careful ordering makes a lot of sense to me.

Typical parser behavior

Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.

As far as I know, lxml has no mechanism for specifying the order in which attributes appear in serialized XML, and I would be surprised if it did.

In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:

id = 1
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'

template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
            'MIMEType="%s">')

xml = template % (id, name, puid, version, mimetype)
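
One caveat with a plain %-template is that attribute values are not escaped, so a value containing & or a quote would produce malformed XML. A hedged variation on the same idea, using the standard library's xml.sax.saxutils.quoteattr to quote and escape each value; render_element is a hypothetical helper written for this sketch, not something from lxml or the tool under test:

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

def render_element(tag, attributes):
    """Render an empty element with attributes in exactly the given order.

    Hypothetical helper: attributes is a list of (name, value) pairs,
    and quoteattr() adds the surrounding quotes and escapes the value.
    """
    parts = ["%s=%s" % (name, quoteattr(str(value)))
             for name, value in attributes]
    return "<%s %s/>" % (tag, " ".join(parts))

xml_text = render_element("FileFormat", [
    ("ID", 1),
    ("Name", "Development Signature"),
    ("PUID", "dev/1"),
    ("Version", "1.0"),
    ("MIMEType", "text/x-test-signature"),
])
print(xml_text)

# Sanity check: the generated text is well-formed and round-trips.
parsed = ET.fromstring(xml_text)
escaped = render_element("t", [("a", "x & y")])
```

Because the attributes are passed as a list of pairs, the output order is fully under your control regardless of how any XML library orders them.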

Save XML in custom order of attribute in Python

For this specific XML string you could use xmltodict (install with pip install xmltodict), which converts the XML data to a Python dictionary. The following reverses the order of the attributes of the OKV entry:

import xmltodict
import collections
xml_data = '<OKV s="*****" r="*****" a="*****" g="****" m="*****" e="****" d="****" i="****" n="****" v="1"/>'
xml_data_parsed = xmltodict.parse(xml_data)
xml_data_parsed["OKV"] = collections.OrderedDict(reversed(list(xml_data_parsed["OKV"].items())))
xml_data = xmltodict.unparse(xml_data_parsed)

If the part you want to reorder is embedded in a larger XML document, you would have to select the more deeply nested key to get to it, e.g. OKV.


