Python ElementTree Module: How to Ignore XML Namespaces When Locating Elements with find() and findall()

Suppress namespace in ElementTree

OK, thanks for the links to the other question. I've decided to borrow (and improve on) one of the solutions given there:

def stripNs(el):
    '''Recursively walk this element tree, removing namespaces.'''
    if el.tag.startswith("{"):
        el.tag = el.tag.split('}', 1)[1]  # strip namespace
    for k in list(el.attrib.keys()):  # copy the keys: attrib is modified in the loop
        if k.startswith("{"):
            k2 = k.split('}', 1)[1]
            el.attrib[k2] = el.attrib[k]
            del el.attrib[k]
    for child in el:
        stripNs(child)
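
To check the effect, one can parse a small namespaced document, strip it, and then search with plain tag names. This is only a sketch: the sample XML and the names `SAMPLE` and `strip_ns` are made up for illustration, and the function repeats the approach above so that the block runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the original question.
SAMPLE = """<root xmlns="http://www.example.com" xmlns:x="http://www.example.com/x">
  <item x:id="1">first</item>
  <item x:id="2">second</item>
</root>"""


def strip_ns(el):
    """Recursively remove namespaces from tags and attribute names."""
    if isinstance(el.tag, str) and el.tag.startswith("{"):
        el.tag = el.tag.split("}", 1)[1]
    for key in list(el.attrib):  # copy the keys: the dict is modified in the loop
        if key.startswith("{"):
            el.attrib[key.split("}", 1)[1]] = el.attrib.pop(key)
    for child in el:
        strip_ns(child)


root = ET.fromstring(SAMPLE)
strip_ns(root)

items = root.findall("item")  # plain names match once namespaces are gone
item_texts = [i.text for i in items]
item_ids = [i.get("id") for i in items]
print(root.tag, item_texts, item_ids)
```

Keep in mind that stripping is lossy: elements or attributes that shared a local name but lived in different namespaces become indistinguishable afterwards.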

How to Find and Edit Tags in XML Files with Namespaces Using ElementTree

Instead of stripping out the namespaces, I suggest using namespace wildcards. Support for this was added in Python 3.8.

from xml.etree import ElementTree as ET

tree = ET.parse(adiPath)

rating = tree.find(".//{*}Rating") # Find the Rating element in any namespace
rating.text = "999"

Note that you have to use find() (or findall()). Wildcards do not work with iter().
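
The difference between find() and iter() can be seen in a small self-contained run. The sample document and its namespace URI below are assumptions, since the file behind adiPath is not shown; the sketch also guards against find() returning None.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the original question's file is not shown.
SAMPLE = """<ADI xmlns="http://www.example.com/adi">
  <Asset>
    <Rating>12</Rating>
  </Asset>
</ADI>"""

root = ET.fromstring(SAMPLE)

# "{*}" matches the tag in any namespace (Python 3.8+).
rating = root.find(".//{*}Rating")
if rating is not None:  # find() returns None when nothing matches
    rating.text = "999"

# iter() compares tags literally, so the wildcard form yields nothing...
wildcard_hits = list(root.iter("{*}Rating"))
# ...while the fully qualified name works.
qualified_hits = list(root.iter("{http://www.example.com/adi}Rating"))

print(rating.text, len(wildcard_hits), len(qualified_hits))
```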


The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).

namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])
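
A complete round trip might look like the sketch below. It reads from an in-memory bytes buffer instead of "test1.xml", and the sample document and its `doc` prefix are invented for the example. Note that register_namespace() changes a module-wide mapping, so the registration affects every later serialization in the same process.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document standing in for "test1.xml".
SAMPLE = b"""<doc:root xmlns:doc="http://www.example.com/doc">
  <doc:Rating>1</doc:Rating>
</doc:root>"""

# Each start-ns event carries a (prefix, uri) pair; register every one.
for _, (prefix, uri) in ET.iterparse(io.BytesIO(SAMPLE), events=["start-ns"]):
    ET.register_namespace(prefix, uri)

root = ET.fromstring(SAMPLE)
root.find("{*}Rating").text = "999"  # wildcard search, Python 3.8+

# With the prefix registered, the output uses "doc:" instead of "ns0:".
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```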

Remove namespaces and nodes from XML string in python

You can simply extract the relevant portion into a new document:

import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())

Output:

<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>
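
This works with a plain tag name only because the DataMap element itself is in no namespace. A minimal sketch, assuming a namespaced wrapper around it (the envelope element and its URI are hypothetical; only the DataMap fragment mirrors the output above):

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document; the original dmXML string is not shown.
dmXML = """<env:Envelope xmlns:env="http://www.example.com/envelope">
  <env:Body>
    <DataMap sourceType="0"><FieldMap flag="Q1" destination="Q1_1" source="Q1_1" /></DataMap>
  </env:Body>
</env:Envelope>"""

root = ET.fromstring(dmXML)
new_root = root.find(".//DataMap")  # DataMap has no namespace, so a plain name matches

# Serializing the subtree alone leaves the wrapper's namespace behind.
extracted = ET.tostring(new_root, encoding="unicode")
print(extracted)
```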

Python: Ignore xmlns in elementtree.ElementTree

You can define a generator to recursively search through your element tree in order to find tags which end with the appropriate tag name. For example, something like this:

def get_element_by_tag(element, tag):
    if element.tag.endswith(tag):
        yield element
    for child in element:
        for g in get_element_by_tag(child, tag):
            yield g

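This just checks for tags which end with tag, i.e. ignoring any leading namespace. Because it is a plain suffix match, it also matches longer names that happen to end in the same string (for example, nontechnicalContact). You can then iterate over any tag you want as follows: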

for item in get_element_by_tag(elementtree, 'technicalContact'):
    ...

This generator in action:

>>> xml_str = """<root xmlns="http://www.example.com">
... <technicalContact>Test1</technicalContact>
... <technicalContact>Test2</technicalContact>
... </root>
... """

>>> from xml.etree import ElementTree as etree
>>> xml_etree = etree.fromstring(xml_str)
>>> for item in get_element_by_tag(xml_etree, 'technicalContact'):
...     print(item.tag, item.text)
...
{http://www.example.com}technicalContact Test1
{http://www.example.com}technicalContact Test2
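
If the suffix match is too loose, a variant that compares the local name exactly may be safer. The following is a sketch rather than part of the original answer; the helper names `local_name` and `iter_by_local_name` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag without its '{namespace}' part, if any."""
    return tag.rsplit("}", 1)[-1]


def iter_by_local_name(element, name):
    """Yield element and its descendants whose local name equals name."""
    for el in element.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(el.tag, str) and local_name(el.tag) == name:
            yield el


# Hypothetical sample; nontechnicalContact would pass an endswith() check.
SAMPLE = """<root xmlns="http://www.example.com">
  <technicalContact>Test1</technicalContact>
  <nontechnicalContact>Other</nontechnicalContact>
  <technicalContact>Test2</technicalContact>
</root>"""

root = ET.fromstring(SAMPLE)
matches = [el.text for el in iter_by_local_name(root, "technicalContact")]
print(matches)
```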

ElementTree namespace dictionary not working with find() or findall()

Remove the curly brackets from the URI string. The namespace dictionary should look like this:

ns = {'ws': 'urn:com.workday/workersync'}

Another option is to use a wildcard for the namespace. This is supported for find() and findall() since Python 3.8:

print(xmlroot.findall('{*}Worker'))

Output:

[<Element '{urn:com.workday/workersync}Worker' at 0x033E6AC8>]
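
Both options can be compared side by side in a short sketch. The sample document is an assumption (only the namespace URI comes from the answer above). It also shows that leaving the braces in the dictionary does not raise an error; the search simply finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample using the namespace URI from the answer above.
SAMPLE = """<ws:Worker_Sync xmlns:ws="urn:com.workday/workersync">
  <ws:Worker><ws:Name>A</ws:Name></ws:Worker>
  <ws:Worker><ws:Name>B</ws:Name></ws:Worker>
</ws:Worker_Sync>"""

xmlroot = ET.fromstring(SAMPLE)

# Keys are prefixes of your choosing; values are bare URIs, without braces.
ns = {"ws": "urn:com.workday/workersync"}

with_dict = xmlroot.findall("ws:Worker", ns)
with_wildcard = xmlroot.findall("{*}Worker")  # Python 3.8+
# Braces inside the URI produce a tag that matches nothing.
with_braces = xmlroot.findall("ws:Worker", {"ws": "{urn:com.workday/workersync}"})

names = [w.find("ws:Name", ns).text for w in with_dict]
print(names, len(with_wildcard), len(with_braces))
```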

