Parsing XML with namespace in Python via 'ElementTree'

You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary:

namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed

root.findall('owl:Class', namespaces)

Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URI in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can use the same syntax yourself too, of course:

root.findall('{http://www.w3.org/2002/07/owl#}Class')
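As a minimal, self-contained sketch of the two lookups above: the sample document below is hypothetical (only the owl namespace URI comes from this answer), and the `x` prefix is an arbitrary choice that shows the prefix in the dictionary does not have to match the one used in the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the owl namespace URI is taken from the text.
XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Animal"/>
  <owl:Class rdf:about="#Plant"/>
</rdf:RDF>"""

root = ET.fromstring(XML)

# Lookup through a prefix that is resolved in the dictionary we pass in.
namespaces = {"owl": "http://www.w3.org/2002/07/owl#"}
by_prefix = root.findall("owl:Class", namespaces)

# Equivalent lookup with the namespace URI written out in braces.
by_uri = root.findall("{http://www.w3.org/2002/07/owl#}Class")

# The prefix is arbitrary; only the URI it maps to has to match the document.
by_other_prefix = root.findall("x:Class", {"x": "http://www.w3.org/2002/07/owl#"})

print(len(by_prefix), len(by_uri), len(by_other_prefix))
```

All three calls should return the same two elements.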

Also see the Parsing XML with Namespaces section of the ElementTree documentation.

If you can switch to the lxml library, things are better; that library supports the same ElementTree API, but collects namespaces for you in the .nsmap attribute on elements and generally has superior namespace support.

How to preserve namespaces when parsing xml via ElementTree in Python

ElementTree replaces the prefixes of namespaces that have not been registered with ET.register_namespace with generated ones such as ns0. To preserve a namespace prefix, register it before writing your modifications to a file. The following function does the job and registers all namespaces globally:

def register_all_namespaces(filename):
    namespaces = dict([node for _, node in ET.iterparse(filename, events=['start-ns'])])
    for ns in namespaces:
        ET.register_namespace(ns, namespaces[ns])

Call this function before ET.parse so that the namespace prefixes remain unchanged:

import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
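To try the round trip without touching a real file, here is a minimal sketch that writes a hypothetical document to a temporary directory first. The cfg prefix, the http://example.com/cfg URI and the file name are made up for illustration; the registration function is the one above, written with tuple unpacking.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def register_all_namespaces(filename):
    # Each 'start-ns' event carries a (prefix, uri) pair declared in the file.
    for _, (prefix, uri) in ET.iterparse(filename, events=["start-ns"]):
        ET.register_namespace(prefix, uri)


# Hypothetical input; prefix and URI are assumptions for this sketch.
source = (
    '<cfg:settings xmlns:cfg="http://example.com/cfg">'
    "<cfg:item>1</cfg:item>"
    "</cfg:settings>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(source)

    register_all_namespaces(path)   # register before parsing and writing
    tree = ET.parse(path)
    tree.getroot()[0].text = "2"    # the modification
    tree.write(path)

    with open(path, encoding="utf-8") as fh:
        result = fh.read()

print(result)
```

The written file should keep the cfg: prefix rather than a generated ns0: one. Note that ET.register_namespace affects the whole process, so registrations from one file carry over to later serialization.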

How to generate an XML file via Python's ElementTree with registered namespaces written in output

To get a properly prefixed name, try using QName().

To write the XML with the XML declaration, pass xml_declaration=True to ElementTree.write().

Example...

Python

# xml.etree.cElementTree was removed in Python 3.9; use ElementTree instead
import xml.etree.ElementTree as ET

ns = {"xs": "http://www.w3.org/2001/XMLSchema"}

ET.register_namespace('xs', ns["xs"])
root = ET.Element(ET.QName(ns["xs"], "House"))
ET.SubElement(root, ET.QName(ns["xs"], "Room"))

ET.ElementTree(root).write("output.xml", xml_declaration=True, encoding="utf-8")

XML Output

<?xml version='1.0' encoding='utf-8'?>
<xs:House xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:Room /></xs:House>

Note: You don't have to use the ns dictionary. I just used it so I didn't have to repeat the full namespace URI everywhere.

Working with namespace while parsing XML using ElementTree

There are two problems on this line:

a=tree.find('parent')          

First, <parent> is not an immediate child of the root element. <parent> is a grandchild of the root element. The path to parent looks like /project/grandparent/parent. To search for <parent>, try the XPath expression */parent or possibly .//parent.

Second, <parent> exists in the default namespace, so you won't be able to .find() it with just its simple name. You'll need to add the namespace.

Here are two equally valid calls to tree.find(), each of which should find the <parent> node:

a=tree.find('*/{http://maven.apache.org/POM/4.0.0}parent')
a=tree.find('*/xmlns:parent', namespaces=spaces)

Third, the call to findall() needs a namespace qualifier:

for b in a.findall('xmlns:child', namespaces=spaces):

Fourth, the call to create the new child element needs a namespace qualifier. There may be a way to use the shortcut name, but I couldn't find it. I had to use the long form of the name.

ET.SubElement(a,'{http://maven.apache.org/POM/4.0.0}child').text="Jay/Doctor"

Finally, your XML output will look ugly unless you provide a default namespace:

tree.write('test.xml', default_namespace=spaces['xmlns'])
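Putting the namespace-related steps together, here is a minimal sketch. The original XML is not shown, so the document below is a hypothetical one shaped like the path described above (project/grandparent/parent), and it writes to an in-memory buffer instead of test.xml.

```python
import io
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
spaces = {"xmlns": POM_NS}

# Hypothetical document; element contents are made up for illustration.
XML = f"""<project xmlns="{POM_NS}">
  <grandparent>
    <parent>
      <child>Tom</child>
    </parent>
  </grandparent>
</project>"""

tree = ET.ElementTree(ET.fromstring(XML))

# Both forms locate the same <parent> element.
a = tree.find("*/xmlns:parent", namespaces=spaces)
same = tree.find("*/{%s}parent" % POM_NS)

# Existing children, found with the namespace qualifier.
names = [b.text for b in a.findall("xmlns:child", namespaces=spaces)]

# New elements need the fully qualified name.
ET.SubElement(a, "{%s}child" % POM_NS).text = "Jay/Doctor"

# default_namespace keeps the output free of ns0: prefixes.
buf = io.BytesIO()
tree.write(buf, default_namespace=POM_NS)
output = buf.getvalue().decode("ascii")
print(output)
```

Be aware that default_namespace only works when every element in the tree is namespace-qualified; an unqualified tag makes write() raise a ValueError.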

Unrelated to the XML aspects, you copied my answer from the previous question incorrectly. The else lines up with the for, not with the if:

for ...
    if ...
else ...
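For readers unfamiliar with for/else, here is a generic illustration of the construct. It is not the code from the earlier question (which is not shown here); the function name and arguments are hypothetical.

```python
def first_match(items, wanted):
    for item in items:
        if item == wanted:
            result = f"found {item}"
            break
    else:
        # Belongs to the for loop: runs only if the loop ended without break.
        result = "not found"
    return result


print(first_match(["a", "b"], "b"))
print(first_match(["a", "b"], "z"))
```

If the else were indented under the if instead, it would run on every non-matching item rather than once after the loop.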

How to find and edit tags in XML files with namespaces using ElementTree

Instead of stripping out the namespaces, I suggest using namespace wildcards. Support for this was added in Python 3.8.

from xml.etree import ElementTree as ET

tree = ET.parse(adiPath)

rating = tree.find(".//{*}Rating") # Find the Rating element in any namespace
rating.text = "999"

Note that you have to use find() (or findall()). Wildcards do not work with iter().
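A small self-contained sketch of the difference, assuming Python 3.8 or later. The document and its namespace URI are hypothetical stand-ins for the file at adiPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI is made up for illustration.
XML = (
    '<ADI xmlns="http://example.com/adi">'
    "<Metadata><Rating>5</Rating></Metadata>"
    "</ADI>"
)
root = ET.fromstring(XML)

# find() understands the {*} namespace wildcard.
rating = root.find(".//{*}Rating")
rating.text = "999"

# iter() compares the tag literally, so the wildcard matches nothing...
wildcard_iter = list(root.iter("{*}Rating"))

# ...and needs the fully qualified name instead.
qualified_iter = list(root.iter("{http://example.com/adi}Rating"))

print(rating.tag, len(wildcard_iter), len(qualified_iter))
```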


The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).

namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])

Generate XML Document in Python 3 using Namespaces and ElementTree

I believe you are overthinking this.

Registering a default namespace in your code avoids the ns0: aliases.

Registering any namespaces you will use while creating a document allows you to designate the alias used for each namespace.

To achieve your desired output, assign the namespace to your top element:

a = ET.Element("{urn:dslforum-org:service-1-0}topNode")

The preceding ET.register_namespace("", "urn:dslforum-org:service-1-0") will make that the default namespace in the document, assign it to topNode, and not prefix your tag names.

<?xml version='1.0' encoding='utf-8'?>
<topNode xmlns="urn:dslforum-org:service-1-0"><childNode>content</childNode></topNode>

If you remove the register_namespace() call, then you get this monstrosity:

<?xml version='1.0' encoding='utf-8'?>
<ns0:topNode xmlns:ns0="urn:dslforum-org:service-1-0"><childNode>content</childNode></ns0:topNode>
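For completeness, here is a minimal sketch that builds such a document from scratch. The question's full code is not shown here, so this is a reconstruction; unlike the output above, it also qualifies childNode with the namespace explicitly, which serializes the same way when the namespace is registered as the default.

```python
import xml.etree.ElementTree as ET

NS = "urn:dslforum-org:service-1-0"

# An empty prefix registers NS as the default namespace for serialization.
ET.register_namespace("", NS)

a = ET.Element("{%s}topNode" % NS)
b = ET.SubElement(a, "{%s}childNode" % NS)  # qualified explicitly (an assumption)
b.text = "content"

xml_text = ET.tostring(a, encoding="unicode")
print(xml_text)
```

Since register_namespace() is global to the process, it is worth calling it once near program start rather than next to each write.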

Iterating through specific XML elements and namespace issues

Since e starts on the root tag, remove <ODM> from the XPath expression:

col = e.findall('./{0}Study/{0}MetaDataVersion/{0}ItemDef'.format(namespace))

# Study Metadata
# start for-loop
# -- found ItemDef name= BL_D_VISITDATE
# -- found ItemDef name= BL_D_MEDCODE
# finished for-loop

Even better, pass the namespaces argument to findall, using a dictionary that maps the d prefix to the namespace URI:

ns = {'d': 'http://www.cdisc.org/ns/odm/v1.3'}

col = e.findall('./d:Study/d:MetaDataVersion/d:ItemDef', namespaces=ns)

# SHORT-HAND FOR ANYWHERE SEARCH
col = e.findall('.//d:ItemDef', namespaces=ns)
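A self-contained sketch of both searches follows. The document is a hypothetical, stripped-down ODM file; the Name attribute is an assumption based on the output shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down ODM document for illustration.
XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study>
    <MetaDataVersion>
      <ItemDef Name="BL_D_VISITDATE"/>
      <ItemDef Name="BL_D_MEDCODE"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

e = ET.fromstring(XML)  # e is the <ODM> root element
ns = {"d": "http://www.cdisc.org/ns/odm/v1.3"}

# Explicit path from the root versus a search anywhere below it.
explicit = e.findall("./d:Study/d:MetaDataVersion/d:ItemDef", namespaces=ns)
anywhere = e.findall(".//d:ItemDef", namespaces=ns)

names = [item.get("Name") for item in anywhere]
print(names)
```

The explicit path is stricter: it only matches ItemDef elements at that exact position, while .//d:ItemDef also picks up any that appear elsewhere in the tree.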

