Parsing XML with namespace in Python via 'ElementTree'
You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can use the same syntax yourself too, of course:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
Also see the Parsing XML with Namespaces section of the ElementTree documentation.
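As a minimal sketch of the prefix lookup described above: the sample document below is hypothetical (only the owl namespace URI comes from the answer), and the prefix 'o' is chosen deliberately to differ from the one used in the XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the owl namespace URI comes from the text above.
XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Animal"/>
  <owl:Class rdf:about="#Plant"/>
</rdf:RDF>"""

root = ET.fromstring(XML)

# The prefix is only a key in the dictionary you pass in, so 'o' works as well as 'owl'.
by_prefix = root.findall('o:Class', {'o': 'http://www.w3.org/2002/07/owl#'})

# The same search written in the fully qualified {uri}tag form.
by_uri = root.findall('{http://www.w3.org/2002/07/owl#}Class')

print(len(by_prefix), by_prefix == by_uri)
```

Both calls should return the same two elements, regardless of the prefix the document itself uses.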
If you can switch to the lxml library, things are better; that library supports the same ElementTree API, but collects namespaces for you in the .nsmap attribute on elements and generally has superior namespace support.
How to preserve namespaces when parsing xml via ElementTree in Python
ElementTree replaces the prefix of any namespace that has not been registered with ET.register_namespace. To preserve a namespace prefix, you need to register it before writing your modifications to a file. The following function does the job and registers all of a file's namespaces globally:
def register_all_namespaces(filename):
    # Collect every (prefix, uri) pair declared in the file.
    namespaces = dict([node for _, node in ET.iterparse(filename, events=['start-ns'])])
    for ns in namespaces:
        ET.register_namespace(ns, namespaces[ns])
Call this function before ET.parse so that the namespace prefixes remain unchanged:
import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
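To check the effect without touching a file on disk, here is a small in-memory sketch of the same idea. The document and the namespace URI http://example.com/a are hypothetical, and the helper takes a file-like object instead of a filename.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document and namespace URI, used only for illustration.
SOURCE = b'<a:root xmlns:a="http://example.com/a"><a:item>1</a:item></a:root>'

def register_all_namespaces(source):
    """Register every prefix declared in the document with ElementTree."""
    for _, (prefix, uri) in ET.iterparse(source, events=['start-ns']):
        ET.register_namespace(prefix, uri)

register_all_namespaces(io.BytesIO(SOURCE))
tree = ET.parse(io.BytesIO(SOURCE))
tree.getroot()[0].text = '2'  # stand-in for a real modification

result = ET.tostring(tree.getroot(), encoding='unicode')
print(result)
```

With the registration in place, the serialized output should keep the a: prefix rather than falling back to a generated ns0: alias. Note that register_namespace affects the whole process, not just this tree.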
How to generate an XML file via Python's ElementTree with registered namespaces written in output
To get a properly prefixed name, try using QName().
To write the XML with the XML declaration, try passing xml_declaration=True to ElementTree.write().
Example...
Python
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
ns = {"xs": "http://www.w3.org/2001/XMLSchema"}
ET.register_namespace('xs', ns["xs"])
root = ET.Element(ET.QName(ns["xs"], "House"))
ET.SubElement(root, ET.QName(ns["xs"], "Room"))
ET.ElementTree(root).write("output.xml", xml_declaration=True, encoding="utf-8")
XML Output
<?xml version='1.0' encoding='utf-8'?>
<xs:House xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:Room /></xs:House>
Note: You don't have to use the ns dictionary. I just used it so I didn't have to repeat the full namespace URI everywhere.
Working with namespace while parsing XML using ElementTree
There are two problems on this line:
a=tree.find('parent')
First, <parent> is not an immediate child of the root element. <parent> is a grandchild of the root element. The path to parent looks like /project/grandparent/parent. To search for <parent>, try the XPath expression */parent or possibly .//parent.
Second, <parent> exists in the default namespace, so you won't be able to .find() it with just its simple name. You'll need to add the namespace.
Here are two equally valid calls to tree.find(), each of which should find the <parent> node:
a=tree.find('*/{http://maven.apache.org/POM/4.0.0}parent')
a=tree.find('*/xmlns:parent', namespaces=spaces)
Next, the call to findall() needs a namespace qualifier:
for b in a.findall('xmlns:child', namespaces=spaces):
Fourth, the call to create the new child element needs a namespace qualifier. There may be a way to use the shortcut name, but I couldn't find it. I had to use the long form of the name.
ET.SubElement(a,'{http://maven.apache.org/POM/4.0.0}child').text="Jay/Doctor"
Finally, your XML output will look ugly unless you provide a default namespace:
tree.write('test.xml', default_namespace=spaces['xmlns'])
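Putting those pieces together, here is a self-contained sketch. The POM-like document is hypothetical (shaped like the /project/grandparent/parent path mentioned above), and the spaces dictionary is assumed to match the one from the question. It writes to an in-memory buffer rather than test.xml.

```python
import io
import xml.etree.ElementTree as ET

POM_NS = 'http://maven.apache.org/POM/4.0.0'
spaces = {'xmlns': POM_NS}  # assumed to match the dictionary from the question

# Hypothetical document shaped like /project/grandparent/parent.
XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <grandparent>
    <parent>
      <child>Tom</child>
    </parent>
  </grandparent>
</project>"""

tree = ET.ElementTree(ET.fromstring(XML))

# Both forms locate the same <parent> element.
a = tree.find('*/xmlns:parent', namespaces=spaces)
same = tree.find('*/{%s}parent' % POM_NS)

names = [b.text for b in a.findall('xmlns:child', namespaces=spaces)]

# New elements need the fully qualified {uri}tag name.
ET.SubElement(a, '{%s}child' % POM_NS).text = 'Jay/Doctor'

buf = io.BytesIO()
tree.write(buf, default_namespace=spaces['xmlns'])
output = buf.getvalue().decode()
print(output)
```

One caveat with default_namespace: ElementTree raises an error if any element in the tree has an unqualified tag, so every element you add must carry the namespace.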
Unrelated to the XML aspects, you copied my answer from the previous question incorrectly. The else lines up with the for, not with the if:
for ...
    if ...
else ...
How to find and edit tags in XML files with namespaces using ElementTree
Instead of stripping out the namespaces, I suggest using namespace wildcards. Support for this was added in Python 3.8.
from xml.etree import ElementTree as ET
tree = ET.parse(adiPath)
rating = tree.find(".//{*}Rating") # Find the Rating element in any namespace
rating.text = "999"
Note that you have to use find() (or findall()). Wildcards do not work with iter().
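A small runnable sketch of the wildcard search, assuming Python 3.8 or later. The document and the namespace URI http://example.com/adi are made up for illustration; only the Rating element name comes from the answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace URI and the ADI/Asset element names are made up.
XML = """<ADI xmlns="http://example.com/adi">
  <Asset><Rating>5</Rating></Asset>
</ADI>"""

root = ET.fromstring(XML)

# {*} matches Rating in any namespace (requires Python 3.8+).
rating = root.find('.//{*}Rating')
rating.text = '999'

# With iter(), pass the fully qualified tag instead of a wildcard.
via_iter = list(root.iter('{http://example.com/adi}Rating'))
print(rating.tag, rating.text)
```

The wildcard keeps the code independent of the exact namespace URI, which helps when the same element appears under different schema versions.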
The following workaround can be used to preserve the original namespace prefixes when serializing the XML document (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).
namespaces = dict([elem for _, elem in ET.iterparse("test1.xml", events=['start-ns'])])
for ns in namespaces:
    ET.register_namespace(ns, namespaces[ns])
Generate XML Document in Python 3 using Namespaces and ElementTree
I believe you are overthinking this.
Registering a default namespace in your code avoids the ns0: aliases.
Registering any namespaces you will use while creating a document allows you to designate the alias used for each namespace.
To achieve your desired output, assign the namespace to your top element:
a = ET.Element("{urn:dslforum-org:service-1-0}topNode")
The preceding ET.register_namespace("", "urn:dslforum-org:service-1-0") will make that the default namespace in the document, assign it to topNode, and not prefix your tag names.
<?xml version='1.0' encoding='utf-8'?>
<topNode xmlns="urn:dslforum-org:service-1-0"><childNode>content</childNode></topNode>
If you remove the register_namespace() call, then you get this monstrosity:
<?xml version='1.0' encoding='utf-8'?>
<ns0:topNode xmlns:ns0="urn:dslforum-org:service-1-0"><childNode>content</childNode></ns0:topNode>
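For reference, a complete sketch of the default-namespace approach. Qualifying childNode with the namespace as well is an assumption on my part (the outputs above suggest the original child was created without one); it keeps the in-memory tree consistent with what the serialized document implies.

```python
import xml.etree.ElementTree as ET

NS = 'urn:dslforum-org:service-1-0'

# An empty prefix makes NS the default namespace when serializing.
ET.register_namespace('', NS)

a = ET.Element('{%s}topNode' % NS)
# Assumption: the child is placed in the same namespace as its parent.
child = ET.SubElement(a, '{%s}childNode' % NS)
child.text = 'content'

xml_text = ET.tostring(a, encoding='unicode')
print(xml_text)
```

Because register_namespace is global to the process, it is usually cleanest to call it once near program start.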
Iterating through specific XML elements and namespace issues
Since e starts on the root tag, remove <ODM> from the XPath expression:
col = e.findall('./{0}Study/{0}MetaDataVersion/{0}ItemDef'.format(namespace))
# Study Metadata
# start for-loop
# -- found ItemDef name= BL_D_VISITDATE
# -- found ItemDef name= BL_D_MEDCODE
# finished for-loop
Even better, use the namespaces argument of findall with a dictionary that maps the d prefix to the namespace URI:
ns = {'d': 'http://www.cdisc.org/ns/odm/v1.3'}
col = e.findall('./d:Study/d:MetaDataVersion/d:ItemDef', namespaces=ns)
# SHORT-HAND FOR ANYWHERE SEARCH
col = e.findall('.//d:ItemDef', namespaces=ns)
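Here is a runnable sketch of both searches against a hypothetical, heavily trimmed ODM document. The attribute name Name is an assumption; the two item names are taken from the loop output shown earlier.

```python
import xml.etree.ElementTree as ET

ns = {'d': 'http://www.cdisc.org/ns/odm/v1.3'}

# Hypothetical, heavily trimmed ODM document; the Name attribute is an assumption.
XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study>
    <MetaDataVersion>
      <ItemDef Name="BL_D_VISITDATE"/>
      <ItemDef Name="BL_D_MEDCODE"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

e = ET.fromstring(XML)  # e is the <ODM> root, so paths start below it

full_path = e.findall('./d:Study/d:MetaDataVersion/d:ItemDef', namespaces=ns)
anywhere = e.findall('.//d:ItemDef', namespaces=ns)

names = [item.get('Name') for item in full_path]
print(names)
```

The explicit path is stricter about where ItemDef may appear; the .// form is shorter but matches the element at any depth.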