Remove namespace and prefix from xml in python using lxml
Replace tag as Uku Loskit suggests. In addition to that, use lxml.objectify.deannotate.
from lxml import etree, objectify
metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()
####
for elem in root.iter():  # iter() replaces the deprecated getiterator()
    if not hasattr(elem.tag, 'find'):
        continue  # guard for Comment tags
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]
objectify.deannotate(root, cleanup_namespaces=True)
####
tree.write('/Users/user1/Desktop/Python/done.xml',
pretty_print=True, xml_declaration=True, encoding='UTF-8')
Note: for some nodes, such as comments, the tag attribute is a function rather than a string, so the loop includes a guard for that case.
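To see the effect without touching files on disk, here is a minimal sketch of the same approach applied to an in-memory document. The sample XML and the names SAMPLE and result are made up for illustration; the tag rewriting and the objectify.deannotate call are the ones from the answer above.

```python
from lxml import etree, objectify

# Hypothetical sample document, not the asker's metadata.xml.
SAMPLE = b'<ns:root xmlns:ns="urn:example"><!-- c --><ns:child>text</ns:child></ns:root>'

root = etree.fromstring(SAMPLE)
for elem in root.iter():
    if not isinstance(elem.tag, str):
        continue  # comments and processing instructions have a non-string tag
    elem.tag = etree.QName(elem).localname

# Drop the xmlns declarations that are no longer used by any element.
objectify.deannotate(root, cleanup_namespaces=True)

result = etree.tostring(root).decode()
print(result)
```

Without the deannotate call the elements would already be unqualified, but the unused xmlns:ns declaration would still be serialized on the root element.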
How can I remove ns from XML in Python?
This is a combination of Remove namespace and prefix from xml in python using lxml, which shows how to modify the namespace of an element, and lxml: add namespace to input file, which shows how to reset the top namespace map.
The code is a little hacky (I'm particularly suspicious of whether or not it's kosher to use the _setroot method), but it seems to work:
from lxml import etree
inputfile = 'data.xml'
target_ns = 'urn:ietf:params:xml:ns:epp-1.0'
nsmap = {None: target_ns}
tree = etree.parse(inputfile)
root = tree.getroot()
# here we set the namespace of all elements to target_ns
for elem in root.iter():
    if not isinstance(elem.tag, str):
        continue  # skip comments and processing instructions
    tag = etree.QName(elem.tag)
    elem.tag = '{%s}%s' % (target_ns, tag.localname)
# create a new root element and set the namespace map, then
# copy over all the child elements
new_root = etree.Element(root.tag, nsmap=nsmap)
new_root[:] = root[:]
# create a new elementtree with new_root so that we can use the
# .write method.
tree = etree.ElementTree()
tree._setroot(new_root)
tree.write('done.xml',
pretty_print=True, xml_declaration=True, encoding='UTF-8')
Given your sample input, this produces in done.xml:
<?xml version='1.0' encoding='UTF-8'?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0"><command>
<check>
<check>
<id>ex61-irnic</id>
<id>ex999-irnic</id>
<authInfo>
<pw>1487441516170712</pw>
</authInfo>
</check>
</check>
<clTRID>TEST-12345</clTRID>
</command>
</epp>
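If the private _setroot call is a concern, one possible variant is to pass the new root straight to the etree.ElementTree constructor, which accepts an element. The sketch below assumes a small made-up input (the SOURCE string and its urn:old namespace are hypothetical, not the asker's file) and otherwise follows the same steps.

```python
from lxml import etree

# Hypothetical input using a prefixed namespace that we want to replace.
SOURCE = (b'<e:epp xmlns:e="urn:old"><e:command>'
          b'<e:clTRID>TEST-12345</e:clTRID></e:command></e:epp>')
target_ns = 'urn:ietf:params:xml:ns:epp-1.0'

root = etree.fromstring(SOURCE)

# Move every element into target_ns, leaving comments and PIs alone.
for elem in root.iter():
    if isinstance(elem.tag, str):
        elem.tag = '{%s}%s' % (target_ns, etree.QName(elem).localname)

# New root that declares target_ns as the default (unprefixed) namespace.
new_root = etree.Element(root.tag, nsmap={None: target_ns})
new_root[:] = root[:]

# The constructor takes the root element, so _setroot is not needed.
tree = etree.ElementTree(new_root)
namespaces = {etree.QName(el).namespace for el in new_root.iter()}
print(etree.tostring(tree, pretty_print=True).decode())
```

The resulting tree object supports .write exactly as in the original code.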
How can I strip namespaces out of an lxml tree?
One possible way to remove namespace prefix from each element :
from lxml import etree

def strip_ns_prefix(tree):
    # iterate through only element nodes (skip comment node, text node, etc.):
    for element in tree.xpath('descendant-or-self::*'):
        # if element has prefix...
        if element.prefix:
            # replace element name with its local name
            element.tag = etree.QName(element).localname
    return tree
Another version, which does the namespace check in the XPath expression instead of using an if statement:
def strip_ns_prefix(tree):
    # xpath query for selecting all element nodes in namespace
    query = "descendant-or-self::*[namespace-uri()!='']"
    # for each element returned by the above xpath query...
    for element in tree.xpath(query):
        # replace element name with its local name
        element.tag = etree.QName(element).localname
    return tree
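The two versions are not quite equivalent: the prefix check only touches elements that carry an explicit prefix, while the namespace-uri() query also matches elements that sit in a default (unprefixed) namespace. A small sketch of the difference, using a made-up sample and the hypothetical names strip_prefixed and strip_namespaced for the two variants:

```python
from lxml import etree

def strip_prefixed(tree):
    # Variant 1: only elements that have an explicit prefix.
    for element in tree.xpath('descendant-or-self::*'):
        if element.prefix:
            element.tag = etree.QName(element).localname
    return tree

def strip_namespaced(tree):
    # Variant 2: every element whose namespace URI is non-empty.
    for element in tree.xpath("descendant-or-self::*[namespace-uri()!='']"):
        element.tag = etree.QName(element).localname
    return tree

# Hypothetical sample: <plain/> is in the default namespace, without a prefix.
SAMPLE = (b'<a:root xmlns:a="urn:a" xmlns="urn:default">'
          b'<a:child/><plain/><!-- note --></a:root>')

prefixed_only = [el.tag for el in
                 strip_prefixed(etree.fromstring(SAMPLE)).iter(tag=etree.Element)]
all_namespaced = [el.tag for el in
                  strip_namespaced(etree.fromstring(SAMPLE)).iter(tag=etree.Element)]

print(prefixed_only)
print(all_namespaced)
```

Which one is appropriate depends on whether elements in a default namespace should be stripped as well.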
Remove namespace from XML with comment - Python
Make sure that the node is not a comment before changing the tag. The code below also removes any attributes that are in a namespace.
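for elem in root.iter():
    # Comments (and processing instructions) have a non-string tag; skip them
    if not isinstance(elem.tag, str):
        continue
    # For elements, replace qualified name with localname
    elem.tag = etree.QName(elem).localname
    # Remove attributes that are in a namespace; loop over a copy of the
    # keys so the mapping is not modified while it is being iterated
    for attr in list(elem.attrib):
        if "{" in attr:
            elem.attrib.pop(attr)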
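As a rough illustration of what the loop above does to a document that contains both a comment and a namespaced attribute, here is a self-contained sketch. The helper name strip_namespaces and the sample XML are assumptions made for the example.

```python
from lxml import etree

def strip_namespaces(root):
    """Unqualify element names and drop namespaced attributes, in place."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # comment or processing instruction: leave untouched
        elem.tag = etree.QName(elem).localname
        for attr in list(elem.attrib):  # copy of the keys
            if attr.startswith('{'):
                del elem.attrib[attr]
    return root

# Hypothetical sample with a comment, a namespaced and a plain attribute.
SAMPLE = (b'<x:doc xmlns:x="urn:x"><!-- keep me -->'
          b'<x:item x:id="1" name="a"/></x:doc>')

root = strip_namespaces(etree.fromstring(SAMPLE))
item = root.find('item')
print(root.tag, dict(item.attrib))
```

Note that this deletes namespaced attributes outright; if their values are still needed under an unqualified name, they would have to be copied before removal.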
How to keep xml namespace without prefix, while generating XML using lxml?
You're close. In nsmap=, instead of '', use None:
root = etree.Element('{http://www.w3.org/2000/svg}svg', nsmap={None: 'http://www.w3.org/2000/svg'})
root.append(element) # element is a <path> element extracted from another SVG file
print(etree.tostring(root).decode())
This will preserve the namespace, but not add any namespace prefix (such as ns0).
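A runnable sketch of the same idea follows. Since the extracted path element from the question is not available here, a freshly created child (with a made-up d attribute) stands in for it; the name SVG_NS is also just an illustrative constant.

```python
from lxml import etree

SVG_NS = 'http://www.w3.org/2000/svg'

# Mapping None to the URI declares it as the default namespace (xmlns="...").
root = etree.Element('{%s}svg' % SVG_NS, nsmap={None: SVG_NS})

# Stand-in for the <path> element taken from another SVG file.
path = etree.SubElement(root, '{%s}path' % SVG_NS, d='M0 0L10 10')

serialized = etree.tostring(root).decode()
print(serialized)
```

Both elements stay in the SVG namespace, yet the serialized output carries no ns0-style prefix.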
Remove namespaces and nodes from XML string in python
You can simply extract the relevant portion into a new document:
import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())
Output:
<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>
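Because the original dmXML string is not shown, the following sketch uses a made-up wrapper document to show the extraction end to end. It assumes, as in the output above, that DataMap itself is not in a namespace, so only the surrounding namespaced wrapper elements are discarded.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dmXML: a namespaced wrapper around DataMap.
dmXML = (
    '<ns0:Envelope xmlns:ns0="urn:example:wrapper"><ns0:Body>'
    '<DataMap sourceType="0">'
    '<FieldMap flag="Q1" destination="Q1_1" source="Q1_1"/>'
    '</DataMap>'
    '</ns0:Body></ns0:Envelope>'
)

root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')  # first DataMap anywhere below the root

# encoding='unicode' returns a str without an XML declaration.
output = ET.tostring(new_root, encoding='unicode')
print(output)
```

If DataMap were itself namespaced, the plain './/DataMap' path would not match and the lookup would need the qualified name.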
lxml etree xmlparser remove unwanted namespace
import io
import lxml.etree as ET
content='''\
<Envelope xmlns="http://www.example.com/zzz/yyy">
<Header>
<Version>1</Version>
</Header>
<Body>
some stuff
</Body>
</Envelope>
'''
dom = ET.parse(io.BytesIO(content.encode('utf-8')))  # BytesIO needs bytes in Python 3
You can find namespace-aware nodes using the xpath method:
body=dom.xpath('//ns:Body',namespaces={'ns':'http://www.example.com/zzz/yyy'})
print(body)
# [<Element {http://www.example.com/zzz/yyy}Body at 90b2d4c>]
If you really want to remove namespaces, you could use an XSL transformation:
# http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
xslt='''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>
<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
'''
xslt_doc=ET.parse(io.BytesIO(xslt.encode('utf-8')))  # BytesIO needs bytes in Python 3
transform=ET.XSLT(xslt_doc)
dom=transform(dom)
Here we see the namespace has been removed:
print(ET.tostring(dom).decode())
# <Envelope>
# <Header>
# <Version>1</Version>
# </Header>
# <Body>
# some stuff
# </Body>
# </Envelope>
So you can now find the Body node this way:
print(dom.find("Body"))
# <Element Body at 8506cd4>
Retain namespace prefix in a tag when parsing xml using lxml
That's probably caused by the fact that you are using an html parser to read xml.
Try it like this:
from lxml import etree
root = etree.XML(xml)
for element in root.xpath('//item/book/pages/*'):
    xml = etree.tostring(element, encoding='utf-8')
    print(xml)
This should give you the expected output.