Remove Namespace and Prefix from XML in Python Using lxml

Remove namespace and prefix from XML in Python using lxml

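Replace each tag with its local name, as Uku Loskit suggests. In addition, call lxml.objectify.deannotate with cleanup_namespaces=True so that the namespace declarations that are no longer used get dropped.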

from lxml import etree, objectify

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

####
for elem in root.iter():
    if not hasattr(elem.tag, 'find'):
        continue  # guard for Comment tags
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]
objectify.deannotate(root, cleanup_namespaces=True)
####

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

Note: For some nodes, such as comments, the tag attribute is a function rather than a string. The hasattr check in the loop guards against that.
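To try the idea without touching the file system, the loop can be wrapped in a function and run on an in-memory document. This is only a minimal sketch: the function name strip_namespaces and the sample XML are made up for illustration, and it swaps in etree.QName and etree.cleanup_namespaces in place of the string slicing and objectify.deannotate used in the answer.

```python
from lxml import etree


def strip_namespaces(root):
    """Replace every qualified tag with its local name, in place."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag.
        if not isinstance(elem.tag, str):
            continue
        elem.tag = etree.QName(elem).localname
    # Drop xmlns declarations that are no longer used by any node.
    etree.cleanup_namespaces(root)
    return root


# Hypothetical sample document with a prefixed namespace and a comment.
sample = (b'<a:root xmlns:a="urn:example:a">'
          b'<!-- note --><a:child>text</a:child></a:root>')
root = strip_namespaces(etree.fromstring(sample))
result = etree.tostring(root).decode()
print(result)
```

The comment node passes through untouched, because the isinstance check skips anything whose tag is not a string.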

How can I remove ns from XML in Python?

This is a combination of "Remove namespace and prefix from xml in python using lxml", which shows how to modify the namespace of an element, and "lxml: add namespace to input file", which shows how to reset the top namespace map.

The code is a little hacky (I'm particularly suspicious of whether or not it's kosher to use the _setroot method), but it seems to work:

from lxml import etree

inputfile = 'data.xml'
target_ns = 'urn:ietf:params:xml:ns:epp-1.0'
nsmap = {None: target_ns}

tree = etree.parse(inputfile)
root = tree.getroot()

# here we set the namespace of all elements to target_ns
for elem in root.iter():
    tag = etree.QName(elem.tag)
    elem.tag = '{%s}%s' % (target_ns, tag.localname)

# create a new root element and set the namespace map, then
# copy over all the child elements
new_root = etree.Element(root.tag, nsmap=nsmap)
new_root[:] = root[:]

# create a new elementtree with new_root so that we can use the
# .write method.
tree = etree.ElementTree()
tree._setroot(new_root)

tree.write('done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

Given your sample input, this produces in done.xml:

<?xml version='1.0' encoding='UTF-8'?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0"><command>
<check>
<check>
<id>ex61-irnic</id>
<id>ex999-irnic</id>
<authInfo>
<pw>1487441516170712</pw>
</authInfo>
</check>
</check>
<clTRID>TEST-12345</clTRID>
</command>
</epp>
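If the private _setroot call is a concern, one option is to pass the new root straight to the public etree.ElementTree constructor. The sketch below assumes a small made-up input document (the urn:example:old namespace is hypothetical) rather than the file from the question.

```python
from lxml import etree

target_ns = 'urn:ietf:params:xml:ns:epp-1.0'

# Hypothetical stand-in for the parsed input document.
root = etree.fromstring(
    b'<epp:epp xmlns:epp="urn:example:old"><epp:command/></epp:epp>')

# Move every element into target_ns, keeping its local name.
for elem in root.iter():
    if isinstance(elem.tag, str):  # skip comments and processing instructions
        local = etree.QName(elem).localname
        elem.tag = etree.QName(target_ns, local).text

# New root that declares target_ns as the default namespace.
new_root = etree.Element(root.tag, nsmap={None: target_ns})
new_root[:] = root[:]

# Passing the element to the constructor avoids the private _setroot call.
tree = etree.ElementTree(new_root)
output = etree.tostring(tree, pretty_print=True).decode()
print(output)
```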

How can I strip namespaces out of an lxml tree?

One possible way to remove the namespace prefix from each element:

from lxml import etree

def strip_ns_prefix(tree):
    # iterate through element nodes only (skip comment nodes, text nodes, etc.)
    for element in tree.xpath('descendant-or-self::*'):
        # if the element has a prefix...
        if element.prefix:
            # ...replace the element name with its local name
            element.tag = etree.QName(element).localname
    return tree

Another version, which checks for a namespace in the XPath expression instead of using an if statement:

def strip_ns_prefix(tree):
    # XPath query that selects all element nodes that are in a namespace
    query = "descendant-or-self::*[namespace-uri()!='']"
    # for each element returned by the above XPath query...
    for element in tree.xpath(query):
        # ...replace the element name with its local name
        element.tag = etree.QName(element).localname
    return tree
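The two versions do not select the same elements. An element in a default namespace (declared with a plain xmlns="...") has no prefix, so the element.prefix test skips it, while the namespace-uri() test still matches it. The sketch below illustrates the difference on a made-up document; the urn:example URIs are placeholders.

```python
from lxml import etree

doc = etree.fromstring(
    b'<root xmlns="urn:example:default" xmlns:p="urn:example:p">'
    b'<p:item/><plain/></root>')

# Elements in the default namespace have no prefix...
prefixed = [e for e in doc.iter() if e.prefix]
# ...but they still have a namespace URI.
namespaced = doc.xpath("descendant-or-self::*[namespace-uri()!='']")

print(len(prefixed))    # 1 (only p:item)
print(len(namespaced))  # 3 (root, p:item and plain)
```

So if the goal is to strip every namespace, including a default one, the XPath-based version is the one to reach for.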

Remove namespace from XML with comment - Python

Make sure that the node is not a comment before changing the tag. The code below also removes any attributes that are in a namespace.

for elem in root.iter():
    # Skip comments, whose tag is not a string
    if not isinstance(elem, etree._Comment):
        # For elements, replace qualified name with localname
        elem.tag = etree.QName(elem).localname

        # Remove attributes that are in a namespace
        # (iterate over a copy, since the loop modifies elem.attrib)
        for attr in list(elem.attrib):
            if "{" in attr:
                elem.attrib.pop(attr)
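If the namespaced attributes should be kept rather than dropped, a variation is to re-add each one under its local name. This is a sketch under one loud assumption: no element carries an un-namespaced attribute with the same local name, since that one would be overwritten. The sample document is made up.

```python
from lxml import etree

root = etree.fromstring(
    b'<r xmlns:x="urn:example:x"><item x:id="1" name="a"/></r>')

for elem in root.iter():
    if not isinstance(elem.tag, str):
        continue  # skip comments and processing instructions
    # Iterate over a copy of the names, since the loop modifies elem.attrib.
    for name in list(elem.attrib):
        if name.startswith('{'):
            value = elem.attrib.pop(name)
            # Assumption: this local name does not clash with an existing attribute.
            elem.set(etree.QName(name).localname, value)

item = root[0]
print(dict(item.attrib))
```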

How to keep xml namespace without prefix, while generating XML using lxml?

You're close. In nsmap=, instead of '', use None:

root = etree.Element('{http://www.w3.org/2000/svg}svg', nsmap={None: 'http://www.w3.org/2000/svg'})
root.append(element) # element is a <path> element extracted from another SVG file
print(etree.tostring(root).decode())

This will preserve the namespace, but not add any namespace prefix (such as ns0).
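A self-contained version of the same idea, with a child created in place instead of the element taken from another file (the path data is a placeholder):

```python
from lxml import etree

SVG_NS = 'http://www.w3.org/2000/svg'

# Mapping None to the URI declares it as the default (unprefixed) namespace.
root = etree.Element('{%s}svg' % SVG_NS, nsmap={None: SVG_NS})
etree.SubElement(root, '{%s}path' % SVG_NS, d='M0 0')

print(etree.tostring(root).decode())
# <svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0"/></svg>
```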

Remove namespaces and nodes from XML string in python

You can simply extract the relevant portion into a new document:

import xml.etree.ElementTree as ET
root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
print(ET.tostring(new_root, xml_declaration=True, encoding='UTF-8').decode())

Output:

<?xml version='1.0' encoding='UTF-8'?>
<DataMap sourceType="0">
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
<FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
</DataMap>
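This works when the subtree you keep contains no namespaced elements or attributes, because ElementTree only writes declarations for namespaces that are actually used. A runnable sketch, with a made-up dmXML value standing in for the string from the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a namespaced wrapper around an un-namespaced DataMap.
dmXML = '''<Wrapper xmlns:ns="urn:example:ns">
  <ns:Meta>ignored</ns:Meta>
  <DataMap sourceType="0">
    <FieldMap flag="Q1" destination="Q1_1" source="Q1_1" />
  </DataMap>
</Wrapper>'''

root = ET.fromstring(dmXML)
new_root = root.find('.//DataMap')
extracted = ET.tostring(new_root, encoding='unicode')
print(extracted)
```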

lxml etree xmlparser remove unwanted namespace

import io
import lxml.etree as ET

content='''\
<Envelope xmlns="http://www.example.com/zzz/yyy">
<Header>
<Version>1</Version>
</Header>
<Body>
some stuff
</Body>
</Envelope>
'''
dom = ET.parse(io.BytesIO(content.encode('utf-8')))  # BytesIO needs bytes, not str

You can find namespace-aware nodes using the xpath method:

body=dom.xpath('//ns:Body',namespaces={'ns':'http://www.example.com/zzz/yyy'})
print(body)
# [<Element {http://www.example.com/zzz/yyy}Body at 90b2d4c>]
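If you would rather not spell out the namespace URI, another option is to match on the local name only. This is a sketch using the standard XPath local-name() function on a shortened copy of the document above. Note that it matches a Body element in any namespace.

```python
import io
import lxml.etree as ET

content = (b'<Envelope xmlns="http://www.example.com/zzz/yyy">'
           b'<Body>some stuff</Body></Envelope>')
dom = ET.parse(io.BytesIO(content))

# Select by local name, ignoring whatever namespace the element is in.
body = dom.xpath("//*[local-name()='Body']")
print(body[0].tag)
# {http://www.example.com/zzz/yyy}Body
```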

If you really want to remove namespaces, you could use an XSL transformation:

# http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
xslt='''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>

<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>

<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>

<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
'''

xslt_doc=ET.parse(io.BytesIO(xslt.encode('utf-8')))
transform=ET.XSLT(xslt_doc)
dom=transform(dom)

Here we see the namespace has been removed:

print(ET.tostring(dom).decode())
# <Envelope>
# <Header>
# <Version>1</Version>
# </Header>
# <Body>
# some stuff
# </Body>
# </Envelope>

So you can now find the Body node this way:

print(dom.find("Body"))
# <Element Body at 8506cd4>

Retain namespace prefix in a tag when parsing xml using lxml

That's probably caused by the fact that you are using an html parser to read xml.

Try it like this:

from lxml import etree
root = etree.XML(xml)
for element in root.xpath('//item/book/pages/*'):
    xml = etree.tostring(element, encoding='utf-8')
    print(xml)

This should give you the expected output.
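As a quick check that the XML parser keeps prefixes, here is a sketch with a made-up document (the bk prefix and its URI are hypothetical; only the item/book/pages path comes from the snippet above):

```python
from lxml import etree

xml = (b'<item xmlns:bk="urn:example:bk"><book><pages>'
       b'<bk:count>10</bk:count></pages></book></item>')
root = etree.XML(xml)

serialized = []
for element in root.xpath('//item/book/pages/*'):
    # encoding='unicode' returns str instead of bytes
    serialized.append(etree.tostring(element, encoding='unicode'))

print(serialized[0])
```

The serialized child keeps its bk: prefix, and lxml adds the matching xmlns:bk declaration so that the fragment is well-formed on its own.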


