How do I use a default namespace in an lxml xpath query?
Something like this should work:
import lxml.etree as et
ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)
for node in tree.xpath('//atom:entry', namespaces=ns):
    print(node)
See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
Alternative:
for node in tree.xpath("//*[local-name() = 'entry']"):
    print(node)
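For anyone who wants to try both variants end to end, here is a minimal sketch. The sample feed is made up for illustration; only the Atom namespace URI comes from the answer above.

```python
from lxml import etree

# Hypothetical sample feed; the default namespace applies to every element in it.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

tree = etree.fromstring(xml)
ns = {"atom": "http://www.w3.org/2005/Atom"}

# The prefix is only a local alias: it must map to the right URI,
# but it does not have to appear in the document itself.
by_prefix = tree.xpath("//atom:entry/atom:title/text()", namespaces=ns)

# local-name() compares only the local part and ignores the namespace.
by_local_name = tree.xpath("//*[local-name() = 'entry']")

# An unprefixed name only matches elements in no namespace, so this is empty.
unprefixed = tree.xpath("//entry")

print(by_prefix)
print(len(by_local_name), len(unprefixed))
```

Note that the local-name() variant would also match an entry element from any other namespace, which may or may not be what you want.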
lxml - Default name spaces
Hackish or not, you have to specify a prefix. XPath 1.0, which is what lxml supports, has no concept of a default namespace (XPath 2.0 works differently, but that does not apply here).
The other option is to not bother with prefixes at all. Use the fully qualified element name in "Clark notation" instead:
my_root.findall('{http://filler.com/default.xsd}Prop')
See also http://lxml.de/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions.
Update August 2019
The behaviour has changed in later versions of lxml. With lxml 4.4.1, both None and the empty string can be used as the key for the default namespace in the mapping passed to findall():
from lxml import etree
my_tree = etree.parse("props.xml")
my_root = my_tree.getroot()
NS = 'http://filler.com/default.xsd'
NSMAP1 = {None: NS}
NSMAP2 = {'': NS}
NSMAP3 = {'default': NS}
print(my_root.findall('Prop', NSMAP1))
print(my_root.findall('Prop', NSMAP2))
print(my_root.findall('default:Prop', NSMAP3))
Output:
[<Element {http://filler.com/default.xsd}Prop at 0x31f1260>]
[<Element {http://filler.com/default.xsd}Prop at 0x31f1288>]
[<Element {http://filler.com/default.xsd}Prop at 0x31f1260>]
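If you don't have props.xml at hand, the Clark-notation and prefix variants can be checked with a small inline document. This is only a sketch: the XML below is a made-up stand-in for props.xml, and it sticks to the two forms that do not depend on the lxml version.

```python
from lxml import etree

NS = "http://filler.com/default.xsd"

# Hypothetical stand-in for props.xml.
my_root = etree.fromstring(
    '<Props xmlns="http://filler.com/default.xsd">'
    "<Prop>a</Prop><Prop>b</Prop>"
    "</Props>"
)

# Clark notation: "{namespace-uri}localname". QName builds that string for us.
clark_name = etree.QName(NS, "Prop").text
clark = my_root.findall(clark_name)

# Explicit prefix mapped to the same URI.
prefixed = my_root.findall("default:Prop", {"default": NS})

# A bare name without any namespace mapping matches nothing here.
plain = my_root.findall("Prop")

print(clark_name)
print([p.text for p in clark], [p.text for p in prefixed], plain)
```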
How to import lxml xpath functions to default namespace?
You can put a function in the empty function namespace:
functionNS = etree.FunctionNamespace(None)
functionNS['test'] = lambda context, nodes, *args: print(context, nodes, args)
By doing so, the new test function is registered with the empty namespace prefix, which means you can use it like this:
root.xpath("//*[test(., 'arg1', 'arg2')]")
Unfortunately, the function that is called for "{http://exslt.org/regular-expressions}test" isn't available from Python, only from within the lxml extension implemented in C, so you can't simply assign it to functionNS['test'].
That means you'd need to reimplement it in Python to assign it to the empty function namespace.
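A rough sketch of what such a reimplementation could look like is below. It is not the EXSLT function itself: regex_test is a hypothetical helper that wraps Python's re.search, so it uses Python regex syntax and ignores the optional EXSLT flags argument. The sample document is made up as well.

```python
import re
from lxml import etree


def regex_test(context, target, pattern):
    """Return True if `pattern` matches somewhere in the string value of `target`."""
    # A node-set argument arrives as a list; use its first item, like XPath string().
    if isinstance(target, list):
        if not target:
            return False
        target = target[0]
    # Elements are reduced to their concatenated text content.
    if etree.iselement(target):
        target = "".join(target.itertext())
    return re.search(pattern, str(target)) is not None


# Registering in the empty function namespace is global for this process.
etree.FunctionNamespace(None)["test"] = regex_test

root = etree.fromstring("<items><item>abc123</item><item>xyz</item></items>")
matches = root.xpath(r"//item[test(., '\d+$')]")
print([m.text for m in matches])
```

Because the registration is process-wide, a more distinctive function name than test may be the safer choice in a larger code base.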
If that's not worth the trouble just to save typing three characters, you could use this trick to make the re prefix for the namespace global:
etree.FunctionNamespace("http://exslt.org/regular-expressions").prefix = 're'
Then at least you don't need to pass the namespaces dict for each xpath expression.
lxml: XPath and namespaces on an element
Looks like you're just missing the j prefix on level...
//j:isis-database-information/j:isis-database[j:level='2']/j:isis-database-entry
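To see the difference the prefix inside the predicate makes, here is a small sketch. The document and the namespace URI http://example.com/isis are invented for illustration; only the element names and the query come from the answer.

```python
from lxml import etree

# Hypothetical document and namespace URI, modelled on the element names in the query.
xml = b"""<isis-database-information xmlns="http://example.com/isis">
  <isis-database>
    <level>1</level>
    <isis-database-entry>a</isis-database-entry>
  </isis-database>
  <isis-database>
    <level>2</level>
    <isis-database-entry>b</isis-database-entry>
  </isis-database>
</isis-database-information>"""

root = etree.fromstring(xml)
ns = {"j": "http://example.com/isis"}

# Unprefixed "level" looks for an element in no namespace, so nothing matches.
missing = root.xpath(
    "//j:isis-database-information/j:isis-database[level='2']/j:isis-database-entry",
    namespaces=ns,
)

# With the prefix on every step, including inside the predicate, it works.
fixed = root.xpath(
    "//j:isis-database-information/j:isis-database[j:level='2']/j:isis-database-entry",
    namespaces=ns,
)

print(missing, [e.text for e in fixed])
```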
how to query xml data with namespaces using xpath in python
You can define your namespaces as follows:
ns = {'n': 'http://www.topografix.com/GPX/1/1',
      'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
      'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}
This would define the prefix for 'http://www.topografix.com/GPX/1/1' as n, and then in your XPath query, you can use that prefix. Example:
expr = 'n:trk/n:trkseg/n:trkpt/n:ele'
for element in tree.xpath(expr, namespaces=ns):
    print(element.text)
This is because the xmlns for the root node is 'http://www.topografix.com/GPX/1/1', so all child nodes automatically inherit it as their namespace, unless a child node uses a different prefix or specifies a namespace of its own.
Example/Demo:
In [44]: ns = {'n': 'http://www.topografix.com/GPX/1/1',
   ....:       'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
   ....:       'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
   ....:       'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}
In [45]: expr = 'n:trk/n:trkseg/n:trkpt/n:ele'
In [46]: for element in tree.xpath(expr, namespaces=ns):
   ....:     print(element.text)
   ....:
2261.8
2261.6
2262.0
2261.8
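The inheritance rule is easier to see with a child that does use a different prefix. The sketch below uses a made-up GPX fragment (the coordinates, elevation and heart-rate value are invented); the namespace URIs are the ones from the ns dict above.

```python
from lxml import etree

# Hypothetical GPX fragment: <ele> inherits the default namespace,
# while the extension elements carry an explicit gpxtpx prefix.
gpx = b"""<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk><trkseg>
    <trkpt lat="46.0" lon="7.0">
      <ele>2261.8</ele>
      <extensions>
        <gpxtpx:TrackPointExtension><gpxtpx:hr>120</gpxtpx:hr></gpxtpx:TrackPointExtension>
      </extensions>
    </trkpt>
  </trkseg></trk>
</gpx>"""

root = etree.fromstring(gpx)
ns = {
    "n": "http://www.topografix.com/GPX/1/1",
    "gpxtpx": "http://www.garmin.com/xmlschemas/TrackPointExtension/v1",
}

ele = root.xpath("n:trk/n:trkseg/n:trkpt/n:ele", namespaces=ns)[0]
hr = root.xpath("//gpxtpx:hr", namespaces=ns)[0]

# .tag shows the namespace each element actually ended up in.
print(ele.tag, ele.text)
print(hr.tag, hr.text)
```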
parsing xml containing default namespace to get an element value using lxml
This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
Note that not only the element where the default namespace is declared is in that namespace: all descendant elements inherit the ancestor's default namespace implicitly, unless otherwise specified (using an explicit namespace prefix or a local default namespace that points to a different namespace URI). That means, in this case, all elements, including loc, are in the default namespace.
To select an element in a namespace, you need to define a prefix-to-namespace mapping and use the prefix properly in the XPath:
from lxml import etree
str1 = '''<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>
http://www.example.org/sitemap_1.xml.gz
</loc>
<lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>'''
root = etree.fromstring(str1)
ns = {"d" : "http://www.sitemaps.org/schemas/sitemap/0.9"}
url = root.xpath("//d:loc", namespaces=ns)[0]
print(etree.tostring(url).decode())
Output:
<loc xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
http://www.example.org/sitemap_1.xml.gz
</loc>
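Since the question was about getting the element's value rather than its serialized form, a possible follow-up is sketched below. It repeats a trimmed copy of the sample XML so that it runs on its own, shows the unprefixed query coming back empty, and uses XPath's normalize-space() to strip the surrounding whitespace.

```python
from lxml import etree

xml = b"""<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>
      http://www.example.org/sitemap_1.xml.gz
    </loc>
  </sitemap>
</sitemapindex>"""

root = etree.fromstring(xml)
ns = {"d": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The common mistake: an unprefixed name selects elements in no namespace.
unprefixed = root.xpath("//loc")

# normalize-space() returns the string value of the first match, trimmed.
value = root.xpath("normalize-space(//d:loc)", namespaces=ns)

print(unprefixed)
print(value)
```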
Registering Namespace in Python XPath query
You need to first define a namespace map, declare a prefix for those namespaces that don't have one (as is the case here) and then apply xpath:
from lxml import etree
prods = """[your xml above]"""
doc = etree.XML(prods)  # parse first, so that doc exists before the ns map is built
ns = {(k if k else "xx"): v for k, v in doc.xpath('//namespace::*')}  # create ns map; "xx" stands in for the default namespace
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())
Output:
<Product xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Id="1">
<Product Id="1_1">
<Attribute Name="Whatever"/>
</Product>
<Attributes xmlns="http://some/path/to/entity/def">
<Attribute Name="Identifier">NumberOne</Attribute>
</Attributes>
</Product>
To suppress the namespace attributes, change the for loop to:
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    etree.cleanup_namespaces(doc)  # note: the parameter is "doc", not "product"
    print(etree.tostring(product).decode())
Output:
<Product Id="1">
<Product Id="1_1">
<Attribute Name="Whatever"/>
</Product>
<Attributes xmlns="http://some/path/to/entity/def">
<Attribute Name="Identifier">NumberOne</Attribute>
</Attributes>
</Product>
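Because the XML itself is elided above, here is a self-contained sketch of the namespace-map step with a made-up document loosely shaped like the output shown. One deliberate difference from the snippet above, purely as a precaution: the built-in xml prefix is left out of the map, since it never needs registering.

```python
from lxml import etree

# Hypothetical document; only the element names and the namespace URI
# are taken from the output shown above.
prods = b"""<products>
  <Product Id="1">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberOne</Attribute>
    </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>"""

doc = etree.XML(prods)

# The namespace axis yields (prefix, uri) pairs; the default namespace has prefix None.
# "xx" is an arbitrary stand-in prefix for it.
ns = {(k if k else "xx"): v for k, v in doc.xpath("//namespace::*") if k != "xml"}

hits = doc.xpath(
    '//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]',
    namespaces=ns,
)
print(ns)
print([p.get("Id") for p in hits])
```

One caveat with this approach: if a document declares several different default namespaces, they all collapse onto the single "xx" key and only the last one survives.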
Find element that has unknown namespace in lxml
You could declare all namespaces, but given the structure of your sample XML, I would argue you are better off disregarding namespaces altogether and just using local-name(); so
cntry_node = root.xpath('.//*[local-name()="country"]')
cntry_node
returns
[<Element {aaa:bbb:ccc:liechtenstein:eee}country at 0x1cddf1d4680>,
<Element {aaa:bbb:ccc:singapore:eee}country at 0x1cddf1d47c0>,
<Element {aaa:bbb:ccc:panama:eee}country at 0x1cddf1d45c0>]
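If you still need to know which namespace each match came from, etree.QName can split the tag afterwards. A minimal sketch, assuming a made-up document that reuses two of the namespace URIs from the output above:

```python
from lxml import etree

# Hypothetical sample; each country element sits in a different namespace.
xml = b"""<root>
  <a:country xmlns:a="aaa:bbb:ccc:liechtenstein:eee">LI</a:country>
  <b:country xmlns:b="aaa:bbb:ccc:singapore:eee">SG</b:country>
  <other>ignored</other>
</root>"""

root = etree.fromstring(xml)

# local-name() matches on the local part only, whatever the namespace is.
cntry_nodes = root.xpath('.//*[local-name()="country"]')

# QName splits "{uri}local" into its two parts.
found = [(etree.QName(n).namespace, n.text) for n in cntry_nodes]
for namespace, text in found:
    print(namespace, text)
```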