How to Use a Default Namespace in an lxml XPath Query

How do I use a default namespace in an lxml xpath query?

Something like this should work:

import lxml.etree as et

ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)  # xml: your Atom document as a string or bytes
for node in tree.xpath('//atom:entry', namespaces=ns):
    print(node)

See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.

Alternative:

for node in tree.xpath("//*[local-name() = 'entry']"):
    print(node)
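To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. The tiny Atom feed is made up for illustration; only the namespace URI comes from the answer above.

```python
from lxml import etree

# Made-up Atom feed; the default namespace is declared on the root element.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

tree = etree.fromstring(xml)
ns = {"atom": "http://www.w3.org/2005/Atom"}

# An unprefixed name only matches elements in no namespace, so this finds nothing.
unprefixed = tree.xpath("//entry")

# Binding a prefix of our own choosing to the namespace URI finds the entries.
prefixed = tree.xpath("//atom:entry", namespaces=ns)

# local-name() ignores the namespace altogether.
by_local_name = tree.xpath("//*[local-name() = 'entry']")

# Every step of a longer path needs the prefix, not just the first one.
titles = tree.xpath("//atom:entry/atom:title/text()", namespaces=ns)

print(len(unprefixed), len(prefixed), len(by_local_name), titles)
```

The prefix itself ("atom" here) is arbitrary; only the URI it maps to has to match the document.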

lxml - Default name spaces

Hackish or not, you have to specify a prefix. XPath 1.0, which is what lxml supports, does not have a concept of default namespace (it works differently in XPath 2.0, but that does not apply here).

The other option is to not bother with prefixes at all. Use the fully qualified element name in "Clark notation" instead:

my_root.findall('{http://filler.com/default.xsd}Prop')

See also http://lxml.de/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions.
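As a rough illustration of Clark notation, the sketch below assumes a made-up document whose root is called Props (the real props.xml is not shown in the question). It also uses etree.QName to build the qualified name instead of writing the braces by hand.

```python
from lxml import etree

NS = "http://filler.com/default.xsd"

# Hypothetical stand-in for the document from the question.
xml = f'<Props xmlns="{NS}"><Prop>a</Prop><Prop>b</Prop></Props>'
my_root = etree.fromstring(xml)

# Clark notation: "{namespace-uri}local-name"; no prefix or namespace map needed.
props = my_root.findall(f"{{{NS}}}Prop")

# QName produces the same string and avoids escaping braces in f-strings.
clark_tag = etree.QName(NS, "Prop").text
props_via_qname = my_root.findall(clark_tag)

print(clark_tag, [p.text for p in props])
```

Note that Clark notation works with find(), findall() and iter(), but not inside an xpath() expression, where a prefix is still required.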

Update August 2019

The behaviour has changed in later versions of lxml. With lxml 4.4.1, find() and findall() accept both None and the empty string as the key for the default namespace (xpath() still requires a real prefix):

from lxml import etree

my_tree = etree.parse("props.xml")
my_root = my_tree.getroot()

NS = 'http://filler.com/default.xsd'

NSMAP1 = {None: NS}
NSMAP2 = {'': NS}
NSMAP3 = {'default': NS}

print(my_root.findall('Prop', NSMAP1))
print(my_root.findall('Prop', NSMAP2))
print(my_root.findall('default:Prop', NSMAP3))

Output:

[<Element {http://filler.com/default.xsd}Prop at 0x31f1260>]
[<Element {http://filler.com/default.xsd}Prop at 0x31f1288>]
[<Element {http://filler.com/default.xsd}Prop at 0x31f1260>]
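The update above applies to findall(), which uses lxml's ElementPath implementation. The sketch below, using a made-up one-element document, contrasts that with xpath(), which as far as I know still rejects a None prefix with a TypeError in current lxml releases.

```python
from lxml import etree

NS = "http://filler.com/default.xsd"

# Hypothetical stand-in for props.xml.
my_root = etree.fromstring(f'<Props xmlns="{NS}"><Prop>a</Prop></Props>')

# findall() accepts None as the key for the default namespace.
found = my_root.findall("Prop", {None: NS})

# xpath() does not: XPath 1.0 has no default namespace, so lxml refuses the map.
xpath_error = None
try:
    my_root.xpath("Prop", namespaces={None: NS})
except TypeError as exc:
    xpath_error = str(exc)

print(len(found), xpath_error)
```

So a namespace map with a None key can be shared between find() and findall() calls, but needs a named prefix before it is passed to xpath().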

How to import lxml xpath functions to default namespace?

You can put a function in the empty function namespace:

from lxml import etree

functionNS = etree.FunctionNamespace(None)
functionNS['test'] = lambda context, nodes, *args: print(context, nodes, args)

The new test function is now registered without a namespace prefix, so you can call it like this:

root.xpath("//*[test(., 'arg1', 'arg2')]")

Unfortunately, the function behind "{http://exslt.org/regular-expressions}test" is not exposed to Python; it exists only inside lxml's C extension, so you can't simply assign it to functionNS['test'].

That means you would have to reimplement it in Python before you could assign it to the empty function namespace.

If that is too much trouble just to save typing three characters, you can use this trick to make the re prefix for the namespace global:

etree.FunctionNamespace("http://exslt.org/regular-expressions").prefix = 're'

Then at least you don't need to pass the namespaces dict for each xpath expression.
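For the reimplementation route mentioned above, a minimal sketch could look like the following. It is not the EXSLT implementation: it uses Python's re syntax, ignores the optional flags argument, and the helper name regex_test and the sample document are made up.

```python
import re
from lxml import etree


def regex_test(context, value, pattern):
    """Return True if `pattern` matches somewhere in the string value of `value`."""
    if isinstance(value, list):
        # Node-sets arrive as Python lists; like XPath's string(), use the first node.
        if not value:
            return False
        value = value[0]
    if not isinstance(value, str):
        # An element: concatenate all of its descendant text.
        value = "".join(value.itertext())
    return re.search(pattern, value) is not None


# Register the function globally, without a namespace, under the name "test".
etree.FunctionNamespace(None)["test"] = regex_test

root = etree.fromstring("<r><a>foo1</a><a>bar</a><a>foo22</a></r>")
matches = root.xpath(r"//a[test(., '^foo\d+$')]")
print([el.text for el in matches])
```

Because the registration is global, every XPath evaluation in the process sees the unprefixed test() function, which may or may not be what you want in a larger application.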

lxml: XPath and namespaces on an element

It looks like you're just missing the j prefix on level inside the predicate:

//j:isis-database-information/j:isis-database[j:level='2']/j:isis-database-entry
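The effect of the missing prefix can be reproduced with a small sketch. The namespace URI, the lsp-id element and its values are hypothetical, since the original document is not shown; only the element names in the XPath come from the answer.

```python
from lxml import etree

J_NS = "http://example.com/hypothetical/routing"  # hypothetical namespace URI

xml = f"""<isis-database-information xmlns="{J_NS}">
  <isis-database>
    <level>1</level>
    <isis-database-entry><lsp-id>r1.00-00</lsp-id></isis-database-entry>
  </isis-database>
  <isis-database>
    <level>2</level>
    <isis-database-entry><lsp-id>r2.00-00</lsp-id></isis-database-entry>
  </isis-database>
</isis-database-information>"""

root = etree.fromstring(xml)
ns = {"j": J_NS}

# Without the prefix, "level" means "level in no namespace", so the predicate never matches.
broken = root.xpath(
    "//j:isis-database-information/j:isis-database[level='2']/j:isis-database-entry",
    namespaces=ns,
)

# With the prefix on every name, including inside the predicate, the entry is found.
fixed = root.xpath(
    "//j:isis-database-information/j:isis-database[j:level='2']/j:isis-database-entry",
    namespaces=ns,
)
lsp_ids = [entry.findtext("j:lsp-id", namespaces=ns) for entry in fixed]
print(len(broken), lsp_ids)
```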

how to query xml data with namespaces using xpath in python

You can define your namespaces like this:

ns = {'n': 'http://www.topografix.com/GPX/1/1',
      'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
      'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}

This maps the prefix n to 'http://www.topografix.com/GPX/1/1', so you can use that prefix in your XPath query. For example:

expr = 'n:trk/n:trkseg/n:trkpt/n:ele'

for element in tree.xpath(expr, namespaces=ns):
    print(element.text)

This works because the xmlns declared on the root node is 'http://www.topografix.com/GPX/1/1', so all child nodes automatically inherit it as their namespace unless a child node uses a different prefix or declares a namespace of its own.

Example/demo:

In [44]: ns = {'n': 'http://www.topografix.com/GPX/1/1',
   ....:       'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
   ....:       'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
   ....:       'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}

In [45]: expr = 'n:trk/n:trkseg/n:trkpt/n:ele'

In [46]: for element in tree.xpath(expr, namespaces=ns):
   ....:     print(element.text)
   ....:
2261.8
2261.6
2262.0
2261.8
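Since the demo above relies on a tree loaded elsewhere, here is a self-contained sketch of the same query. The GPX fragment is made up (the coordinates and creator are invented); only the namespace URI and element names follow the answer.

```python
from lxml import etree

# Made-up GPX fragment with the GPX 1.1 namespace as the default namespace.
gpx = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk>
    <trkseg>
      <trkpt lat="46.57" lon="8.41"><ele>2261.8</ele></trkpt>
      <trkpt lat="46.58" lon="8.42"><ele>2261.6</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""

root = etree.fromstring(gpx)
ns = {"n": "http://www.topografix.com/GPX/1/1"}

# The path is relative to the root <gpx> element, so it starts at its <trk> children.
expr = "n:trk/n:trkseg/n:trkpt/n:ele"
elevations = [float(e.text) for e in root.xpath(expr, namespaces=ns)]

# The same path without prefixes matches nothing, because the elements are namespaced.
no_prefix = root.xpath("trk/trkseg/trkpt/ele")

print(elevations, no_prefix)
```

Only the namespaces actually used in the expression need to be in the map; the xsi, gpxtpx and gpxx entries matter only if the query touches those elements.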

parsing xml containing default namespace to get an element value using lxml

This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, that is, a namespace declared without a prefix, here:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Note that it is not only the element on which the default namespace is declared that belongs to that namespace: all descendant elements implicitly inherit the default namespace of their ancestor unless specified otherwise (by an explicit namespace prefix, or by a local default namespace that points to a different namespace URI). In this case, that means all elements, including loc, are in the default namespace.

To select an element in a namespace, you need to define a prefix-to-namespace mapping and use the prefix properly in the XPath:

from lxml import etree
str1 = '''<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>
http://www.example.org/sitemap_1.xml.gz
</loc>
<lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>'''
root = etree.fromstring(str1)

ns = {"d" : "http://www.sitemaps.org/schemas/sitemap/0.9"}
url = root.xpath("//d:loc", namespaces=ns)[0]
print(etree.tostring(url).decode())

Output:

<loc xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
http://www.example.org/sitemap_1.xml.gz
</loc>
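If the goal is the element's value rather than its serialized form, one possible approach is sketched below. It reuses the document and the d prefix from the example above; using normalize-space() and findtext() is my suggestion, not part of the original answer.

```python
from lxml import etree

str1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>
http://www.example.org/sitemap_1.xml.gz
</loc>
<lastmod>2015-07-01</lastmod>
</sitemap>
</sitemapindex>"""

root = etree.fromstring(str1)
ns = {"d": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# normalize-space() returns the text of the first match with surrounding whitespace removed.
url = root.xpath("normalize-space(//d:loc)", namespaces=ns)

# findtext() takes the same prefix map and returns the raw text of the first match.
lastmod = root.findtext(".//d:lastmod", namespaces=ns)

print(url, lastmod)
```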

Registering Namespace in Python XPath query

You first need to build a namespace map, assigning a prefix to any namespace that doesn't have one (as is the case here), and then apply the XPath:

from lxml import etree
prods = """[your xml above]"""
doc = etree.XML(prods)  # parse first: the namespace map below is built from doc
# Build the namespace map; the default namespace (prefix None) gets the made-up prefix "xx"
ns = {(k if k else "xx"): v for k, v in doc.xpath('//namespace::*')}
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())

Output:

<Product xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Id="1">
<Product Id="1_1">
<Attribute Name="Whatever"/>
</Product>
<Attributes xmlns="http://some/path/to/entity/def">
<Attribute Name="Identifier">NumberOne</Attribute>
</Attributes>
</Product>

To suppress the namespaces attributes, change the for loop to:

for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    etree.cleanup_namespaces(doc)  # note: the parameter is "doc", not "product"
    print(etree.tostring(product).decode())

Output:

<Product Id="1">
<Product Id="1_1">
<Attribute Name="Whatever"/>
</Product>
<Attributes xmlns="http://some/path/to/entity/def">
<Attribute Name="Identifier">NumberOne</Attribute>
</Attributes>
</Product>
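The namespace-map technique can be tried end to end with the sketch below. The document is a simplified, made-up stand-in for the XML in the question, and skipping the built-in xml prefix is an extra precaution of mine rather than something the answer requires.

```python
from lxml import etree

# Simplified, made-up stand-in for the XML in the question.
xml = """<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberOne</Attribute>
    </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>"""

doc = etree.fromstring(xml)

# //namespace::* yields (prefix, uri) pairs; a default namespace has prefix None.
ns = {}
for prefix, uri in doc.xpath("//namespace::*"):
    if prefix == "xml":
        continue  # built-in prefix, no need to register it
    ns[prefix or "xx"] = uri  # "xx" is an arbitrary prefix for the default namespace

expr = '//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]'
ids = [product.get("Id") for product in doc.xpath(expr, namespaces=ns)]
print(ns, ids)
```

One caveat: if a document declares several different default namespaces, they all map to the same "xx" key and the last one wins, so this shortcut only suits documents with a single unprefixed namespace.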

Find element that has unknown namespace in lxml

You could declare all namespaces, but given the structure of your sample XML, I would argue you are better off disregarding namespaces altogether and just using local-name(), so

cntry_node = root.xpath('.//*[local-name()="country"]')
cntry_node

returns

[<Element {aaa:bbb:ccc:liechtenstein:eee}country at 0x1cddf1d4680>,
<Element {aaa:bbb:ccc:singapore:eee}country at 0x1cddf1d47c0>,
<Element {aaa:bbb:ccc:panama:eee}country at 0x1cddf1d45c0>]
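A self-contained sketch of this approach follows. The document is made up apart from the three namespace URIs visible in the output above; it also shows how to recover the namespace of each match afterwards and how to narrow the search with namespace-uri() when needed.

```python
from lxml import etree

# Made-up document: the same local name appears in three different namespaces.
xml = """<root>
  <country xmlns="aaa:bbb:ccc:liechtenstein:eee">LI</country>
  <country xmlns="aaa:bbb:ccc:singapore:eee">SG</country>
  <country xmlns="aaa:bbb:ccc:panama:eee">PA</country>
  <city xmlns="aaa:bbb:ccc:panama:eee">Panama City</city>
</root>"""

root = etree.fromstring(xml)

# Match on the local name only, whatever namespace the element is in.
countries = root.xpath('.//*[local-name()="country"]')

# QName splits each tag back into its namespace and local name.
found_namespaces = [etree.QName(el).namespace for el in countries]

# namespace-uri() narrows the match to one namespace without declaring a prefix.
panama = root.xpath(
    './/*[local-name()="country" and namespace-uri()="aaa:bbb:ccc:panama:eee"]'
)

print(found_namespaces, [el.text for el in panama])
```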

