Xpaths and Default Namespaces

XPATHS and Default Namespaces

I tried something similar to what palehorse proposed and could not get it to work. Since I was getting data from a published service I couldn't change the xml. I ended up using XmlDocument and XmlNamespaceManager like so:

XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlWithBogusNamespace);
XmlNamespaceManager nSpace = new XmlNamespaceManager(doc.NameTable);
nSpace.AddNamespace("myNs", "http://theirUri");

XmlNodeList nodes = doc.SelectNodes("//myNs:NodesIWant",nSpace);
//etc

xpath with default namespace with no preifx

Within XPath, no, there is no alternative syntax that ignores namespaces, default or otherwise, other than testing against local-name() (all versions) or *:NCName (XPath 2.0 and later).

Within XPath, there also is no mechanism for binding namespace prefixes to namespaces. For a list of namespace prefix binding mechanisms that are available in various hosting languages and libraries, see How does XPath deal with XML namespaces?

XPath Expression for getting default namespace

Your expression

/aaa/*[name()='bbb' and position()=1]/namespace::*

is correct and returns three namespace nodes. The problem might be in the way you are processing these nodes after they are returned. The expression should work in both XPath 1.0 and XPath 2.0, though I haven't checked it with the XPath engine built in to the JDK.

(Incidentally the notion that because you are using JDK 1.7 therefore you are using XPath 1.0 is a complete non sequitur, since there are several XPath 2.0 engines available for Java users).

To return only the namespace URI corresponding to the default namespace, use

/aaa/*[name()='bbb' and position()=1]/namespace::*[name()='']

Or indeed, since this query already assumes that the bbb element is in the default namespace, use

namespace-uri(/aaa/*[name()='bbb' and position()=1])

How does XPath deal with XML namespaces?

XPath 1.0/2.0

Defining namespaces in XPath (recommended)

XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.

It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.


Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.

(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)

C#:

XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);

Google Docs:

Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.

Java (SAX):

NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");

Java (XPath):

xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "i": return "http://schema.intuit.com/finance/v3";
// ...
}
});
  • Remember to call
    DocumentBuilderFactory.setNamespaceAware(true).
  • See also:
    Java XPath: Queries with default namespace xmlns

JavaScript:

See Implementing a User Defined Namespace Resolver:

function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );

Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a customer nsResolver().

Perl (LibXML):

my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');

Python (lxml):

from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})

Python (ElementTree):

namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
root.findall('/i:IntuitResponse/i:QueryResponse', namespaces)

Python (Scrapy):

response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()

PhP:

Adapted from @Tomalak's answer using DOMDocument:

$result = new DOMDocument();
$result->loadXML($xml);

$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");

$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");

See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.

Ruby (Nokogiri):

puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")

Note that Nokogiri supports removal of namespaces,

doc.remove_namespaces!

but see the below warnings discouraging the defeating of XML namespaces.

VBA:

xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement =doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")

VB.NET:

xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(New XmlNameTable())
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
nsmgr)

SoapUI (doc):

declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse

xmlstarlet:

-N i="http://schema.intuit.com/finance/v3"

XSLT:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...

Once you've declared a namespace prefix, your XPath can be written to use it:

/i:IntuitResponse/i:QueryResponse


Defeating namespaces in XPath (not recommended)

An alternative is to write predicates that test against local-name():

/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']

Or, in XPath 2.0:

/*:IntuitResponse/*:QueryResponse

Skirting namespaces in this manner works but is not recommended because it

  • Under-specifies the full element/attribute name.

  • Fails to differentiate between element/attribute names in different
    namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:

     /*[    namespace-uri()='http://schema.intuit.com/finance/v3' 
    and local-name()='IntuitResponse']
    /*[ namespace-uri()='http://schema.intuit.com/finance/v3'
    and local-name()='QueryResponse']

    Thanks to Daniel Haley for the namespace-uri() note.

  • Is excessively verbose.

XPath 3.0/3.1

Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:

/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse

While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is known as Clark Notation after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls it a BracedURILiteral.

Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.

XPath in SimpleXML for default namespaces without needing prefixes

From a bit of reading online, this is not restricted to any particular PHP or other library, but to XPath itself - at least in XPath version 1.0

XPath 1.0 does not include any concept of a "default" namespace, so regardless of how the element names appear in the XML source, if they have a namespace bound to them, the selectors for them must be prefixed in basic XPath selectors of the form ns:name. Note that ns is a prefix defined within the XPath processor, not by the document being processed, so has no relationship to how xmlns attributes are used in the XML representation.

See e.g. this "common XSLT mistakes" page, talking about the closely related XSLT 1.0:

To access namespaced elements in XPath, you must define a prefix for their namespace. [...] Unfortunately, XSLT version 1.0 has no concept similar to a default namespace; therefore, you must repeat namespace prefixes again and again.

According to an answer to a similar question, XPath 2.0 does include a notion of "default namespace", and the XSLT page linked above mentions this also in the context of XSLT 2.0.

Unfortunately, all of the built-in XML extensions in PHP are built on top of the libxml2 and libxslt libraries, which support only version 1.0 of XPath and XSLT.

So other than pre-processing the document not to use namespaces, your only option would be to find an XPath 2.0 processor that you could plug in to PHP.

(As an aside, it's worth noting that if you have unprefixed attributes in your XML document, they are not technically in the default namespace, but rather in no namespace at all; see XML Namespaces and Unprefixed Attributes for discussion of this oddity of the Namespace spec.)

How do I use XPath with a default namespace with no prefix?

The configuration element is in the unnamed namespace, and the MyNode is bound to the lcmp namespace without a namespace prefix.

This XPATH statement will allow you to address the MyNode element without having declared the lcmp namespace or use a namespace prefix in your XPATH:

/configuration/*[namespace-uri()='lcmp' and local-name()='MyNode']

It matches any element that is a child of configuration and then uses a predicate filer with namespace-uri() and local-name() functions to restrict it to the MyNode element.

If you don't know which namespace-uri's will be used for the elements, then you can make the XPATH more generic and just match on the local-name():

/configuration/*[local-name()='MyNode']

However, you run the risk of matching different elements in different vocabularies(bound to different namespace-uri's) that happen to use the same name.

Why do XPath elements inherit the default namespace, but attributes do not?

This is what the Namespaces spec has to say on the matter:

Default namespace declarations do not apply directly to attribute
names; the interpretation of unprefixed attributes is determined by
the element on which they appear.

No rationale is given as to why the authors of the spec made this decision.

It has been pointed out that this statement is in fact very fuzzy. It doesn't say, for example, that unprefixed attributes are in no namespace, and it doesn't say that they are in the same namespace as the containing element, but both interpretations could reasonably be adopted. For a definitive statement you have to look at the XPath 1.0 specification which states unequivocally:

The namespace URI of the attribute's name will be null if the QName of
the attribute does not have a prefix.

Again, no rationale is given (it's not usual for specifications to give a rationale unless the editors felt a strong need to justify themselves). But the editor of the XPath 1.0 spec was James Clark, and James Clark has written a tutorial here

http://www.jclark.com/xml/xmlns.htm

which gives clues as to his thinking about namespaces in general; but no rationale for this particular decision, which he simply restates:

Note that the xmlns attribute does not affect unprefixed attribute
names.

xpath-default-namespace: matching nodes from loaded XML without namespace

You can change that setting as needed: <xsl:variable name="Items" xpath-default-namespace="" select="$filter-doc/Items"></xsl:variable>.



Related Topics



Leave a reply



Submit