XPath in SimpleXML for Default Namespaces Without Needing Prefixes

From a bit of reading online, this limitation is not specific to PHP or to any particular library; it comes from XPath itself, at least in XPath version 1.0.

XPath 1.0 does not include any concept of a "default" namespace, so regardless of how the element names appear in the XML source, if they have a namespace bound to them, the selectors for them must be prefixed in basic XPath selectors of the form ns:name. Note that ns is a prefix defined within the XPath processor, not by the document being processed, so has no relationship to how xmlns attributes are used in the XML representation.
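The same rule can be seen outside PHP. Below is a minimal sketch using Python's standard-library ElementTree, which implements only a subset of XPath. The sample document, the urn:example:feed URI and the prefix name anyname are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: no prefixes in the source, but a default namespace is declared.
XML = '<feed xmlns="urn:example:feed"><entry>first</entry><entry>second</entry></feed>'
root = ET.fromstring(XML)

# An unprefixed name in the path means "no namespace", so nothing matches.
unprefixed = root.findall("entry")

# The prefix is chosen by the caller; only the URI it maps to has to match the document.
prefixed = root.findall("anyname:entry", {"anyname": "urn:example:feed"})

print(len(unprefixed), len(prefixed))
```

The prefix never appears in the XML itself, which is the point: it belongs to the query side, not to the document.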

See e.g. this "common XSLT mistakes" page, talking about the closely related XSLT 1.0:

To access namespaced elements in XPath, you must define a prefix for their namespace. [...] Unfortunately, XSLT version 1.0 has no concept similar to a default namespace; therefore, you must repeat namespace prefixes again and again.

According to an answer to a similar question, XPath 2.0 does include a notion of "default namespace", and the XSLT page linked above mentions this also in the context of XSLT 2.0.

Unfortunately, all of the built-in XML extensions in PHP are built on top of the libxml2 and libxslt libraries, which support only version 1.0 of XPath and XSLT.

So other than pre-processing the document not to use namespaces, your only option would be to find an XPath 2.0 processor that you could plug in to PHP.
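As a rough idea of what "pre-processing the document not to use namespaces" can look like, here is a sketch in Python's standard-library ElementTree. The helper name strip_namespaces and the sample document are hypothetical, and the approach throws information away: two elements with the same local name in different namespaces become indistinguishable afterwards.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Drop the '{uri}' part from every element and attribute name, in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        for name in list(elem.attrib):
            if name.startswith("{"):
                elem.attrib[name.split("}", 1)[1]] = elem.attrib.pop(name)
    return root

# Hypothetical document with a default namespace.
doc = ET.fromstring('<feed xmlns="urn:example:feed"><entry>first</entry></feed>')
strip_namespaces(doc)

# After stripping, plain unprefixed paths match.
entries = doc.findall("entry")
print(doc.tag, len(entries))
```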

(As an aside, it's worth noting that if you have unprefixed attributes in your XML document, they are not technically in the default namespace, but rather in no namespace at all; see XML Namespaces and Unprefixed Attributes for discussion of this oddity of the Namespace spec.)

PHP SimpleXML xpath does not keep the namespaces when returns data

Yes, you're right about your example: not registering the XPath namespace again produces a warning like the first one below, followed by a second warning, and the query returns an empty result:

Warning: SimpleXMLElement::xpath(): Undefined namespace prefix

Warning: SimpleXMLElement::xpath(): xmlXPathEval: evaluation failed

The explanations given in the comments aren't too far off, however they do not offer a good explanation that could help to answer your question.

First of all the documentation is not correct. It's technically not only for the next ::xpath() invocation:

$xmlObject->registerXPathNamespace('ns', 'urn:company');

$fields = $xmlObject->xpath("//ns:field");
$fields = $xmlObject->xpath("//ns:field");
$fields = $xmlObject->xpath("//ns:field");
$fields = $xmlObject->xpath("//ns:field");

This does not produce the warning, even though the registration is used not only for the next call but for three further calls as well. So the description from the comment is perhaps more fitting: the registration is tied to the object.

One solution would be to extend SimpleXMLElement and hook into the namespace registration so that, when the XPath query is executed, every result element gets the namespace prefix registered as well. But that would be a lot of work, and it would not carry over when you access further children of a result.

Additionally, you can't assign arrays or objects to a SimpleXMLElement to store the data: it would always create new element nodes and then raise an error that objects / arrays are not supported.

One way to circumvent that is to store not inside the SimpleXMLElement but inside the DOM which is accessible via dom_import_simplexml.

So, if you create a DOMXpath you can register namespaces with it. And if you store the instance inside the owner document, you can access the xpath object from any SimpleXMLElement via:

dom_import_simplexml($xml)->ownerDocument-> /** your named field here **/

For this to work, a circular reference is needed. I outlined this in The SimpleXMLElement Magic Wonder World in PHP and an encapsulated variant with easy access could look like:

/**
 * Class SimpleXpath
 *
 * DOMXpath wrapper for SimpleXMLElement
 *
 * Allows assignment of one DOMXPath instance to the document of a SimpleXMLElement so that all nodes of that
 * SimpleXMLElement have access to it.
 *
 * @link
 */
class SimpleXpath
{
    /**
     * @var DOMXPath
     */
    private $xpath;

    /**
     * @var SimpleXMLElement
     */
    private $xml;

    ...

    /**
     * @param SimpleXMLElement $xml
     */
    public function __construct(SimpleXMLElement $xml)
    {
        $doc = dom_import_simplexml($xml)->ownerDocument;
        if (!isset($doc->xpath)) {
            $doc->xpath = new DOMXPath($doc);
            $doc->circref = $doc;
        }

        $this->xpath = $doc->xpath;
        $this->xml = $xml;
    }

...

This class constructor takes care that the DOMXPath instance is available and sets the private properties according to the SimpleXMLElement passed in the ctor.

A static creator function allows easy access later:

    /**
     * @param SimpleXMLElement $xml
     *
     * @return SimpleXpath
     */
    public static function of(SimpleXMLElement $xml)
    {
        $self = new self($xml);
        return $self;
    }

The SimpleXpath now always has the xpath object and the simplexml object when instantiated. So it only needs to wrap all the methods DOMXpath has and convert returned nodes back to simplexml to have this compatible. Here is an example on how to convert a DOMNodeList to an array of SimpleXMLElements of the original class which is the behavior of any SimpleXMLElement::xpath() call:

    ...

    /**
     * Evaluates the given XPath expression
     *
     * @param string $expression The XPath expression to execute.
     * @param SimpleXMLElement $contextnode [optional] The optional context node
     *
     * @return array
     */
    public function query($expression, SimpleXMLElement $contextnode = null)
    {
        // dom_import_simplexml() needs an element, so only convert when a context node was given
        $context = $contextnode === null ? null : dom_import_simplexml($contextnode);
        return $this->back($this->xpath->query($expression, $context));
    }

    /**
     * back to SimpleXML (if applicable)
     *
     * @param $mixed
     *
     * @return array
     */
    public function back($mixed)
    {
        if (!$mixed instanceof DOMNodeList) {
            return $mixed; // technically not possible with std. SimpleXMLElement
        }

        $result = [];
        $class = get_class($this->xml);
        foreach ($mixed as $node) {
            $result[] = simplexml_import_dom($node, $class);
        }
        return $result;
    }

    ...

It's more straight forward for the actual registering of xpath namespaces because it works 1:1:

    ...

    /**
     * Registers the namespace with the DOMXPath object
     *
     * @param string $prefix The prefix.
     * @param string $namespaceURI The URI of the namespace.
     *
     * @return bool true on success or false on failure.
     */
    public function registerNamespace($prefix, $namespaceURI)
    {
        return $this->xpath->registerNamespace($prefix, $namespaceURI);
    }

    ...

With these implementations in the chest, all what is left is to extend from SimpleXMLElement and wire it with the newly created SimpleXpath class:

/**
 * Class SimpleXpathXMLElement
 */
class SimpleXpathXMLElement extends SimpleXMLElement
{
    /**
     * Creates a prefix/ns context for the next XPath query
     *
     * @param string $prefix The namespace prefix to use in the XPath query for the namespace given in ns.
     * @param string $ns The namespace to use for the XPath query. This must match a namespace in use by the XML
     * document or the XPath query using prefix will not return any results.
     *
     * @return bool TRUE on success or FALSE on failure.
     */
    public function registerXPathNamespace($prefix, $ns)
    {
        return SimpleXpath::of($this)->registerNamespace($prefix, $ns);
    }

    /**
     * Runs XPath query on XML data
     *
     * @param string $path An XPath path
     *
     * @return SimpleXMLElement[] an array of SimpleXMLElement objects or FALSE in case of an error.
     */
    public function xpath($path)
    {
        return SimpleXpath::of($this)->query($path, $this);
    }
}

With this modification under the hood, it works transparently with your example if you use that sub-class:

/** @var SimpleXpathXMLElement $xmlObject */
$xmlObject = simplexml_load_string($buffer, 'SimpleXpathXMLElement');

$xmlObject->registerXPathNamespace('ns', 'urn:company');

$fields = $xmlObject->xpath("//ns:field");

foreach ($fields as $field) {
    $errors = $field->xpath("//ns:error"); // no issue
    var_dump((string)current($errors));
}

This example then runs error free, see here: https://eval.in/398767

The full code is in a gist, too: https://gist.github.com/hakre/1d9e555ac1ebb1fc4ea8

XPath doesn't match attributes without namespace as prefix

The problem is that the attributes in your second example are in the empty namespace. The query is not at fault; the XML data in your two examples is simply not equivalent.

See Namespaces in XML 1.0 (Third Edition):

A default namespace declaration applies to all unprefixed element
names within its scope. Default namespace declarations do not apply
directly to attribute names; the interpretation of unprefixed
attributes is determined by the element on which they appear.
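To make the quoted rule concrete, here is a small sketch in Python's standard-library ElementTree, which exposes every name in {uri}local form. The sample element and both urn:example URIs are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: a default namespace, one unprefixed and one prefixed attribute.
XML = (
    '<item xmlns="urn:example:default" xmlns:p="urn:example:prefixed" '
    'id="1" p:id="2"/>'
)
item = ET.fromstring(XML)

# The element picks up the default namespace...
print(item.tag)

# ...but the unprefixed attribute stays in no namespace; only p:id is namespaced.
attribute_names = sorted(item.attrib)
print(attribute_names)
```

An XPath test such as @id therefore matches the unprefixed attribute, while the prefixed one needs a registered prefix.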

XPath behaviour with conflicting XML namespace prefixes

DOMXpath::evaluate()/DOMXpath::query() register the namespace definitions of the current context node on top of the manually registered ones. Basically the document will override the namespace for a prefix. The third argument (since PHP >= 5.3.3) can disable the automatic registration:

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('en', 'http://english.com/');

// the xmlns:en from the document element overrides the registration
var_dump($xpath->evaluate('normalize-space(//en:movie)'));
// automatic registration disabled - works correctly
var_dump($xpath->evaluate('normalize-space(//en:movie)', NULL, FALSE));

Output:

string(13) "The Godfather"
string(26) "The Fellowship of the Ring"

SimpleXML access nodes with namespace and subnodes without namespace

The argument to ->children() is always a namespace identifier or local prefix, never the tag name. If these elements were in "no namespace", you would access them with ->children('').

However, the elements with no prefix in this document are not in "no namespace": they are in the default namespace, in this case urn:ehd/go/001 (as defined by xmlns="urn:ehd/go/001").

If you use the full namespace identifiers rather than the prefixes (which is also less likely to break if the feed changes), you should be able to access these easily:

$xml = simplexml_load_file($file) or die("Failed to load");
$ehd = $xml->children('urn:ehd/001')->body;
$gnr_liste = $ehd->children('urn:ehd/go/001')->gnr_liste;
foreach ( $gnr_liste->gnr as $gnr ) {
    simplexml_dump($gnr);
}

You might want to give your own names to the namespaces so you don't have to use the full URIs, but aren't dependent on the prefixes the XML is generated with; a common approach is to define constants:

const XMLNS_EHD_MAIN = 'urn:ehd/001';
const XMLNS_EHD_GNR = 'urn:ehd/go/001';

$xml = simplexml_load_file($file) or die("Failed to load");
$ehd = $xml->children(XMLNS_EHD_MAIN)->body;
$gnr_liste = $ehd->children(XMLNS_EHD_GNR)->gnr_liste;
foreach ( $gnr_liste->gnr as $gnr ) {
    simplexml_dump($gnr);
}

How does XPath deal with XML namespaces?

XPath 1.0/2.0

Defining namespaces in XPath (recommended)

XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.

It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.


Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.

(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)

C#:

XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);

Google Docs:

Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.

Java (SAX):

NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");

Java (XPath):

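xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        switch (prefix) {
            case "i": return "http://schema.intuit.com/finance/v3";
            // ...
            default: return javax.xml.XMLConstants.NULL_NS_URI;
        }
    }
    // The interface also requires reverse lookups; they are not needed for this query
    public String getPrefix(String namespaceURI) { return null; }
    public java.util.Iterator<String> getPrefixes(String namespaceURI) { return null; }
});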
  • Remember to call
    DocumentBuilderFactory.setNamespaceAware(true).
  • See also:
    Java XPath: Queries with default namespace xmlns

JavaScript:

See Implementing a User Defined Namespace Resolver:

function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );

Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a custom nsResolver().

Perl (LibXML):

my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');

Python (lxml):

from io import StringIO
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
              namespaces={'i':'http://schema.intuit.com/finance/v3'})

Python (ElementTree):

namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
# root is the i:IntuitResponse element; ElementTree rejects absolute paths on an element
root.findall('i:QueryResponse', namespaces)

Python (Scrapy):

response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()

PHP:

Adapted from @Tomalak's answer using DOMDocument:

$result = new DOMDocument();
$result->loadXML($xml);

$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");

$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");

See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.

Ruby (Nokogiri):

puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")

Note that Nokogiri supports removal of namespaces,

doc.remove_namespaces!

but see the below warnings discouraging the defeating of XML namespaces.

VBA:

xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement = doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")

VB.NET:

xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(xmlDoc.NameTable)
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3")
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
                                           nsmgr)

SoapUI (doc):

declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse

xmlstarlet:

-N i="http://schema.intuit.com/finance/v3"

XSLT:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...

Once you've declared a namespace prefix, your XPath can be written to use it:

/i:IntuitResponse/i:QueryResponse


Defeating namespaces in XPath (not recommended)

An alternative is to write predicates that test against local-name():

/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']

Or, in XPath 2.0:

/*:IntuitResponse/*:QueryResponse

Skirting namespaces in this manner works but is not recommended because it

  • Under-specifies the full element/attribute name.

  • Fails to differentiate between element/attribute names in different
    namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:

    /*[namespace-uri()='http://schema.intuit.com/finance/v3'
       and local-name()='IntuitResponse']
    /*[namespace-uri()='http://schema.intuit.com/finance/v3'
       and local-name()='QueryResponse']

    Thanks to Daniel Haley for the namespace-uri() note.

  • Is excessively verbose.
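The second drawback above can be demonstrated with a short sketch. Python's standard-library ElementTree (3.8 and later) has a {*}name wildcard that behaves much like a local-name() test; the sample document and the urn:example URIs are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two unrelated vocabularies that both define "title".
XML = (
    '<root xmlns:a="urn:example:a" xmlns:b="urn:example:b">'
    '<a:title>from a</a:title><b:title>from b</b:title></root>'
)
root = ET.fromstring(XML)

# Ignoring the namespace matches both elements, wanted or not.
any_ns = [e.text for e in root.findall("{*}title")]

# Binding a prefix to the intended URI selects only the intended element.
only_a = [e.text for e in root.findall("x:title", {"x": "urn:example:a"})]

print(any_ns, only_a)
```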

XPath 3.0/3.1

Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:

/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse

While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is based on Clark notation ({uri}local), named after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls the Q{...} part a BracedURILiteral.

Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
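For comparison, Python's standard-library ElementTree is not an XPath 3.x processor, but its path syntax accepts plain Clark notation ({uri}local, without the leading Q), which gives the same independence from prefix bindings. A minimal sketch, with a made-up sample document that reuses the namespace URI from above:

```python
import xml.etree.ElementTree as ET

NS = "http://schema.intuit.com/finance/v3"

# Hypothetical sample document; the element content is invented for illustration.
XML = f'<IntuitResponse xmlns="{NS}"><QueryResponse>ok</QueryResponse></IntuitResponse>'
root = ET.fromstring(XML)

# The namespace URI is written directly into the path, so no prefix map is needed.
hit = root.find(f"{{{NS}}}QueryResponse")
print(hit.text)
```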

PHP - SimpleXMLElement not parsing correctly with namespaces

Do not fetch the namespaces from the document. Define them in your application. The namespaces are the values of the xmlns/xmlns:* attributes. The xmlns attribute declares the default namespace. So the tag entry is actually {http://www.w3.org/2005/Atom}:entry.

Namespaces have to be unique. To avoid conflicts, most people use URLs. (It is not likely that other people will use your domains to define their namespaces.) The downside of this is that the namespaces are long strings with special characters. This is solved by using namespace prefixes as aliases.

XPath does not have a default namespace. You need to register a prefix for each namespace you want to use. The XPath engine will resolve the prefix to the actual namespace and compare it with the resolved namespace of the node.

$xml = new SimpleXMLElement($xmlstr);
$namespaces = [
'a' => 'http://www.w3.org/2005/Atom',
'm' => 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata',
'd' => 'http://schemas.microsoft.com/ado/2007/08/dataservices',
'o' => 'https://exmple.com/odata/'
];
foreach ($namespaces as $prefix => $namespace) {
$xml->registerXPathNamespace($prefix, $namespace);
}

$id = $xml->xpath('/a:entry/a:content/m:properties/d:id');
var_dump($id);

Output:

array(1) {
[0]=>
object(SimpleXMLElement)#2 (0) {
}
}

You will have to register the Xpath namespaces on each SimpleXMLElement again.

This is more convenient in DOM. DOMXpath::evaluate() executes Xpath expressions and can return node lists or scalars, depending on the expression.

$document = new DOMDocument();
$document->loadXML($xmlstr);
$xpath = new DOMXpath($document);
$namespaces = [
'a' => 'http://www.w3.org/2005/Atom',
'm' => 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata',
'd' => 'http://schemas.microsoft.com/ado/2007/08/dataservices',
'o' => 'https://exmple.com/odata/'
];
foreach ($namespaces as $prefix => $namespace) {
$xpath->registerNamespace($prefix, $namespace);
}

$id = $xpath->evaluate('string(/a:entry/a:content/m:properties/d:id)');
var_dump($id);

Output:

string(3) "989"
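The idea of defining the prefix map in the application rather than reading it from the document carries over to other libraries. Here is a sketch in Python's standard-library ElementTree; the XML below is an assumed, cut-down stand-in for the feed entry (the real document is not shown here), with only the namespace URIs and the id value 989 taken from the examples above.

```python
import xml.etree.ElementTree as ET

# Prefixes defined by the application, independent of the prefixes used in the document.
NAMESPACES = {
    "a": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Assumed sample entry; it deliberately uses different prefixes than the query does.
XML = """<entry xmlns="http://www.w3.org/2005/Atom"
    xmlns:meta="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns:data="http://schemas.microsoft.com/ado/2007/08/dataservices">
  <content type="application/xml">
    <meta:properties><data:id>989</data:id></meta:properties>
  </content>
</entry>"""

entry = ET.fromstring(XML)

# The root is already a:entry, so the path is relative to it.
id_text = entry.findtext("a:content/m:properties/d:id", namespaces=NAMESPACES)
print(id_text)
```

The query still works although the document says meta: and data: where the query says m: and d:, because only the URIs are compared.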

