Xpath Query with PHP

How do I do an XPath query on a DOMNode?

Pass the node as the second argument to DOMXPath::query():

contextnode: The optional contextnode can be specified for doing relative XPath queries. By default, the queries are relative to the root element.

Example:

foreach ($nodes as $node) {
    foreach ($x_path->query('h3|a', $node) as $child) {
        echo $child->nodeValue, PHP_EOL;
    }
}

This uses the UNION operator for a result of

Get me 1
and me too 1
Get me 2
and me too 1
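
The snippet above assumes that $nodes and $x_path already exist. A minimal self-contained sketch follows. The HTML is a hypothetical stand-in for the markup from the question (which is not shown here), chosen so that it yields the same four values; the $found array is only there to collect them.

```php
<?php
// Hypothetical markup: each "listing" div ends with a div holding an h3 and an a.
$html = <<<'HTML'
<html><body>
<div class="listing"><div>skip me</div><div><h3>Get me 1</h3><a href="#">and me too 1</a></div></div>
<div class="listing"><div>skip me</div><div><h3>Get me 2</h3><a href="#">and me too 1</a></div></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$x_path = new DOMXPath($dom);

// Outer query: the last div child of every div with class "listing".
$nodes = $x_path->query("//div[@class='listing']/div[last()]");

$found = [];
foreach ($nodes as $node) {
    // Inner query: relative to $node, because $node is passed as the context node.
    foreach ($x_path->query('h3|a', $node) as $child) {
        $found[] = $child->nodeValue;
    }
}

echo implode(PHP_EOL, $found), PHP_EOL;
```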

If you don't need any complex querying, you can also do

foreach ($nodes as $node) {
    foreach ($node->getElementsByTagName('a') as $a) {
        echo $a->nodeValue, PHP_EOL;
    }
}

Or even by iterating the child nodes (note that this includes all the text nodes)

foreach ($nodes as $node) {
    foreach ($node->childNodes as $child) {
        echo $child->nodeName, PHP_EOL;
    }
}

However, all of that is unneeded since you can fetch these nodes directly:

$nodes = $x_path->query("/html/body//div[@class='listing']/div[last()]");

foreach ($nodes as $i => $node) {
    echo $i, $node->nodeValue, PHP_EOL;
}

This gives you two nodes, namely the last div child of each div whose class attribute value is listing, and outputs their combined text node values, including whitespace:

0
Get me 1
and me too 1

1
Get me 2
and me too 1

Likewise, the following

"//div[@class='listing']/div[last()]/node()[name() = 'h3' or name() = 'a']"

will give you the four child H3 and A nodes and output

0Get me 1
1and me too 1
2Get me 2
3and me too 1

If you need to differentiate these by name while iterating over them, you can do

foreach ($nodes as $i => $node) {
    echo $i, $node->nodeName, $node->nodeValue, PHP_EOL;
}

which will then give

0h3Get me 1
1aand me too 1
2h3Get me 2
3aand me too 1

How to query an XML file using XPath (PHP)?

Building up the Xpath expression:

  • Fetch any subject element
    //subject
  • ... with a child element relation
    //subject[relation]
  • ... that has a role attribute with the given text
    //subject[relation/@role="ITSupporter"]
  • ... and get the @id attribute of subject
    //subject[relation/@role="ITSupporter"]/@id

Additionally the source could be cleaned up. PHP arrays can use the $array[] syntax to push new elements into them.

Put together:

$xml = <<<'XML'
<subject id="Tom">
<relation unit="ITSupport" role="ITSupporter" />
</subject>
XML;

$role = 'ITSupporter';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);

$ids = [];
foreach ($xpath->evaluate("//subject[relation/@role='".$role."']/@id") as $idAttribute) {
    $ids[] = $idAttribute->value;
}
var_dump($ids);

Output:

array(1) { 
[0]=>
string(3) "Tom"
}

If you expect only a single result, you can cast it to a string in Xpath:

$id = $xpath->evaluate(
    "string(//subject[relation/@role='".$role."']/@id)"
);
var_dump($id);

Output:

string(3) "Tom"
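
One detail worth knowing about the string() cast: when nothing matches, it yields an empty string rather than an empty node list. The sketch below illustrates this with the same XML; the role name Manager is a hypothetical value that does not occur in the document.

```php
<?php
$xml = <<<'XML'
<subject id="Tom">
  <relation unit="ITSupport" role="ITSupporter" />
</subject>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// A matching role yields the attribute value as a plain PHP string.
$found = $xpath->evaluate("string(//subject[relation/@role='ITSupporter']/@id)");

// 'Manager' is a hypothetical role that is not present, so string() returns ''.
$missing = $xpath->evaluate("string(//subject[relation/@role='Manager']/@id)");

var_dump($found);   // string(3) "Tom"
var_dump($missing); // string(0) ""
```

So a check like `$id !== ''` is enough to tell whether anything matched.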

XML Namespaces

Looking at the example posted in the comment, your XML uses the namespace http://cpee.org/ns/organisation/1.0 without a prefix. The XML parser resolves it, so you can read the nodes as {http://cpee.org/ns/organisation/1.0}subject. Here are three examples that all resolve to this:

  • <subject xmlns="http://cpee.org/ns/organisation/1.0"/>
  • <cpee:subject xmlns:cpee="http://cpee.org/ns/organisation/1.0"/>
  • <c:subject xmlns:c="http://cpee.org/ns/organisation/1.0"/>

The same has to happen for the Xpath expression. However, Xpath does not have a default namespace. You need to register and use a prefix of your choosing. This allows the Xpath engine to resolve something like //org:subject to //{http://cpee.org/ns/organisation/1.0}subject.

The PHP does not need to change much:

$xml = <<<'XML'
<subject id="Tom" xmlns="http://cpee.org/ns/organisation/1.0">
<relation unit="ITSupport" role="ITSupporter" />
</subject>
XML;

$role = 'ITSupporter';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
// register a prefix for the namespace
$xpath->registerNamespace('org', 'http://cpee.org/ns/organisation/1.0');

$ids = [];
// address the elements using the registered prefix
$idAttributes = $xpath->evaluate("//org:subject[org:relation/@role='".$role."']/@id");
foreach ($idAttributes as $idAttribute) {
    $ids[] = $idAttribute->value;
}
var_dump($ids);
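
To see why the prefix matters, here is a small sketch that runs the query once without and once with a registered prefix against the same namespaced document. The prefix org is an arbitrary choice; only the namespace URI has to match the document.

```php
<?php
$xml = <<<'XML'
<subject id="Tom" xmlns="http://cpee.org/ns/organisation/1.0">
  <relation unit="ITSupport" role="ITSupporter" />
</subject>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Unprefixed names in Xpath 1.0 mean "no namespace", so this matches nothing.
$withoutPrefix = $xpath->evaluate('//subject')->length;

// After registering a prefix for the namespace URI, the element is found.
$xpath->registerNamespace('org', 'http://cpee.org/ns/organisation/1.0');
$withPrefix = $xpath->evaluate('//org:subject')->length;

var_dump($withoutPrefix); // int(0)
var_dump($withPrefix);    // int(1)
```

Note that the id and role attributes carry no prefix in the expression: unprefixed attributes are not placed in the default namespace.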

Concatenating an xPath query in PHP

Use the | operator to combine two or more paths into a single expression. This is valid:

$titles = $xpath->query('//*[@id="test"]/div/div/h1 | //*[@id="test1"]/div/div/h1');
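
A self-contained sketch of the same union query follows. The XML is a hypothetical document that only mirrors the structure implied by the two paths (an element with an id, then two nested div elements, then an h1).

```php
<?php
// Hypothetical document; the element names other than div/h1 are made up.
$xml = <<<'XML'
<root>
  <section id="test"><div><div><h1>First title</h1></div></div></section>
  <section id="test1"><div><div><h1>Second title</h1></div></div></section>
</root>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// One expression, two paths: the result is a single DOMNodeList.
$titles = $xpath->query('//*[@id="test"]/div/div/h1 | //*[@id="test1"]/div/div/h1');

$texts = [];
foreach ($titles as $title) {
    $texts[] = $title->nodeValue;
}

print_r($texts);
```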

XPath query with PHP

Try this:

//lemonade[@id="1"]/price

or

//lemonade[@supplier="mother"]/price

Without the "@" it looks for child elements with that name instead of attributes.
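
The difference is easy to check with a small sketch. The XML below is a hypothetical sample, since the original document is not shown here; only the lemonade and price names and the id and supplier attributes come from the expressions above.

```php
<?php
// Hypothetical sample data.
$xml = <<<'XML'
<drinks>
  <lemonade id="1" supplier="mother"><price>2.50</price></lemonade>
  <lemonade id="2" supplier="store"><price>4.00</price></lemonade>
</drinks>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// With "@": compares the supplier attribute.
$byAttribute = $xpath->query('//lemonade[@supplier="mother"]/price');

// Without "@": looks for a <supplier> child element, which does not exist here.
$byChildElement = $xpath->query('//lemonade[supplier="mother"]/price');

echo $byAttribute->length, ' ', $byAttribute->item(0)->nodeValue, PHP_EOL; // 1 2.50
echo $byChildElement->length, PHP_EOL;                                     // 0
```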

PHP + XPath Query get Child Nodes and their values

The problem is that a record node is just one part of the node hierarchy. Your query returns all record nodes, but you also want to descend into each one and read the data from its child nodes. A very case-specific example:

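<?php
$xmldoc = new DOMDocument();
$xmldoc->load('record.xml');
$xpathvar = new DOMXPath($xmldoc);
$res = $xpathvar->query('//record');
$arr = []; // initialise the result so print_r() works even without matches
foreach ($res as $data) {
    $narr = [];
    foreach ($data->childNodes as $cnode) {
        // skip whitespace text nodes and comments; keep child elements only
        if ($cnode->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $narr[$cnode->nodeName] = $cnode->nodeValue;
    }
    $arr[] = $narr;
}
print_r($arr);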

Should output something like:

Array
(
[0] => Array
(
[name] => john
[gender] => male
[subject] => mathematics, english, science
)

[1] => Array
(
[name] => jamie
[gender] => female
[subject] => mathematics, science
)

[2] => Array
(
[name] => jack
[gender] => male
[subject] => social-science, english
)

)

NOTE: This solution is very case specific and would probably break if you have additional nested hierarchies (which I recommend you do in order to store the multiple subjects).
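
As a rough sketch of what the nested variant could look like: the layout below, with a subjects wrapper holding one subject element per entry, is purely an assumption for illustration and not the structure of the original record.xml. Relative queries against each record then pick out the fields explicitly.

```php
<?php
// Hypothetical nested layout: <subjects> wraps one <subject> per entry.
$xml = <<<'XML'
<records>
  <record>
    <name>john</name>
    <gender>male</gender>
    <subjects><subject>mathematics</subject><subject>english</subject></subjects>
  </record>
  <record>
    <name>jamie</name>
    <gender>female</gender>
    <subjects><subject>science</subject></subjects>
  </record>
</records>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xpathvar = new DOMXPath($xmldoc);

$arr = [];
foreach ($xpathvar->query('//record') as $record) {
    $narr = [
        // string() casts the first matching child element to a PHP string
        'name'     => $xpathvar->evaluate('string(name)', $record),
        'gender'   => $xpathvar->evaluate('string(gender)', $record),
        'subjects' => [],
    ];
    // relative query: only the subjects of the current record
    foreach ($xpathvar->query('subjects/subject', $record) as $subject) {
        $narr['subjects'][] = $subject->nodeValue;
    }
    $arr[] = $narr;
}

print_r($arr);
```

Naming the fields in the expressions makes the loop independent of whitespace text nodes and of any additional child elements.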

XPath query in an XPath query

$xpath->query($expr) is evaluated against the whole document on every call within the loop, because you don't pass the context node that the XPath query should be evaluated relative to.

With the optional second parameter, DOMNodeList query(string $expr, DOMNode $node), you can run a sub-query relative to the given $node.
This produces the desired result only if you use a relative XPath $expr (without a leading /).
Finally, to retrieve the string from each attribute or text node, use the queries as follows:

$list[$count]['url'] = $xpath->query("h3/div/a[@class='selink']/@href", $value)->item(0)->value;
$list[$count]['title'] = $xpath->query("h3/div/a/span[@class='titel']/text()", $value)->item(0)->wholeText;
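
The effect of the leading slash can be seen in a short sketch. The XML here is a hypothetical, simplified stand-in for the page from the question; only the h3/div/a[@class='selink'] structure is taken from the queries above.

```php
<?php
// Hypothetical, simplified result list with two entries.
$xml = <<<'XML'
<results>
  <item><h3><div><a class="selink" href="/first">One</a></div></h3></item>
  <item><h3><div><a class="selink" href="/second">Two</a></div></h3></item>
</results>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Use the second item as the context node.
$value = $xpath->query('//item')->item(1);

// Relative expression: evaluated inside $value only.
$relative = $xpath->query("h3/div/a[@class='selink']/@href", $value);

// Expression starting with //: the context node is ignored, the whole document is searched.
$absolute = $xpath->query("//a[@class='selink']/@href", $value);

echo $relative->length, ' ', $relative->item(0)->value, PHP_EOL; // 1 /second
echo $absolute->length, ' ', $absolute->item(0)->value, PHP_EOL; // 2 /first
```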

regards,
Max

Using XPath to extract XML in PHP

DOMXPath::query's second parameter is the context node. Just pass the DOMNode instance you have previously "found" and your query runs "relative" to that node. E.g.

<?php
$doc = new DOMDocument;
$doc->loadxml( data() );

$xpath = new DOMXPath($doc);
$nset = $xpath->query('/root/level[@name="level1"]');
if ( $nset->length < 1 ) {
    die('....no such element');
}
else {
    $elLevel = $nset->item(0);

    foreach( $xpath->query('c', $elLevel) as $elC) {
        echo $elC->nodeValue, "\r\n";
    }
}

function data() {
return <<< eox
<root>
<level name="level1">
<c>C1</c>
<a>A</a>
<c>C2</c>
<b>B</b>
<c>C3</c>
</level>
<level name="level2">
<!-- Some more children <level> -->
</level>
</root>
eox;
}

But unless you have to perform multiple separate (possibly complex) subsequent queries, this is most likely not necessary:

<?php
$doc = new DOMDocument;
$doc->loadxml( data() );

$xpath = new DOMXPath($doc);
foreach( $xpath->query('/root/level[@name="level1"]/c') as $c ) {
    echo $c->nodeValue, "\r\n";
}

function data() {
return <<< eox
<root>
<level name="level1">
<c>C1</c>
<a>A</a>
<c>C2</c>
<b>B</b>
<c>C3</c>
</level>
<level name="level2">
<c>Ahh</c>
<a>ouch</a>
<c>no</c>
<b>wrxl</b>
</level>
</root>
eox;
}

has the same output using just one query.

Xpath query for HTML table within XML in PHP DOMDocument

You wrote that you wanted the length of the result set of the following query:

$queryResult = $xpathvar->query('//item/title');

I assume that $xpathvar here is of type DOMXPath. If so, its query() method returns a DOMNodeList, which has a length property. Instead of using foreach, simply use:

$length = $xpathvar->query('//item/title')->length;
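
As a quick sketch, here is that call against a hypothetical two-item feed, next to the alternative of letting XPath count via count() and DOMXPath::evaluate(). The feed content is invented for illustration.

```php
<?php
// Hypothetical minimal feed with two items.
$xml = <<<'XML'
<rss><channel>
  <item><title>First post</title></item>
  <item><title>Second post</title></item>
</channel></rss>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xpathvar = new DOMXPath($xmldoc);

// length is a property of the DOMNodeList returned by query().
$length = $xpathvar->query('//item/title')->length;

// count() is evaluated inside XPath; evaluate() returns the number as a float.
$count = $xpathvar->evaluate('count(//item/title)');

var_dump($length); // int(2)
var_dump($count);  // float(2)
```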

Now I want to pick the text node values for //channel/item/title

Which you can get with the expression //channel/item/title/text().

and href value for //channel/item/description/table/tr/td[1]/a[1] (with a text node value = "[link]")

Your expression here selects any tr, the first td under that, then the first a. But the first a does not have a value of "[link]" in your source. If you want that, though, you can use:

//channel/item/description/table/tr/td[1]/a[1]/@href

but it looks like you rather want:

//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href

which, within each td, finds the first a element whose value (text node) is "[link]".
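
The two expressions can be compared in a small sketch. The feed below is a hypothetical reduction of the structure described in the question (the table is assumed to be real child markup of description), and the URLs are placeholders.

```php
<?php
// Hypothetical item: the "[link]" anchor is the second a inside the second td.
$xml = <<<'XML'
<rss><channel><item>
  <title>Example</title>
  <description><table><tr>
    <td><a href="http://example.com/thumb">thumbnail</a></td>
    <td><a href="http://example.com/user">user</a> <a href="http://example.com/target">[link]</a></td>
  </tr></table></description>
</item></channel></rss>
XML;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xpathvar = new DOMXPath($xmldoc);

// Selects by position: first a inside the first td, whatever its text is.
$byPosition = $xpathvar->evaluate(
    'string(//channel/item/description/table/tr/td[1]/a[1]/@href)'
);

// Selects by value: an a whose string value is exactly "[link]".
$byValue = $xpathvar->evaluate(
    'string(//channel/item/description/table/tr/td/a[. = "[link]"][1]/@href)'
);

echo $byPosition, PHP_EOL; // http://example.com/thumb
echo $byValue, PHP_EOL;    // http://example.com/target
```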

Above in 2nd case, I am looking for the value of 2nd a (with a text node value = "[link]"), inside 2nd td inside tr, table, description, item, channel.

Not sure if this was a separate question or meant to explain the previous one. Regardless, the answer is the same as for the previous one, unless you explicitly want to search for the 2nd a etc. (i.e., search by position), in which case you can use numeric predicates.


Note: you start most of your expressions with //expr, which essentially means: search the whole tree at any depth for the expression expr. This is potentially expensive. If you know the path from the root to the nodes you need, it is better, and usually faster, to use a direct path. In your case, you can replace //channel with /*/channel (because channel is the first element under the root element).

PHP DomXpath xpath query of Child Node

Assuming there is only one matching image, you can use DOMXPath::evaluate() together with string() in the XPath expression to extract the value in one go:

// the leading . keeps the search inside $element; a bare // would search the whole document
$images = $xpath->evaluate("string(.//img[contains(@class,'dx-smart-widget-report-cover_113_20')]/@src)", $element);
echo "<strong>Image: </strong>" . $images . "<br />";

