Using Regex to Filter Attributes in XPath with PHP


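In the DOM an attribute is still a node in its own right (it has a namespace, etc.), so pass its string value with string(@id). After registering the php namespace (http://php.net/xpath) on the DOMXPath object and calling registerPhpFunctions(), use: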

//table[php:function('preg_match', '/post\d+/', string(@id))]

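However, preg_match() returns an integer (1 for a match, 0 for none, false on error), not a boolean, so the predicate does not act as a simple true/false test. Wrap the call in a function that returns a real boolean: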

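function booleanPregMatch($match, $string) {
    return preg_match($match, $string) > 0;   // a real bool for the XPath predicate
}
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('booleanPregMatch');
foreach ($xpath->query("//table[@id and php:function('booleanPregMatch', '/post\d+/', string(@id))]") as $row) {
    echo $row->ownerDocument->saveXML($row);
}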
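For reference, here is the boolean-wrapper approach as one self-contained script. The sample document and its table ids are made up for illustration; booleanPregMatch is the helper from the answer, and the php namespace URI is the one used in PHP's DOMXPath documentation.

```php
<?php
// Hypothetical sample document: two ids match /post\d+/, one does not, one table has no id.
$doc = new DOMDocument();
$doc->loadXML('<root><table id="post1"/><table id="post42"/><table id="other"/><table/></root>');

// Return a real PHP bool so the XPath predicate receives a boolean, not an int.
function booleanPregMatch($pattern, $subject) {
    return preg_match($pattern, $subject) > 0;
}

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('booleanPregMatch');   // allow only this callback

// "\\d" in a double-quoted PHP string yields the regex escape \d.
$matches = $xpath->query("//table[@id and php:function('booleanPregMatch', '/post\\d+/', string(@id))]");

$ids = [];
foreach ($matches as $row) {
    $ids[] = $row->getAttribute('id');
}
echo implode(', ', $ids), PHP_EOL;   // post1, post42
```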

BTW: for more complex issues, you can of course sneakily check what's happening, for example by letting the PHP callback echo the arguments it receives before returning its result.
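A minimal debugging sketch, assuming you just want to see each call: debugPregMatch is a hypothetical name, and it echoes the pattern, the attribute string, and preg_match()'s raw result before returning a boolean. The two-table document is made up.

```php
<?php
// Hypothetical debugging helper: log every call made from the XPath engine.
function debugPregMatch($pattern, $subject) {
    $result = preg_match($pattern, $subject);
    echo "pattern=$pattern subject='$subject' result=", var_export($result, true), PHP_EOL;
    return $result > 0;   // still hand a bool back to the predicate
}

$doc = new DOMDocument();
$doc->loadXML('<root><table id="post7"/><table id="misc"/></root>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('debugPregMatch');

$hits = $xpath->query("//table[php:function('debugPregMatch', '/post\\d+/', string(@id))]");
echo 'matched: ', $hits->length, PHP_EOL;
```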


It's a shame we don't have XPath 2.0 functions such as matches() available (DOMXPath only supports XPath 1.0), but if you can handle this requirement with the less precise starts-with(), I'd always prefer that over importing PHP functions.
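Where the pattern is simple, it can often be emulated in plain XPath 1.0 with no PHP callbacks at all. A sketch, assuming ids of the form "post" followed only by digits: starts-with() checks the prefix and translate() strips the digits from the rest, which must then be empty. Note this is anchored (like ^post\d+$), whereas the unanchored /post\d+/ used above would also accept ids such as xpost1 or post1a. The sample ids are made up.

```php
<?php
// XPath 1.0 emulation of ^post\d+$ without calling PHP:
//   starts-with()     checks the literal prefix,
//   string-length()   demands at least one character after "post",
//   translate()       deletes every digit, so an empty remainder means "digits only".
$expr = "//table[starts-with(@id, 'post')"
      . " and string-length(@id) > 4"
      . " and translate(substring(@id, 5), '0123456789', '') = '']";

$doc = new DOMDocument();
$doc->loadXML('<root><table id="post1"/><table id="post42"/><table id="post"/>'
            . '<table id="postx"/><table id="other"/></root>');

$xpath = new DOMXPath($doc);
$ids = [];
foreach ($xpath->query($expr) as $table) {
    $ids[] = $table->getAttribute('id');
}
echo implode(', ', $ids), PHP_EOL;   // post1, post42
```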

xpath query with regex

//div[starts-with(@id, "abc_")]
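A small sketch of that prefix test through DOMXPath on HTML, with made-up markup. starts-with() checks only the prefix, so an id of exactly abc_ is selected too, which a regex such as ^abc_\d+$ would reject.

```php
<?php
// Hypothetical markup: only ids beginning with "abc_" should be selected.
$html = '<html><body><div id="abc_1">a</div><div id="xabc_2">b</div>'
      . '<div id="abc_">c</div><div>d</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$ids = [];
foreach ($xpath->query('//div[starts-with(@id, "abc_")]') as $div) {
    $ids[] = $div->getAttribute('id');
}
echo implode(', ', $ids), PHP_EOL;   // abc_1, abc_
```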

XPath with regex match on an attribute value

I'm trying to get the total number of event nodes that contain the text ' doubles ' in the value of the description attribute.

matches() is a standard XPath 2.0 function. It is not available in XPath 1.0.

You can use:

count(/*/*/event[contains(@description, ' doubles ')])

To verify this, here is a small XSLT transformation which just outputs the result of evaluating the above XPath expression on the provided XML document:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <xsl:value-of select=
  "count(/*/*/event[contains(@description, ' doubles ')])"/>
 </xsl:template>
</xsl:stylesheet>

when this transformation is applied on the provided XML document:

<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
<team id="109" name="Arizona" home_team="false">
<event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
<event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
<event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
<event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
<!-- more data here -->
</team>
</game>

the wanted, correct result is produced. For the four event elements shown above, the count is 2 (events 9 and 23).
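The same count can be computed from PHP without XSLT. DOMXPath::evaluate() returns typed results, so count() comes back as a float; this sketch reuses the sample events from the question with the missing closing tags added.

```php
<?php
// The sample events from the question, with closing tags added.
$xml = <<<'XML'
<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
  <team id="109" name="Arizona" home_team="false">
    <event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
    <event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
    <event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
    <event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
  </team>
</game>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// evaluate() (unlike query()) can return a number; count() yields a PHP float.
$total = $xpath->evaluate("count(/*/*/event[contains(@description, ' doubles ')])");
echo $total, PHP_EOL;   // 2
```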


Get dom elements with particular string in the attribute

XPath 2.0 has a matches() function that lets you use regular expressions. In 1.0 though, which is what DOMXPath uses, your best bet would probably be something like:

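//*[@*[contains(., "{{") and contains(., "}}")]]

Testing each attribute inside its own predicate matters: contains(@*, "{{") would convert the whole attribute node-set to the string value of its first attribute only, so placeholders in any later attribute would be missed. Note this would still match cases where }} precedes {{ or where there's nothing between the two, so you'd probably want to double-check the results once you get them.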
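A sketch on made-up markup comparing the whole-node-set form //*[contains(@*,"{{") and contains(@*,"}}")] with a per-attribute predicate that also uses substring-after() to require }} to come after {{ in the same attribute value. It still accepts an empty placeholder such as {{}}.

```php
<?php
// Hypothetical test markup (not from the question).
$doc = new DOMDocument();
$doc->loadXML(
    '<root>'
    . '<a title="plain" href="x {{name}} y"/>'  // placeholder sits in the 2nd attribute
    . '<b title="}} before {{"/>'               // braces in the wrong order
    . '<c title="{{" alt="}}"/>'                // braces split over two attributes
    . '</root>'
);
$xpath = new DOMXPath($doc);

// contains() turns the node-set @* into the string value of its FIRST attribute only.
$naive = [];
foreach ($xpath->query('//*[contains(@*, "{{") and contains(@*, "}}")]') as $el) {
    $naive[] = $el->nodeName;
}

// Test every attribute on its own; substring-after() keeps only the text after the
// first "{{", so "}}" must appear later in the same attribute value.
$names = [];
foreach ($xpath->query('//*[@*[contains(substring-after(., "{{"), "}}")]]') as $el) {
    $names[] = $el->nodeName;
}

echo 'naive: ', implode(', ', $naive), PHP_EOL;   // naive: b
echo 'fixed: ', implode(', ', $names), PHP_EOL;   // fixed: a
```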

Filter xml element with regex to return matching element

You're better off using an XML parser than a regex over the raw markup:

$xml = simplexml_load_string($html);
$elements = $xml->xpath("//outfit[@default=1]");
// to get the bag url
echo $elements[0]->bag["url"];

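This way you query the document's structure instead of pattern-matching raw markup, so attribute order, quoting style, and whitespace in the XML don't break the match.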
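If a regex is still needed, one option is to let the parser select the elements and apply preg_match() only to the attribute values it returns. The document below is hypothetical (the question's real XML isn't shown), as is the /-\d+\.png$/ pattern.

```php
<?php
// Hypothetical document shape, loosely modelled on the outfit/bag example above.
$xmlString = <<<'XML'
<outfits>
  <outfit default="1"><bag url="http://example.com/bags/red-42.png"/></outfit>
  <outfit default="0"><bag url="http://example.com/bags/blue-7.png"/></outfit>
  <outfit default="1"><bag url="http://example.com/bags/plain.png"/></outfit>
</outfits>
XML;

$xml = simplexml_load_string($xmlString);

// 1. Let the parser pick the candidate elements (no regex over raw XML).
$candidates = $xml->xpath('//outfit[@default=1]');

// 2. Apply the regex only to the attribute value the parser extracted.
$matching = array_filter($candidates, function (SimpleXMLElement $outfit) {
    return preg_match('/-\d+\.png$/', (string) $outfit->bag['url']) === 1;
});

foreach ($matching as $outfit) {
    echo $outfit->bag['url'], PHP_EOL;   // http://example.com/bags/red-42.png
}
```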

Xpath and regex for autocompletion filter

Since the original problem was changed to require recognizing the word "Spain" not only in every capitalization but also with accented characters, I have updated the solution below so that "Spain" written with â and/or ï (Ï) is now recognized correctly.

Here is a more generic solution than that of @Alejandro:

If we want to select all elements whose name attribute contains the word "Spain" in any capitalization, and the possible word delimiters are all non-alphabetic characters, then

This XPath expression:

//*[contains(
       concat(' ',
              translate(translate(@name, $vUpper, $vLower),
                        translate(@name, $vAlpha, ''),
                        ' '),
              ' '),
       ' spain ')]

when applied on this XML document:

<element id="1" name="france" />
<element id="2" name="usa" />
<element id="3" name="Spaïn" />
<element id="4" name="france with spâin and africa" />
<element id="5" name="-Spain!" />
<element id="6" name="spain and africa" />
<element id="7" name="italie and Spain." />

selects the following elements:

<element id="3" name="Spaïn"/>
<element id="4" name="france with spâin and africa"/>
<element id="5" name="-Spain!"/>
<element id="6" name="spain and africa"/>
<element id="7" name="italie and Spain."/>

In the above XPath expression $vLower, $vUpper must be substituted with (respectively):




$vAlpha must be substituted with the concatenation of $vLower and $vUpper.
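A sketch of the expression run through DOMXPath. The $vUpper/$vLower values are hypothetical stand-ins, since the answer's actual lists are not reproduced here. It also swaps the single ' ' third argument of the outer translate() for a long run of spaces: translate() deletes any character that has no counterpart in its third argument, so with one space only the first distinct non-letter becomes a space and the others vanish, and a name like "x.y spain" would become "x yspain" and be missed. Element 8 is an added test case for that.

```php
<?php
// HYPOTHETICAL character lists: the answer's real $vUpper/$vLower values are not shown.
// translate() maps each character of $vUpper to the one at the same position in $vLower,
// so A-Z become a-z and the accented Â/â and Ï/ï become plain "a" and "i".
$vUpper  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZÂâÏï';
$vLower  = 'abcdefghijklmnopqrstuvwxyzaaii';
$vAlpha  = $vLower . $vUpper;       // everything counted as a letter
$vSpaces = str_repeat(' ', 100);    // variant: each non-letter (up to 100) becomes a space

$expr = "//*[contains(concat(' ',"
      . " translate(translate(@name, '$vUpper', '$vLower'),"
      . " translate(@name, '$vAlpha', ''), '$vSpaces'),"
      . " ' '), ' spain ')]";

// The question's elements wrapped in a root element; id 8 is an added test case.
$xml = <<<'XML'
<elements>
  <element id="1" name="france" />
  <element id="2" name="usa" />
  <element id="3" name="Spaïn" />
  <element id="4" name="france with spâin and africa" />
  <element id="5" name="-Spain!" />
  <element id="6" name="spain and africa" />
  <element id="7" name="italie and Spain." />
  <element id="8" name="x.y spain" />
</elements>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$ids = [];
foreach ($xpath->query($expr) as $el) {
    $ids[] = $el->getAttribute('id');
}
echo implode(', ', $ids), PHP_EOL;   // 3, 4, 5, 6, 7, 8
```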

