Using Regex to Filter Attributes in Xpath with PHP

Using regex to filter attributes in XPath with PHP

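An attribute is still a full node according to the DOM (it has a namespace, etc.), so php:function receives @id as an array of DOMAttr objects rather than as a string. Convert it with string() first: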

//table[php:function('preg_match', '/post\d+/', string(@id))]

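Now, we need a boolean return: preg_match returns an integer, and XPath treats a numeric predicate as a position test rather than as true/false. So: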

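// Wrap preg_match so the XPath predicate receives a real boolean.
function booleanPregMatch($pattern, $subject) {
    return preg_match($pattern, $subject) > 0;
}
// The php: prefix must be bound before php:function can be used.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions();
foreach ($xpath->query("//table[@id and php:function('booleanPregMatch', '/post\d+/', string(@id))]") as $key => $row) {
    echo $row->ownerDocument->saveXML($row);
}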
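
For reference, here is a self-contained sketch of the same idea that can be run from the command line. The sample document and its ids are made up for illustration; only booleanPregMatch and the query shape come from the answer above.

```php
<?php
// Hypothetical sample document; the ids are invented for this sketch.
$xml = <<<'XML'
<root>
  <table id="post12"/>
  <table id="sidebar"/>
  <table/>
  <table id="post7"/>
</root>
XML;

// Turn preg_match's integer result into a strict boolean for XPath.
function booleanPregMatch($pattern, $subject)
{
    return preg_match($pattern, $subject) > 0;
}

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Bind the php: prefix, then expose only the one function we need.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('booleanPregMatch');

// Single-quoted PHP string, so \d reaches preg_match unchanged.
$query = '//table[@id and php:function("booleanPregMatch", "/post\d+/", string(@id))]';

$ids = [];
foreach ($xpath->query($query) as $table) {
    $ids[] = $table->getAttribute('id');
}
echo implode(',', $ids), "\n";
```

Passing a function name to registerPHPFunctions() limits what the XPath expression may call, which seems a safer default than exposing every PHP function.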

BTW: for more complex issues, you can of course sneakily check what's happening with this:

//table[php:function('var_dump',@id)]

It's a shame XPath 2.0 functions aren't available, but if a less precise starts-with() is enough for your requirement, I'd always prefer that over importing PHP functions.

xpath query with regex

//div[starts-with(@id, "abc_")]
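
A minimal sketch of running that expression through DOMXPath, assuming a small made-up document (the ids are hypothetical). It shows that starts-with() only tests the prefix, so it is looser than a real regex:

```php
<?php
// Hypothetical markup used only to exercise the expression.
$xml = '<root><div id="abc_1"/><div id="xabc_2"/><div id="abc_"/><div/></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$matched = [];
foreach ($xpath->query('//div[starts-with(@id, "abc_")]') as $div) {
    $matched[] = $div->getAttribute('id');
}

// "xabc_2" is skipped (wrong prefix); "abc_" matches even with nothing after it.
echo implode(',', $matched), "\n";
```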

XPath with regex match on an attribute value

I'm trying to get the total number of event nodes that contain the text ' doubles ' in the value of the description attribute.

matches() is a standard XPath 2.0 function. It is not available in XPath 1.0.

You can use:

count(/*/*/event[contains(@description, ' doubles ')])

To verify this, here is a small XSLT transformation which just outputs the result of evaluating the above XPath expression on the provided XML document:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
<xsl:value-of select=
"count(/*/*/event[contains(@description, ' doubles ')])"/>
</xsl:template>
</xsl:stylesheet>

when this transformation is applied on the provided XML document:

<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
<team id="109" name="Arizona" home_team="false">
<event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
<event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
<event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
<event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
<!-- more data here -->
</team>
</game>

the wanted, correct result is produced:

2
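
Since the rest of this page uses PHP, here is a sketch of the same count with DOMXPath::evaluate(), which returns the typed result of an XPath expression (a float for count()). The XML is the sample from above, minus the comment placeholder.

```php
<?php
$xml = <<<'XML'
<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
  <team id="109" name="Arizona" home_team="false">
    <event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
    <event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
    <event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
    <event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
  </team>
</game>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// evaluate() (unlike query()) can return numbers, strings and booleans.
$count = $xpath->evaluate("count(/*/*/event[contains(@description, ' doubles ')])");

echo (int) $count, "\n";
```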

Get dom elements with particular string in the attribute

XPath 2.0 has a matches() function that lets you use regular expressions. In 1.0 though, which is what DOMXPath uses, your best bet would probably be something like:

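//*[@*[contains(., "{{") and contains(., "}}")]]

The nested predicate tests each attribute on its own; a plain contains(@*, ...) would only look at the first attribute, because XPath 1.0 converts a node-set to the string value of its first node. Note this would still match cases where }} precedes {{ or where there's nothing between those two, so you'd probably want to double-check the results once you get them.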
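
One way to do that double-check is to let XPath pick the candidates and then confirm each attribute with preg_match in PHP. The sketch below assumes a made-up document and a placeholder pattern ({{ followed by at least one character and then }}); adjust both to your data. It tests each attribute individually through a nested predicate, since contains(@*, ...) in XPath 1.0 only inspects the first attribute of an element.

```php
<?php
// Hypothetical sample: one real placeholder, one reversed, one empty.
$xml = <<<'XML'
<root>
  <a href="{{url}}" title="plain"/>
  <b class="x" data-v="}}bad{{"/>
  <c title="{{}}"/>
  <d title="none"/>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Coarse XPath 1.0 filter: some attribute contains both delimiters.
$candidates = $xpath->query('//*[@*[contains(., "{{") and contains(., "}}")]]');

// Precise check in PHP: "{{", at least one character, then "}}".
$pattern = '/\{\{.+?\}\}/';

$hits = [];
foreach ($candidates as $element) {
    foreach ($element->attributes as $attr) {
        if (preg_match($pattern, $attr->value) === 1) {
            $hits[] = $element->nodeName . '@' . $attr->name;
        }
    }
}

echo implode(',', $hits), "\n";
```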

Filter xml element with regex to return matching element

Better to go the parser route instead of a regex:

<?php
$xml = simplexml_load_string($html);
$elements = $xml->xpath("//outfit[@default=1]");
// to get the bag url
echo $elements[0]->bag["url"];
?>

This way, you can analyze your XML better.
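
A runnable variant of the snippet above, as a sketch. The question's XML is not shown here, so the document below is a guess at its shape (outfit elements with a default attribute and a bag child carrying a url attribute); the URLs are placeholders.

```php
<?php
// Hypothetical document structure, inferred from the XPath and property access above.
$xmlString = <<<'XML'
<outfits>
  <outfit default="0"><bag url="http://example.com/bag-a.png"/></outfit>
  <outfit default="1"><bag url="http://example.com/bag-b.png"/></outfit>
</outfits>
XML;

$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML');
}

$elements = $xml->xpath('//outfit[@default=1]');

// Guard against an empty result before indexing into it.
$bagUrl = $elements ? (string) $elements[0]->bag['url'] : null;

echo $bagUrl, "\n";
```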

Xpath and regex for autocompletion filter

UPDATE:
The original question was later changed to require recognizing the word "Spain" not only in every capitalization but also with accented characters, so I have updated the solution below: "Spain" written with â/Â and/or ï/Ï is now recognized correctly.

Here is a more generic solution than that of @Alejandro:

If we want to select all elements whose name attribute contains the word "Spain" in any capitalization, and the possible word delimiters are all non-alphabetic characters, then this XPath expression:

/*/*[contains(
       concat(' ',
              translate(translate(@name,
                                  translate(@name, $vAlpha, ''),
                                  ' '),
                        $vUpper,
                        $vLower),
              ' '
       ),
       ' spain '
     )
]

when applied on this XML document:

<elements>
<element id="1" name="france" />
<element id="2" name="usa" />
<element id="3" name="Spaïn" />
<element id="4" name="france with spâin and africa" />
<element id="5" name="-Spain!" />
<element id="6" name="spain and africa" />
<element id="7" name="italie and Spain." />
</elements>

selects the following elements:

<element id="3" name="Spaïn"/>
<element id="4" name="france with spâin and africa"/>
<element id="5" name="-Spain!"/>
<element id="6" name="spain and africa"/>
<element id="7" name="italie and Spain."/>

In the above XPath expression, $vLower and $vUpper must be substituted with (respectively):

'aaabcdefghiiijklmnopqrstuvwxyz'

and

'âÂABCDEFGHïÏIJKLMNOPQRSTUVWXYZ'

$vAlpha must be substituted by the concatenation of $vLower and $vUpper .
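
DOMXPath has no way to bind XPath variables, so in PHP the substitution can be done by building the expression string. A sketch, assuming the source file is saved as UTF-8 and using the sample document from above (the use of sprintf is just one convenient way to splice the literals in):

```php
<?php
$xml = <<<'XML'
<elements>
  <element id="1" name="france" />
  <element id="2" name="usa" />
  <element id="3" name="Spaïn" />
  <element id="4" name="france with spâin and africa" />
  <element id="5" name="-Spain!" />
  <element id="6" name="spain and africa" />
  <element id="7" name="italie and Spain." />
</elements>
XML;

// Character maps from the answer; position N in $vUpper maps to position N in $vLower.
$vLower = 'aaabcdefghiiijklmnopqrstuvwxyz';
$vUpper = 'âÂABCDEFGHïÏIJKLMNOPQRSTUVWXYZ';
$vAlpha = $vLower . $vUpper;

// Splice the three strings into the expression as XPath string literals.
// This is safe here only because none of them contains a quote or a % sign.
$expr = sprintf(
    "/*/*[contains(concat(' ', translate(translate(@name, translate(@name, '%s', ''), ' '), '%s', '%s'), ' '), ' spain ')]",
    $vAlpha,
    $vUpper,
    $vLower
);

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$ids = [];
foreach ($xpath->query($expr) as $element) {
    $ids[] = $element->getAttribute('id');
}

echo implode(',', $ids), "\n";
```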


