DOMXPath - Get href attribute and text value of an a element

Fetch

//td[@class='name']/a

and then pluck the text with nodeValue and the attribute with getAttribute('href').
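As a minimal sketch of that approach (the sample markup and the $links variable are made up for illustration; only the td class and the a element come from the answer above):

```php
<?php
// Hypothetical sample markup; only the td.name/a structure is taken from the answer.
$html = '<table>
  <tr><td class="name"><a href="/item/1">First item</a></td><td>10</td></tr>
  <tr><td class="name"><a href="/item/2">Second item</a></td><td>20</td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($html); // the HTML parser wraps the fragment in <html><body>
$xpath = new DOMXPath($dom);

$links = [];
foreach ($xpath->query("//td[@class='name']/a") as $a) {
    /** @var DOMElement $a */
    $links[] = [
        'text' => $a->nodeValue,            // the link text
        'href' => $a->getAttribute('href'), // the attribute value
    ];
}

print_r($links);
```

Each entry in $links then holds the text and the href of one matched a element.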

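Apart from that, you can combine XPath queries with the union operator |, so you can use

//td[@class='name']/a/@href|//td[@class='name']

as well. The result then mixes td elements and href attribute nodes (typically in document order), and nodeValue gives you the text of the former and the URL of the latter.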
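A rough sketch of how the union result could be consumed, assuming a one-row sample table invented for this example. It tells the two node types apart with instanceof rather than relying on their order:

```php
<?php
// Hypothetical one-row table for illustration.
$html = '<table><tr><td class="name"><a href="/item/1">First item</a></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$values = [];
foreach ($xpath->query("//td[@class='name']/a/@href|//td[@class='name']") as $node) {
    if ($node instanceof DOMAttr) {
        // The href attribute node: nodeValue is the URL.
        $values[] = 'href: ' . $node->nodeValue;
    } else {
        // The td element: nodeValue is its text content.
        $values[] = 'td: ' . $node->nodeValue;
    }
}

print_r($values);
```

Because a single query returns both kinds of node, the caller has to pair each href with its text, which is why the element-based query above is often simpler.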

Get href value with DOMDocument in PHP

h1//a[@href=""] is looking for an a element with an href attribute with an empty string as the value, whereas your href attribute contains something other than the empty string as the value.


If that's the entire document, then you could use the expression //a.

Otherwise, h1//a should work as well.

If you require the a element to have an href attribute with any kind of value, you could use h1//a[@href].

If the h1 is not at the root of the document, you might want to use //h1 instead. So the last example would become //h1//a[@href].
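The difference between these expressions can be checked with a small sketch. The markup below is hypothetical, since the original document is not shown here:

```php
<?php
// Hypothetical markup: an h1 wrapping a link whose href is not empty.
$html = '<div><h1><a href="/about">About us</a></h1></div>';

$dom = new DOMDocument();
$dom->loadHTML($html); // the parser adds <html><body> around the fragment
$xpath = new DOMXPath($dom);

$expressions = [
    '//h1//a[@href=""]', // href must equal the empty string
    '//h1//a[@href]',    // href must exist, with any value
    'h1//a[@href]',      // relative: h1 must be a child of the context node
    '//a',               // every a element in the document
];

$counts = [];
foreach ($expressions as $expression) {
    $counts[$expression] = $xpath->query($expression)->length;
}

print_r($counts);
```

In this sketch only //h1//a[@href] and //a match. The relative h1//a[@href] finds nothing because loadHTML() places the h1 below html and body rather than at the root.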

Getting Text Value of Data-Attribute Link in DOM XPath

Quite often I find it easiest to "inspect" the element I wish to target using the developer tools in Chrome from where it is possible to copy the XPath expression that targets that particular node. This doesn't always return the most useful XPath expression but it is usually a good starting point - in this case I tweaked the returned query and added in the classname.

Hope it helps

$term='dog show';
$url=sprintf('https://en.wikipedia.org/w/index.php?search=%s&title=Special:Search&fulltext=Search', urlencode( $term ) );

printf( '<a href="%s" target="_blank">%s</a>', $url, $url );

libxml_use_internal_errors(true);
$dom=new DOMDocument;
$dom->recover=true;
$dom->formatOutput=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;

$dom->loadHTMLFile( $url );
$xp=new DOMXPath( $dom );

/* possibly the important bit */
$query='//*[@id="mw-content-text"]/div/ul/li/div[@class="mw-search-result-heading"]/a';

$col=$xp->query( $query );

$html=array();

if( $col && $col->length > 0 ){
    foreach( $col as $node ){
        $html[]=array(
            'title'=>$node->nodeValue,
            'href'=>$node->getAttribute('href')
        );
    }
}

printf('<pre>%s</pre>',print_r($html,true));

Will output:

https://en.wikipedia.org/w/index.php?search=dog+show&title=Special:Search&fulltext=Search
Array
(
    [0] => Array
        (
            [title] => Dog show
            [href] => /wiki/Dog_show
        )

    [1] => Array
        (
            [title] => Show dog
            [href] => /wiki/Show_dog
        )

    [2] => Array
        (
            [title] => Westminster Kennel Club Dog Show
            [href] => /wiki/Westminster_Kennel_Club_Dog_Show
        )

    [3] => Array
        (
            [title] => Dog Eat Dog (U.S. game show)
            [href] => /wiki/Dog_Eat_Dog_(U.S._game_show)
        )

.......... etc

XPath get attribute value in PHP

XPath can fetch the value attribute directly with $xpath->query("//input[@name='text1']/@value"). You can then iterate over the resulting list of attribute nodes and read the value property (or nodeValue) of each one.
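A minimal sketch, assuming a small form invented for this example. It also shows DOMXPath::evaluate() with string(), an alternative not mentioned in the answer that returns the first match as a plain string:

```php
<?php
// Hypothetical form; only the input named "text1" is taken from the answer.
$html = '<form>
  <input type="text" name="text1" value="hello">
  <input type="text" name="text2" value="world">
</form>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() returns a list of DOMAttr nodes; read ->value on each.
$values = [];
foreach ($xpath->query("//input[@name='text1']/@value") as $attr) {
    /** @var DOMAttr $attr */
    $values[] = $attr->value;
}

// evaluate() with string() yields the string value of the first matching attribute.
$first = $xpath->evaluate("string(//input[@name='text1']/@value)");

var_dump($values, $first);
```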

PHP Xpath : get all href values that contain needle

Not sure I understand the question correctly, but the second XPath expression already does what you are describing. It does not match against the text node of the A element, but the href attribute:

$html = <<< HTML
<ul>
<li>
<a href="http://example.com/page?foo=bar">Description</a>
</li>
<li>
<a href="http://example.com/page?lang=de">Description</a>
</li>
</ul>
HTML;

$xml = simplexml_load_string($html);
$list = $xml->xpath("//a[contains(@href,'foo')]");
var_dump($list);

Outputs:

array(1) {
  [0]=>
  object(SimpleXMLElement)#2 (2) {
    ["@attributes"]=>
    array(1) {
      ["href"]=>
      string(31) "http://example.com/page?foo=bar"
    }
    [0]=>
    string(11) "Description"
  }
}

As you can see, the returned array contains only the A element whose href contains foo (which I understand is what you are looking for). It contains the entire element, because the XPath translates to "fetch all A elements with an href attribute containing foo". You would then access the attribute with

echo $list[0]['href']; // gives "http://example.com/page?foo=bar"

If you only want to return the attribute itself, you'd have to do

//a[contains(@href,'foo')]/@href

Note that in SimpleXML, this still returns a SimpleXMLElement rather than a plain string:

array(1) {
  [0]=>
  object(SimpleXMLElement)#3 (1) {
    ["@attributes"]=>
    array(1) {
      ["href"]=>
      string(31) "http://example.com/page?foo=bar"
    }
  }
}

but you can output the URL now by

echo $list[0]; // gives "http://example.com/page?foo=bar"
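echo converts the element to a string implicitly. If you want to keep the URLs in an array, an explicit cast is needed. A small sketch, reusing the sample list from above in compact form (the $hrefs variable is introduced here for illustration):

```php
<?php
// Same sample list as above, written on fewer lines.
$html = '<ul>'
    . '<li><a href="http://example.com/page?foo=bar">Description</a></li>'
    . '<li><a href="http://example.com/page?lang=de">Description</a></li>'
    . '</ul>';

$xml = simplexml_load_string($html);

// Select the attribute nodes, then cast each SimpleXMLElement to a string.
$hrefs = array_map(
    function ($attr) {
        return (string) $attr;
    },
    $xml->xpath("//a[contains(@href,'foo')]/@href")
);

var_dump($hrefs);
```

After the cast, $hrefs holds plain strings that can be compared, serialized, or passed on without carrying SimpleXMLElement objects around.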

Get href attribute of an element of a web page using Selenium

To print the value of the href attribute you can use either of the following locator strategies:

  • Using cssSelector:

    System.out.println(wd.findElement(By.cssSelector("a.a-link-normal#vvp-product-details-modal--product-title")).getAttribute("href"));
  • Using xpath:

    System.out.println(wd.findElement(By.xpath("//a[@class='a-link-normal' and @id='vvp-product-details-modal--product-title']")).getAttribute("href"));

Ideally, to extract the value of the href attribute, first wait for the element with WebDriverWait and visibilityOfElementLocated(). In Selenium 4, pass a Duration such as Duration.ofSeconds(20) in place of the integer timeout. You can use either of the following locator strategies:

  • Using cssSelector and getAttribute("href"):

    System.out.println(new WebDriverWait(wd, 20).until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("a.a-link-normal#vvp-product-details-modal--product-title"))).getAttribute("href"));
  • Using xpath and getAttribute("href"):

    System.out.println(new WebDriverWait(wd, 20).until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//a[@class='a-link-normal' and @id='vvp-product-details-modal--product-title']"))).getAttribute("href"));

PHP DOMXPath extract href of anchor inside a td

With the structure you posted, the following outputs the href-value:

<?php
$dom = new DOMDocument('1.0');
$dom->loadHTMLFile('input.html');

$xpath = new DOMXPath($dom);

$query = '//*[@id="main_content"]/table/tr/td/table[3]/tr[2]/td/table/tr[position() >= 3]/td[2]/a';

$nodes = $xpath->query($query);

foreach ($nodes as $node) {
    /** @var $node DOMElement */
    var_dump(
        $node->getAttribute('href'), // the href-attribute value
        $node->nodeValue // the inner text
    );
}
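
To see how the positional predicates in such a query behave, here is a reduced sketch. The table is hypothetical and much flatter than the original page. It assumes, as the query above implies, that DOMDocument's HTML parser does not insert tbody elements the way browsers do, so tr is addressed directly under table:

```php
<?php
// Hypothetical table: two header rows, then data rows with the link in the second cell.
$html = '<div id="main_content"><table>
  <tr><td>Header</td><td>Header</td></tr>
  <tr><td>Sub</td><td>Sub</td></tr>
  <tr><td>1</td><td><a href="/a">Alpha</a></td></tr>
  <tr><td>2</td><td><a href="/b">Beta</a></td></tr>
</table></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Skip the first two rows, then take the link in the second cell of each remaining row.
$query = '//*[@id="main_content"]/table/tr[position() >= 3]/td[2]/a';

$rows = [];
foreach ($xpath->query($query) as $a) {
    /** @var DOMElement $a */
    $rows[$a->getAttribute('href')] = $a->nodeValue;
}

print_r($rows);
```

An XPath copied from browser developer tools often contains tbody steps. Under the assumption above, those steps would need to be removed before the expression matches in DOMXPath.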
