PHP Extract Link from <A> Tag

PHP extract link from a tag

This is very easy to do using SimpleXML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
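
One detail worth spelling out: `$a['href']` is a SimpleXMLElement object, not a plain string, so it usually needs a cast before being compared or stored. A minimal sketch, assuming the input is a single well-formed XML fragment (SimpleXML throws an exception on malformed markup):

```php
<?php
// Assumption: the input is one well-formed XML fragment.
$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');

// Attribute and element values are SimpleXMLElement objects;
// cast them to get plain PHP strings.
$href = (string) $a['href'];
$text = (string) $a;

echo $href, PHP_EOL;
echo $text, PHP_EOL;
```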

Regex to extract first link on page inside another tag

If you only want to extract the link from the first h4, you could modify the expression to:

(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*

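The inline (?i) makes the pattern case-insensitive. In the code below, the s flag lets the trailing .* run across newlines and consume the rest of the input, which is why only the first link is captured.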

$re = '/(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*/s';
$str = '<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4><h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>
';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $match) {
    print($match[1]);
}

Output

somelinkhere

If you wish to simplify, modify, or explore the expression, paste it into regex101.com: the top right panel explains each part of it, and you can watch how it matches against sample inputs.
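
Since only the first link is wanted, a variation is to use preg_match, which stops after the first match, instead of relying on a trailing .* to swallow the rest of the input. This is a sketch rather than the answer's original approach; the pattern is slightly adjusted and the sample string is made up for illustration.

```php
<?php
// Adjusted pattern (an assumption, not the original): no trailing .*,
// because preg_match() returns after the first match anyway.
$re  = '/<h4>\s*<a\b[^>]*\bhref\s*=\s*"\s*([^"]*?)\s*"/i';
$str = '<h4><a href="first" class="x">One</a></h4><h4><a href="second">Two</a></h4>';

$firstLink = null;
if (preg_match($re, $str, $match) === 1) {
    $firstLink = $match[1]; // capture group 1 holds the href value
}

echo $firstLink, PHP_EOL;
```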


Extract link attributes from string of HTML

Instead of crafting a long, complicated regex, do it in steps:

$str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
$str = preg_replace("/.*<a\s+href=\"/","",$str);
print preg_replace("/\">.*/","",$str);

A non-regex way, using explode:

$str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
$s = explode('href="',$str);
$t = explode('">',$s[1]);
print $t[0];
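
The explode version assumes the href is present and is immediately followed by `">`. A slightly more defensive sketch is shown here; `extractHref` is a hypothetical helper name, and it still only handles double-quoted href values.

```php
<?php
// Hypothetical helper: returns the href value, or null when the
// input contains no double-quoted href attribute.
function extractHref(string $html): ?string
{
    $parts = explode('href="', $html, 2);
    if (count($parts) < 2) {
        return null; // no href=" found in the input
    }
    // Cut at the closing quote rather than at '">', so attributes
    // that follow href do not end up in the result.
    return explode('"', $parts[1], 2)[0];
}

echo extractHref('<a href="http://stackoverflow.com/"> Stack Overflow</a>'), PHP_EOL;
```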

how to extract links and titles from a .html page?

Thank you everyone, I GOT IT!

The final code:

$html = file_get_contents('bookmarks.html');
//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);

//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

//Iterate over the extracted links and display their text and URLs
foreach ($links as $link){
    //Show the anchor text, then the "href" attribute.
    echo $link->nodeValue;
    echo $link->getAttribute('href'), '<br>';
}

This shows you the anchor text assigned and the href for all links in a .html file.
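
As an alternative to the @ operator, libxml's own error handling can be switched to internal mode while parsing. The sketch below does that and collects the results into an array instead of echoing them; the inline `$html` string is made-up sample data standing in for the contents of bookmarks.html.

```php
<?php
// Assumed sample data standing in for file_get_contents('bookmarks.html').
$html = '<p><a href="https://example.com/">Example</a> <a href="/docs">Docs</a></p>';

// Collect parse warnings internally instead of suppressing them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

// Build a list of text/href pairs for every <a> element.
$result = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $result[] = [
        'text' => trim($link->textContent),
        'href' => $link->getAttribute('href'),
    ];
}

print_r($result);
```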

Again, thanks a lot.

How to extract id and url from an anchor tag in this code using PHP?

Your question is a little unclear, but if I understand correctly, you can already extract the content of the anchor tag (<a>) easily, and you are worried it won't work once the <a> tag contains href and id attributes. Also, going by your post title, you want to extract the values of the href and id attributes as well, and either of them may be missing.

In this case, you can use this regex,

<(a)(?:\s+href=(['"])(?<href>[^'"]*)\2\s*)?(?:\s+id=(['"])(?<id>[^'"]*)\4\s*)?>(.+?)<\/\1>

Explanation:

  • < --> Start of the tag
  • (a) --> Requires the tag name to be a and captures it in group 1, so the closing tag can be matched with a backreference
  • (?:\s+href=(['"])(?<href>[^'"]*)\2\s*)? --> Optionally matches the href attribute and captures its value in the named group href
  • (?:\s+id=(['"])(?<id>[^'"]*)\4\s*)? --> Optionally matches the id attribute and captures its value in the named group id
  • > --> End of the opening <a> tag
  • (.+?) --> Captures the inner text of the <a> tag
  • <\/\1> --> Matches the closing </a> tag through the backreference \1

Group 1 always holds a, and the href and id named groups hold the attribute values when those attributes are present. Both are optional, so the pattern still matches when either one is missing.
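
To make this concrete, here is a sketch of how the pattern might be used from PHP with preg_match. The sample tags are invented for illustration. Note that, as written, the pattern expects href to come before id and does not allow any other attributes.

```php
<?php
// Same pattern as above, with ~ as the delimiter so / needs no escaping.
$re = '~<(a)(?:\s+href=([\'"])(?<href>[^\'"]*)\2\s*)?(?:\s+id=([\'"])(?<id>[^\'"]*)\4\s*)?>(.+?)</\1>~';

// Made-up inputs: both attributes, only id, and no attributes at all.
$samples = [
    '<a href="page.html" id="link1">Home</a>',
    '<a id="only-id">No href</a>',
    '<a>Bare</a>',
];

$rows = [];
foreach ($samples as $html) {
    if (preg_match($re, $html, $m)) {
        $rows[] = [
            'href' => $m['href'] ?? '', // empty when the attribute is absent
            'id'   => $m['id'] ?? '',
            'text' => $m[6],            // group 6 is the inner text
        ];
    }
}

print_r($rows);
```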


Let me know if this is what you wanted. In case of any queries, do let me know.

Regex How to extract link from HTML with specific path

This should work, if I haven't misunderstood your question.

$html = '<a href="https://www.website.com/n/?confirm.php" ></a>';
preg_match_all('/href="([^\s"]+)/', $html, $match);
print '<pre>';
print_r($match);
print '</pre>';
print $match[1][0];

Edited: As noted in the comments, you hadn't provided the specific URL, which is why I first posted a generic answer that captures any href. See the updated answer below. The regex used can be found here: https://regex101.com/r/pnfz7E/1

$re = '/<a href="([^"]*?\/n\/\?confirm\.php)">.*?<\/a>/m';
$str = '<a href="https://www.website.com/n/?noconfirm.php">SSD</a>
<div>How are you</div>
<a href="https://www.website.com/n/?confirm.php">HDD</a>
<h2>Being Sunny</h2>
<a href="https://www.ltmgtfu.com/n/?noconfirm.php">MSD</a>
<div>How are you</div>
<a href="https://www.website.com/n/?confirm.php"></a>
<h2>Being Sunny</h2>
<a href="https://www.google.com/n/?noconfirm.php">GSD</a>
<div>How are you</div>
<a href="https://www.website.com/n/?confirm.php">LSD</a>
<h2>Being Sunny</h2>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
print '<pre>';
print_r($matches);
print '</pre>';
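
If the markup is less regular than the sample (extra attributes, different attribute order), the same filtering could be done with the DOM extension and an XPath contains() test. This is an alternative sketch, not the method used in the answer above, and the shortened `$html` sample is an assumption for illustration.

```php
<?php
// Shortened sample input (assumed for illustration).
$html = '<a href="https://www.website.com/n/?noconfirm.php">SSD</a>
<a href="https://www.website.com/n/?confirm.php">HDD</a>
<a href="https://www.website.com/n/?confirm.php"></a>';

$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select only the anchors whose href contains the specific path.
$xpath = new DOMXPath($dom);
$confirmLinks = [];
foreach ($xpath->query('//a[contains(@href, "/n/?confirm.php")]') as $a) {
    $confirmLinks[] = $a->getAttribute('href');
}

print_r($confirmLinks);
```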

How to extract links and text inside li tags?

href is an attribute of the <a> tag, not of <li>. Change your code to $dom->getElementsByTagName('a'); and it will start working!

See here: https://3v4l.org/4Ln5E
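
For reference, a self-contained sketch of the idea: walk the li elements and read href and text from the a elements inside them. The list markup here is invented, since the asker's HTML is not shown.

```php
<?php
// Invented sample markup; the original question's HTML is not shown here.
$html = '<ul><li><a href="/one">First</a></li><li><a href="/two">Second</a></li><li>No link</li></ul>';

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$items = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    // href lives on the <a> inside each <li>, not on the <li> itself.
    foreach ($li->getElementsByTagName('a') as $a) {
        $items[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
}

print_r($items);
```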

PHP - Extract a tags from a url

You can use an HTML parser:
An HTML DOM parser written in PHP5+ lets you manipulate HTML in a very easy way!


