PHP extract link from a tag
This is very easy to do using SimpleXML, as long as the tag is well-formed XML:
$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
Regex to extract first link on page inside another tag
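If you only want to extract the link from the first h4, you could modify the expression to
(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*
and run it with the s flag. The leading (?i) already makes the match case-insensitive, and the s flag lets the trailing .* consume the rest of the input, newlines included, so only the first link is captured.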
$re = '/(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*/s';
$str = '<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4><h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
foreach ($matches as $match) {
print($match[1]);
}
Output
somelinkhere
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top right panel and shows how it matches against sample inputs.
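Since only the first link is wanted, a possible simplification is to call preg_match, which stops at the first match, so the trailing .* and the s flag are not needed. A minimal sketch, assuming markup shaped like the sample above; the helper name firstH4Link and the sample string are made up for illustration:

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of
// the first <h4><a ...> in $html, or null if there is none.
function firstH4Link(string $html): ?string
{
    // Same pattern as above minus the trailing .*, with the i flag
    // replacing the inline (?i).
    $re = '/<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*"/i';
    return preg_match($re, $html, $m) === 1 ? $m[1] : null;
}

// Made-up sample input with two h4 links.
$str = '<h4><a href="first" class="x">One</a></h4><h4><a href="second">Two</a></h4>';
echo firstH4Link($str), "\n"; // first
```

Returning null when nothing matches lets the caller tell "no link" apart from an empty href.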
Extract link attributes from string of HTML
Instead of crafting a long, complicated regex, do it in steps:
$str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
$str = preg_replace("/.*<a\s+href=\"/","",$str);
print preg_replace("/\">.*/","",$str);
A non-regex way, using explode:
$str = '<a href="http://stackoverflow.com/"> Stack Overflow</a>';
$s = explode('href="',$str);
$t = explode('">',$s[1]);
print $t[0];
how to extract links and titles from a .html page?
Thank you everyone, I GOT IT!
The final code:
$html = file_get_contents('bookmarks.html');
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML. The @ suppresses the warnings that are raised
//if the $html string isn't well-formed markup.
@$dom->loadHTML($html);
//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');
//Iterate over the extracted links and display their text and URLs
foreach ($links as $link){
    //Show the anchor text, then the "href" attribute.
    echo $link->nodeValue, ' ';
    echo $link->getAttribute('href'), '<br>';
}
This shows you the anchor text assigned and the href for all links in a .html file.
Again, thanks a lot.
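The same idea can be wrapped in a reusable function. This is only a sketch: the name extractLinks, the returned array shape, and the example.com sample markup are my own assumptions, and it uses libxml_use_internal_errors() in place of the @ operator to keep parser warnings quiet.

```php
<?php
// Hypothetical helper: returns a list of ['text' => ..., 'href' => ...]
// pairs, one per <a> element found in $html.
function extractLinks(string $html): array
{
    // Collect libxml warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $links[] = [
            'text' => trim($link->nodeValue),
            'href' => $link->getAttribute('href'), // '' when the attribute is missing
        ];
    }
    return $links;
}

// Made-up sample input; the original reads bookmarks.html instead.
$html = '<p><a href="https://example.com/">Example</a> and <a href="/about">About</a></p>';
foreach (extractLinks($html) as $link) {
    echo $link['text'], ' => ', $link['href'], "\n";
}
```

To use it on the file from the answer, pass file_get_contents('bookmarks.html') as the argument.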
How to extract id and url from an anchor tag in this code using PHP?
Your question is not entirely clear, so if I understand correctly: you are able to extract the anchor tag (<a>) content easily, but you think it won't work if the <a tag contains href and id. Plus, as per your post title, you want to extract the href and id attribute values as well, and either of them may be missing.
In this case, you can use this regex,
<(a)(?:\s+href=(['"])(?<href>[^'"]*)\2\s*)?(?:\s+id=(['"])(?<id>[^'"]*)\4\s*)?>(.+?)<\/\1>
Explanation:
< --> start of tag
(a) --> Expects the tag name to be a only and captures it in group 1 so the closing tag can be matched by back reference
(?:\s+href=(['"])(?<href>[^'"]*)\2\s*)? --> This optional part matches the href attribute and captures its value in the href named group
(?:\s+id=(['"])(?<id>[^'"]*)\4\s*)? --> This optional part matches the id attribute and captures its value in the id named group
> --> end of the opening <a tag
(.+?) --> Captures the <a tag's inner text
<\/\1> --> Matches the closing tag for <a by back referencing \1
With the regex above, group 1 will always be a, and the href and id named groups capture the attribute values when they are present, both being optional.
Let me know if this is what you wanted. In case of any queries, do let me know.
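For reference, this is roughly how the expression could be used from PHP. It is a sketch with made-up sample inputs; note that the single quotes inside the character classes have to be escaped in a single-quoted PHP string, and ~ is chosen as the delimiter here.

```php
<?php
// The regex from above, wrapped in ~ delimiters.
$re = '~<(a)(?:\s+href=([\'"])(?<href>[^\'"]*)\2\s*)?(?:\s+id=([\'"])(?<id>[^\'"]*)\4\s*)?>(.+?)<\/\1>~';

// Hypothetical sample inputs, not taken from the question.
$samples = [
    '<a href="page.html" id="top">Home</a>',
    '<a href="page.html">No id</a>',
    '<a>No attributes</a>',
];

foreach ($samples as $html) {
    if (preg_match($re, $html, $m)) {
        // Optional groups that did not take part in the match come back as ''.
        $href = $m['href'] !== '' ? $m['href'] : '(none)';
        $id   = $m['id']   !== '' ? $m['id']   : '(none)';
        echo "text={$m[6]} href={$href} id={$id}\n";
    }
}
```

Keep in mind that the expression only matches when href comes before id and no other attributes are present, so for arbitrary markup a DOM parser is the safer choice.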
Regex How to extract link from HTML with specific path
This will work if I haven't misunderstood your question.
$html = '<a href="https://www.website.com/n/?confirm.php" ></a>';
preg_match_all('/href="([^\s"]+)/', $html, $match);
print '<pre>';
print_r($match);
print '</pre>';
print $match[1][0];
Edited: As per the comment, you hadn't provided the specific URL, which is why I just posted a generic answer to capture href. See my updated answer below. The regex used can be found here: https://regex101.com/r/pnfz7E/1
$re = '/<a href="([^"]*?\/n\/\?confirm\.php)">.*?<\/a>/m';
$str = '<a href="https://www.website.com/n/?noconfirm.php">SSD</a>
<div>How are you</div>
<a href="https://www.website.com/n/?confirm.php">HDD</a>
<h2>Being Sunny</h2>
<a href="https://www.ltmgtfu.com/n/?noconfirm.php">MSD</a>
<div>How are you</div>
<a href="https://www.website.com/n/?confirm.php"></a>
<h2>Being Sunny</h2>
<a href="https://www.google.com/n/?noconfirm.php">GSD</a>
<div>How are you</div>
<a href="https://www.website.com/n/?confirm.php">LSD</a>
<h2>Being Sunny</h2>';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
print '<pre>';
print_r($matches);
print '</pre>';
How to extract links and text inside li tags?
href is an attribute of an <a> tag, not an <li>. Change your code to $dom->getElementsByTagName('a'); and it will start working!
See here: https://3v4l.org/4Ln5E
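If the links should stay tied to their list items, one option is to walk the li elements first and then query the a elements inside each one. A minimal sketch with made-up sample markup, since the HTML from the question is not shown here:

```php
<?php
// Made-up sample markup for illustration.
$html = '<ul>
  <li><a href="/one">First</a></li>
  <li><a href="/two">Second</a></li>
  <li>No link here</li>
</ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$result = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    // href lives on the <a> inside the <li>, not on the <li> itself.
    foreach ($li->getElementsByTagName('a') as $a) {
        $result[] = [
            'text' => trim($a->nodeValue),
            'href' => $a->getAttribute('href'),
        ];
    }
}

print_r($result);
```

List items without an anchor are simply skipped, because the inner loop has nothing to iterate over.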
PHP - Extract a tags from a url
You can use an HTML parser:
An HTML DOM parser written in PHP5+ lets you manipulate HTML in a very easy way!