PHP Regex to Get String Inside Href Tag

PHP: find text between href='' tag

Here you go:

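$pageData = file_get_contents('your.txt');
$all_hrefs = array();
if (preg_match_all('/<a\s+href=["\']([^"\']+)["\']/i', $pageData, $links, PREG_PATTERN_ORDER)) {
    // $links[1] holds the contents of capture group 1 for every match
    $all_hrefs = array_unique($links[1]);
}

Now you have all unique hrefs in $all_hrefs. Note that this pattern only matches anchors where href is the first attribute after <a.

If you want to display them:

foreach ($all_hrefs as $href) {
    echo $href, "\n";
}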
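As a minimal, self-contained sketch of the same idea, the following runs the pattern against an inline string instead of a file. The helper name extract_hrefs and the sample markup are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: returns the unique href values found by the pattern above.
function extract_hrefs(string $html): array
{
    $links = [];
    if (preg_match_all('/<a\s+href=["\']([^"\']+)["\']/i', $html, $links, PREG_PATTERN_ORDER)) {
        // array_values() re-indexes the array after array_unique() drops duplicates.
        return array_values(array_unique($links[1]));
    }
    return [];
}

// Assumed sample input, not taken from a real page.
$sample = '<a href="/one">1</a> <A HREF=\'/two\'>2</A> <a href="/one">again</a> '
        . '<a class="x" href="/three">3</a>';

$hrefs = extract_hrefs($sample);
foreach ($hrefs as $href) {
    echo $href, "\n";
}
```

Here /three is not returned, because the class attribute sits between <a and href and the pattern expects href to come first.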

PHP regex: get custom URL and string inside href tag

Description

This regex matches anchor tags, provided they have an href attribute whose value starts with http://example.ir/. The entire href value is captured in capture group 1.

<a\b(?=\s) # capture the open tag
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref="(http:\/\/example\.ir\/[^"]*)) # get the href attribute
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?> # get the entire tag
.*?<\/a>

Example

Sample Text

Note the last line has a potentially difficult edge case.

<a href="http://example.ir/salam/ali/....">salam ali</a>
<a class="Fonzie" href="http://example.ir/?id=123/...">plus id 123</a>
<a class="Fonzie" href="?kambiz=khare/...">not an http</a>
<a onmouseover=' href="http://example.ir/salam/ali/...." ; funHrefRotater(href) ; " href="?kambiz=khare/...">again not the line we are looking for</a>

Code

This PHP example only shows how the match works.

<?php
$sourcestring="your source string";
preg_match_all('/<a\b(?=\s) # capture the open tag
(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref="(http:\/\/example\.ir\/[^"]*)) # get the href attribute
(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s]*)*"\s?> # get the entire tag
.*?<\/a>/imx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

Matches

With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches and $matches[1] holds capture group 1:

[0][0] = <a href="http://example.ir/salam/ali/....">salam ali</a>
[0][1] = <a class="Fonzie" href="http://example.ir/?id=123/...">plus id 123</a>
[1][0] = http://example.ir/salam/ali/....
[1][1] = http://example.ir/?id=123/...
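If it is more convenient to get each matched tag together with its URL, PREG_SET_ORDER can be passed as the flags argument. The following is a sketch under that assumption, using the first three lines of the sample text; the $urls variable is introduced only for illustration.

```php
<?php
// Sample lines copied from the example text above.
$sourcestring = <<<'HTML'
<a href="http://example.ir/salam/ali/....">salam ali</a>
<a class="Fonzie" href="http://example.ir/?id=123/...">plus id 123</a>
<a class="Fonzie" href="?kambiz=khare/...">not an http</a>
HTML;

$pattern = '/<a\b(?=\s) # capture the open tag
(?=(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\shref="(http:\/\/example\.ir\/[^"]*)) # get the href attribute
(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s]*)*"\s?> # get the entire tag
.*?<\/a>/imx';

// PREG_SET_ORDER groups results per match: [0] is the full tag, [1] the captured href.
preg_match_all($pattern, $sourcestring, $matches, PREG_SET_ORDER);

$urls = [];
foreach ($matches as $match) {
    $urls[] = $match[1];
    echo $match[1], "\n";
}
```

With this flag, each element of $matches is one complete match, which avoids indexing two parallel arrays.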

Regex to extract first link on page inside another tag

If you only want to extract the link from the first h4, you could modify the expression to:

(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*

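The inline (?i) makes the match case-insensitive. In the code below, the s flag lets the trailing .* run across newlines to the end of the input, so only the first link is captured.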

$re = '/(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*/s';
$str = '<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4><h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>
';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $match) {
print($match[1]);
}

Output

somelinkhere

If you wish to simplify, modify, or explore the expression, paste it into regex101.com; the explanation panel at the top right breaks it down, and you can test it against sample inputs there.
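An alternative way to get only the first link is preg_match, which stops after the first match, so the trailing .* and the s flag are not needed. This is a sketch, not the original answer's method; the href values first-link and second-link are placeholders chosen to make the result visible.

```php
<?php
// Assumed sample input with two h4 links; the href values are placeholders.
$str = '<h4><a href="first-link" class="search_result_title">Some Text Here</a></h4>'
     . '<h4><a href="second-link" class="search_result_title">Some Text Here</a></h4>';

// Same capture group as the expression above, without the trailing .* and the s flag.
$re = '/<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*"/i';

$first = null;
if (preg_match($re, $str, $match)) {
    // $match[1] is the href value of the first h4 link only.
    $first = $match[1];
    echo $first, "\n";
}
```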


Match the HTML link tag with given href attribute with regex

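Try this. The $url value is escaped with preg_quote first, so that characters such as . and / are matched literally:

$url = preg_quote($url, '/');

echo preg_replace('/<link([^>]*?)href[\s]?=[\s]?[\'\"\\\]*'.$url.'([^>]*?)>/is', '', $html);

See it in action: https://eval.in/118665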
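For a version that runs on its own, here is a minimal sketch with assumed values for $url and $html (both are made up for illustration, as is the $quoted variable).

```php
<?php
// Assumed inputs for illustration.
$url  = 'http://example.com/style.css';
$html = '<head><link rel="stylesheet" href="http://example.com/style.css">'
      . '<link rel="stylesheet" href="http://example.com/other.css"></head>';

// Escape regex metacharacters, and the / delimiter, in the URL.
$quoted = preg_quote($url, '/');

// Remove every <link> tag whose href starts with the given URL.
$result = preg_replace('/<link([^>]*?)href[\s]?=[\s]?[\'\"\\\]*' . $quoted . '([^>]*?)>/is', '', $html);
echo $result, "\n";
```

Because nothing anchors the end of the URL, a longer href that merely starts with $url would be removed as well.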

Match if string is there in anchor tag

Try the following code:

$x = '<a href="">This is a test string</a>';

if (preg_match_all('~<a href="">.+test.+</a>~i', $x, $m)) {
    echo "Match";
} else {
    echo "No match";
}
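The pattern above is tied to an empty href and a fixed word. As a hedged sketch of a more general check, the hypothetical helper anchor_text_contains below accepts any attributes and any search word. Unlike the original pattern, it also matches when the word sits at the very start or end of the link text, and it assumes the link text contains no nested tags.

```php
<?php
// Hypothetical helper: true if some anchor's text contains $word (case-insensitive).
function anchor_text_contains(string $html, string $word): bool
{
    // Escape the word so it is matched literally; ~ is the pattern delimiter.
    $quoted = preg_quote($word, '~');
    // [^>]* allows any attributes; [^<]* keeps the search inside the link text.
    return preg_match('~<a\b[^>]*>[^<]*' . $quoted . '[^<]*</a>~i', $html) === 1;
}

$x = '<a href="">This is a test string</a>';
echo anchor_text_contains($x, 'test') ? "Match" : "No match";
```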

Regular expression to extract link text from anchor tag

preg_replace returns the modified string rather than changing its argument, so you need to assign the result:

$link = preg_replace('/<a.*?>/i', '', $link);
$link = preg_replace('/<\/a>/i', '', $link);
echo $link;
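A self-contained run of the same two replacements, with an assumed sample value for $link, might look like this. The strip_tags call at the end is an alternative not mentioned in the original answer; it removes all tags, not only anchors.

```php
<?php
// Assumed sample input.
$link = '<a href="http://example.com/page" title="Example">Click here</a>';

// Remove the opening tag, then the closing tag, keeping the result each time.
$text = preg_replace('/<a.*?>/i', '', $link);
$text = preg_replace('/<\/a>/i', '', $text);
echo $text, "\n";

// Alternative: strip_tags() drops every HTML tag in the string.
$alt = strip_tags($link);
```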

