Preg_Match_All <A Href

Preg_match_all a href

A regex for parsing links looks something like this:

'/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'

Given how horrible that is, I would recommend using Simple HTML DOM to extract the links, at least. You can then check each link with a very basic regex on its href.
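As a rough sketch of the "parser first, simple regex second" idea, the snippet below uses PHP's built-in DOMDocument rather than Simple HTML DOM (a swapped-in technique, chosen only because it needs no extra library). The sample markup and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample markup; any HTML string would do.
$html = '<p><a href="http://example.com/a">A</a> '
      . '<a class="x" href=\'/local/page\'>B</a> '
      . '<a name="top">no href</a></p>';

$dom = new DOMDocument();
// Real-world HTML is often sloppy; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Step 1: let the parser find every <a> that actually has an href.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('href')) {
        $hrefs[] = $a->getAttribute('href');
    }
}

// Step 2: a very basic regex on each href, here keeping absolute http(s) links only.
$absolute = array_values(preg_grep('~^https?://~i', $hrefs));

print_r($hrefs);
print_r($absolute);
```

The parser copes with quoting style and attribute order, so the regex only has to describe the URL itself.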

preg_match_all Get links from a href= with Class

You have an extra space in your regex:

preg_match_all('/<a\s*class=\"coolCard \s[^\>]*\"\s*href=([\'"])(.*?)\\1/is', $wordString, $links);
// here ______________________________^ (the space right after coolCard)

Just remove it:

preg_match_all('/<a\s*class=\"coolCard\s[^\>]*\"\s*href=([\'"])(.*?)\\1/is', $wordString, $links);

Complete example:

$wordString = '<a class="coolCard project-card " href="http://www.aabbcc.com/post//tank-farm-good-be" id="2568">';
preg_match_all('/<a\s*class=\"coolCard\s[^\>]*\"\s*href=([\'"])(.*?)\\1/is', $wordString, $links);
print_r($links);

Output:

Array
(
    [0] => Array
        (
            [0] => <a class="coolCard project-card " href="http://www.aabbcc.com/post//tank-farm-good-be"
        )

    [1] => Array
        (
            [0] => "
        )

    [2] => Array
        (
            [0] => http://www.aabbcc.com/post//tank-farm-good-be
        )

)
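The regex above only matches when class comes before href and coolCard is the first class name. If that ordering cannot be guaranteed, a DOM query is one alternative. The following is a minimal sketch using the built-in DOMDocument and DOMXPath classes on the same $wordString; the XPath expression is my own and is not part of the original answer.

```php
<?php
$wordString = '<a class="coolCard project-card " href="http://www.aabbcc.com/post//tank-farm-good-be" id="2568">';

$dom = new DOMDocument();
// The fragment is not a complete document; keep libxml quiet about that.
libxml_use_internal_errors(true);
$dom->loadHTML($wordString);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match <a> elements whose class list contains the whole word "coolCard",
// regardless of its position among the other class names.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " coolCard ")]';

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```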

preg_match link text with less-than sign in it

This is the fundamental issue with trying to parse HTML with regex. The markup here is not really valid HTML: content that is not meant to be interpreted as markup should be written as HTML entities (i.e. &lt; instead of <). You won't always be able to control that, though.

In your case, something like this works for regex:

|<a href="/blabla/([0-9]+)">.*</a>|Uis

Note that the capture group numbering shifts. This also allows nested tags (like <a><b><i></i></b></a>).

Keep in mind that the U (ungreedy) modifier you used makes every quantifier lazy by default, and turns a quantifier followed by ? back into a greedy one, which is why a plain .* is enough here. If you wanted to do this without the U modifier, you could use a negative lookahead instead:

|<a href="/blabla/([0-9]+)">(?:(?!</a>).)*</a>|is
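To illustrate, here is a small test of the lookahead pattern above. The input string is hypothetical: the second link text contains a raw < as well as a nested tag.

```php
<?php
// Hypothetical input; the second link text contains an unescaped "<".
$html = '<a href="/blabla/12">plain</a> '
      . '<a href="/blabla/345">a < b <i>nested</i></a>';

// (?:(?!</a>).)* consumes one character at a time, as long as the
// closing </a> does not start at the current position.
$pattern = '|<a href="/blabla/([0-9]+)">(?:(?!</a>).)*</a>|is';

preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // the numeric ids
print_r($matches[0]); // the full <a>...</a> matches
```

Each match stops at the first closing </a>, so the two links are reported separately even without the U modifier.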

preg_match_all How to get all links?

1. To capture the src attribute of every img tag whose URL starts with http://i.ebayimg.com/:

regex: /src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/.+?\\.jpg)\"/i

Here is an example:

$re = "/src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/.+?\\.jpg)\"/i";
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);


If you want to be sure that the URL is captured from an img tag, use this regex instead (keep in mind that performance will degrade on very long pages):

$re = "/<img(?:.*?)src=\"((?:http|https):\\/\\/i\\.ebayimg\\.com\\/.+?\\.jpg)\"/i";

2. To capture the href attribute of every a tag whose URL starts with http://suchen.mobile.de/fahrzeuge/:

regex: /href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/.+?\\.jpg)\"/i

Here is an example:

$re = "/href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/.+?\\.jpg)\"/i";
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);


If you want to be sure that the URL is captured from an a tag, use this regex instead (keep in mind that performance will degrade on very long pages):

$re = "/<a(?:.*?)href=\"((?:http|https):\\/\\/suchen\\.mobile\\.de\\/fahrzeuge\\/.+?\\.jpg)\"/i";

need preg_match_all links

Try this one:

/http:\/\/([^\s]+)/
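A quick way to see what this pattern returns is to run it on plain text. The sample sentence below is made up. Because the pattern only stops at whitespace, on raw HTML it would also swallow a closing quote or > that directly follows the URL, so it is best suited to plain text.

```php
<?php
// Hypothetical plain-text input.
$text = "See http://example.com/page and http://example.org/x?y=1 for details.";

preg_match_all('/http:\/\/([^\s]+)/', $text, $matches);

print_r($matches[0]); // full URLs, including the scheme
print_r($matches[1]); // group 1: everything after "http://"
```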

php preg_match_all href inner texts with upper and lower case sensitivity

Please use another approach instead, e.g. XPath:

$xml = simplexml_load_string($string); // requires well-formed XML
$links = $xml->xpath("//a");
foreach ($links as $link) {
    echo $link["href"];
}

See a demo on ideone.com.
A solution with a regex would be:

~(?i)href=('|")(?<link>[^'"]+)\1(?-i)~
# case-insensitive
# look for href= literally
# look for a single/double quote and capture it in group 1
# match everything that is not a single or double quote 1 or more times (named group "link")
# match the first captured group again
# and turn case sensitivity on again

A demo can be found on regex101.com, but it is better to use the first approach.
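For completeness, this is roughly how the named group is read back in PHP. The sample string is invented, and the sketch uses the i flag on the whole pattern instead of the inline (?i) switch, which is equivalent here.

```php
<?php
// Hypothetical input with mixed-case attribute names and both quote styles.
$string = '<p><a HREF="/one">One</a> <a href=\'/two\'>Two</a></p>';

// Group 1 remembers the opening quote, \1 requires the same quote to close.
$pattern = '~href=([\'"])(?<link>[^\'"]+)\1~i';

preg_match_all($pattern, $string, $matches);

// Named groups are available under their name as well as their number.
print_r($matches['link']);
```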

How use preg_match_all to check href for a value

You should change this ...

if ($cell->find('a')){
foreach ($cell as $anchor)

to this ...

foreach ($cell->find('a') as $anchor){

Right now you are iterating over $cell itself rather than over the anchors it contains, so you end up looking for an href on the td element and not on the a.

With PHP preg_match_all, get value of href

Using SimpleXML:

$html = '<link type="text/html" rel="alternate" href="http://link"/>';
$xml = simplexml_load_string($html);
$attr = $xml->attributes(); // all attributes of the element
$href = (string) $xml['href']; // just the href value

Using DOM:

$dom = new DOMDocument;
$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('link');
$attr = $nodes->item(0)->getAttribute('href');

PHP get all links and images with preg_match_all

You can get them with this:

$re = '/(alt|href|src)=("[^"]*")/';
$str = '<a class="picLink" target="_blank" href="/EN/news/397423/MY-TEST-NEWS">\n <img class="fr" width="115" style="margin:6px 0px 0px 8px;width: 115px;" src="http://sample.com/files/EN/news/369326_276.jpg" alt="MY TEST NEWS !">\n</a>\n<a class="picLink" target="_blank" href="/EN/news/397423/MY-TEST-NEWS">\n <img class="fr" width="115" style="margin:6px 0px 0px 8px;width: 115px;" src="http://sample.com/files/EN/news/369326_276.jpg" alt="MY TEST NEWS !">\n</a>';

preg_match_all($re, $str, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => href="/EN/news/397423/MY-TEST-NEWS"
            [1] => src="http://sample.com/files/EN/news/369326_276.jpg"
            [2] => alt="MY TEST NEWS !"
            [3] => href="/EN/news/397423/MY-TEST-NEWS"
            [4] => src="http://sample.com/files/EN/news/369326_276.jpg"
            [5] => alt="MY TEST NEWS !"
        )

    [1] => Array
        (
            [0] => href
            [1] => src
            [2] => alt
            [3] => href
            [4] => src
            [5] => alt
        )

    [2] => Array
        (
            [0] => "/EN/news/397423/MY-TEST-NEWS"
            [1] => "http://sample.com/files/EN/news/369326_276.jpg"
            [2] => "MY TEST NEWS !"
            [3] => "/EN/news/397423/MY-TEST-NEWS"
            [4] => "http://sample.com/files/EN/news/369326_276.jpg"
            [5] => "MY TEST NEWS !"
        )

)

$matches[2] holds the desired values (still wrapped in their double quotes).
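If the surrounding quotes are unwanted, one option is to move them outside the capture group. The variant below is a sketch along those lines, using a shortened version of the sample markup and PREG_SET_ORDER so that each attribute name stays next to its value. The $pairs array is a name introduced here for illustration.

```php
<?php
// Shortened version of the sample markup above.
$str = '<a class="picLink" href="/EN/news/397423/MY-TEST-NEWS">'
     . '<img src="http://sample.com/files/EN/news/369326_276.jpg" alt="MY TEST NEWS !">'
     . '</a>';

// The quotes now sit outside group 2, so the captured value has none.
$re = '/(alt|href|src)="([^"]*)"/';

// PREG_SET_ORDER returns one array per match: [full match, name, value].
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    $pairs[] = [$m[1], $m[2]];
}

print_r($pairs);
```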


