Any Preg_Match() to Extract Image Urls from Text

Any preg_match() to extract image urls from text?

Be aware of the special cases where someone can fool your server by inserting fake matches.

For example:

http://www.myserver.com/virus.exe?fakeParam=.jpg

Or

http://www.myserver.com/virus.exe#fakeParam=.jpg

I quickly modified the regex to avoid these cases, but I'm pretty sure there are more (for example, inserting %00 into the file path, which cannot easily be caught by a regex).

$matches = array();
preg_match_all('!http://[^?#]+\.(?:jpe?g|png|gif)!Ui' , $string , $matches);

So, for security, always make the regex as restrictive as possible. For example, if you know the server, write it into the regex; or if you know that the path will only ever contain letters, digits, hyphens, dots and slashes, use an expression like:

$matches = array();
preg_match_all('!http://[a-z0-9\-\.\/]+\.(?:jpe?g|png|gif)!Ui' , $string , $matches);

This should avoid any funny surprises in the future.
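
To make the "most restrictive" advice concrete, here is a minimal sketch that checks a matched URL with parse_url() before trusting it. The helper name isTrustedImageUrl and the host allowlist are my own assumptions for illustration, not part of the answer above; the path rule is the same letters, digits, hyphens, dots and slashes rule.

```php
<?php
// Hypothetical helper (not from the original answer): accept a URL only if
// the scheme, the host and the path all pass strict checks.
function isTrustedImageUrl(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    if (!in_array(strtolower($parts['host']), $allowedHosts, true)) {
        return false;
    }
    // The path may only contain letters, digits, hyphens, dots and slashes,
    // and it must end in an image extension. Query and fragment are ignored.
    return preg_match('~^[a-z0-9\-./]+\.(?:jpe?g|png|gif)$~i', $parts['path']) === 1;
}

$allowed = ['www.myserver.com']; // assumed allowlist
var_dump(isTrustedImageUrl('http://www.myserver.com/img/photo.jpg', $allowed));            // bool(true)
var_dump(isTrustedImageUrl('http://www.myserver.com/virus.exe?fakeParam=.jpg', $allowed)); // bool(false)
var_dump(isTrustedImageUrl('http://www.myserver.com/virus.exe#fakeParam=.jpg', $allowed)); // bool(false)
var_dump(isTrustedImageUrl('http://www.myserver.com/img/a%00.php.jpg', $allowed));         // bool(false)
```

Because the extension is tested against the parsed path only, a fake ".jpg" in the query string or fragment no longer counts, and the % character is rejected by the path rule.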

Extract image urls using regex from a text

If you want to extract the entire URL, perhaps, simply:

https:.*?\.(?:png|jpg|svg)
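
A quick sketch of how that pattern behaves in PHP, with made-up sample text. One thing to watch: .*? also matches spaces, so the match can start at a non-image URL and run on to the next image extension. The \S*? variant below is my own assumption of a tighter alternative, not part of the answer.

```php
<?php
// Sample text is invented for illustration.
$text = 'See https://example.com/readme.txt and https://example.com/a.png';

// Pattern from the answer: the lazy .*? may cross whitespace.
preg_match_all('~https:.*?\.(?:png|jpg|svg)~', $text, $loose);
print_r($loose[0]);
// one match: "https://example.com/readme.txt and https://example.com/a.png"

// Assumed tighter variant: \S*? refuses to cross whitespace.
preg_match_all('~https:\S*?\.(?:png|jpg|svg)~', $text, $tight);
print_r($tight[0]);
// one match: "https://example.com/a.png"
```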

Php preg_match and preg_replace text with url and image tags

You can do it in a single pass. Your two patterns are very similar, so it's easy to build one pattern that handles both cases. With preg_replace_callback, you choose the replacement string inside the callback function:

$post = "This is my text with http://www.google.com and some image http://www.domain.com/somewebimage.png";

# the pattern is very basic and can be improved to handle more complicated URLs
$pattern = '~\b(?:ht|f)tps?://[a-z0-9.-]+\.[a-z]{2,3}(?:/\S*)?~i';
$imgExt = ['.png', '.gif', '.jpg', '.jpeg'];
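$callback = function ($m) use ($imgExt) {
    # parse_url() returns false for a seriously malformed URL and null when there is no path
    $path = parse_url($m[0], PHP_URL_PATH);
    if ( false === $path )
        return $m[0];

    # strrchr() returns false when the path has no dot, hence the string casts
    $extension = strtolower((string) strrchr((string) $path, '.'));
    # escape the URL before putting it into an HTML attribute
    $url = htmlspecialchars($m[0], ENT_QUOTES);

    # the inline width and style would be better handled by a css rule
    if ( in_array($extension, $imgExt, true) )
        return '<img src="' . $url . '" width="300" style="float: right;">';

    return '<a href="' . $url . '" target="_blank">' . $url . '</a>';
};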

$result = preg_replace_callback($pattern, $callback, $post);

Detect and extract image url from text and html tags

A quick attempt at an <img/> tag specific regex:

preg_match_all('/<img[^>]*?\s+src\s*=\s*"([^"]+)"[^>]*?>/i', $str, $matches);
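
For what it's worth, here is a small run of that pattern against invented sample markup. The captured src values end up in $matches[1]. Note that the pattern only handles double-quoted src attributes; a single-quoted or unquoted src will not match.

```php
<?php
// Sample markup is invented for illustration.
$str = '<p>Intro <img class="a" src="https://example.com/one.png" alt="One"> text '
     . '<IMG SRC = "/img/two.jpg"/> and <img src=\'/img/three.gif\'></p>';

preg_match_all('/<img[^>]*?\s+src\s*=\s*"([^"]+)"[^>]*?>/i', $str, $matches);

// $matches[0] holds the full tags, $matches[1] holds the src values.
print_r($matches[1]);
// https://example.com/one.png and /img/two.jpg; the single-quoted one is skipped
```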


Extracting links from a piece of text in PHP except ignoring image links

Using just that one string to test, the following works for me:

$str =  '<p class="fr-tag">Please visit https://www.google.co.uk/?gfe_rd=cr&ei=9P2DVaW2BMWo8wfK74HYCg and this <a href="https://www.google.co.uk/?gfe_rd=cr&ei=9P2DVaW2BMWo8wfK74HYCg" rel="nofollow">link</a> should be filtered and this http://d.pr/i/1i2Xu <img class="fr-fin fr-tag" alt="Image title" src="https://cft-forum.s3-us-west-2.amazonaws.com/uploads%2F1434714755338-Screen+Shot+2015-06-19+at+12.52.28.png" width="300"></p>';

preg_match('~a href="(.*?)"~', $str, $strArr);

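Matching on a href="..." with preg_match() fills the array $strArr with two values: the full match and the captured URL of the Google link. Note that preg_match() stops at the first match, which is enough here because the string contains only one <a> tag.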

Array
(
[0] => a href="https://www.google.co.uk/?gfe_rd=cr&ei=9P2DVaW2BMWo8wfK74HYCg"
[1] => https://www.google.co.uk/?gfe_rd=cr&ei=9P2DVaW2BMWo8wfK74HYCg
)
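
If the text can contain several links, a sketch along these lines collects all of them with preg_match_all() and then drops the ones that point at images. The sample string and the extension list are my assumptions.

```php
<?php
// Sample markup is invented for illustration.
$str = 'Visit <a href="https://example.com/page">page</a>, '
     . '<a href="https://example.com/pic.png">pic</a> and '
     . '<a href="https://example.com/other?x=1">other</a>';

// Same href pattern as above, but collecting every match.
preg_match_all('~a href="(.*?)"~', $str, $m);

// Keep only links whose path does not end in an image extension.
$links = array_values(array_filter($m[1], function ($url) {
    $path = (string) parse_url($url, PHP_URL_PATH);
    return !preg_match('~\.(?:jpe?g|png|gif)$~i', $path);
}));

print_r($links);
// https://example.com/page and https://example.com/other?x=1
```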

How to extract all image urls from a nested object or json string

Something like this should do the trick. I did not test though.

function getAllImagesFromObject($obj) {
    $result = array();
    foreach ($obj as $prop) {
        if (is_array($prop) || is_object($prop)) {
            // recurse into nested arrays and objects
            $result = array_merge($result, getAllImagesFromObject($prop));
        } elseif (is_string($prop) && preg_match('/\.(jpg|jpeg|gif|png|bmp)$/', $prop)) {
            // keep only string values that end in an image extension
            $result[] = $prop;
        }
    }
    return $result;
}
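
For a JSON string, an alternative sketch is to decode it into nested arrays and let the built-in array_walk_recursive() visit every leaf value. This is a different technique from the recursive function above, and the sample JSON is invented.

```php
<?php
// Sample JSON is invented for illustration.
$json = '{"title":"Post","cover":"https://example.com/cover.jpg",'
      . '"gallery":[{"src":"/img/a.png","w":300},{"src":"/img/b.gif"}],'
      . '"file":"notes.txt"}';

$data = json_decode($json, true); // true: decode objects as associative arrays

$images = [];
array_walk_recursive($data, function ($value) use (&$images) {
    // Only string leaves that end in an image extension are kept.
    if (is_string($value) && preg_match('/\.(jpg|jpeg|gif|png|bmp)$/', $value)) {
        $images[] = $value;
    }
});

print_r($images);
// https://example.com/cover.jpg, /img/a.png, /img/b.gif
```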

Matching images links containing spaces

This is what worked for me:

I replaced

   [^\s]*

with

   .*?
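
To show the difference, here is a small sketch. The surrounding pattern is hypothetical, since the original one is not shown here; only the [^\s]* versus .*? part comes from the answer.

```php
<?php
// Sample text and the surrounding pattern are assumptions for illustration.
$html = 'src="http://example.com/my holiday photo.jpg" and src="http://example.com/plain.png"';

// [^\s]* stops at the first space, so the URL containing spaces is missed.
preg_match_all('~http://[^\s]*\.(?:jpe?g|png|gif)~i', $html, $noSpaces);
print_r($noSpaces[0]);
// only http://example.com/plain.png

// .*? may cross spaces and stops at the first image extension.
preg_match_all('~http://.*?\.(?:jpe?g|png|gif)~i', $html, $withSpaces);
print_r($withSpaces[0]);
// both URLs, including "http://example.com/my holiday photo.jpg"
```

The trade-off is that .*? can also run across unrelated text between a non-image URL and the next image extension, so it is worth anchoring the pattern to something like the surrounding quotes when you can.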

Preg_match urls in css

For your example data, one option is to recurse the first subpattern with (?1) and use a second capturing group for the URL.

The url will be in capturing group 2.

url(\(((?:[^()]+|(?1))+)\))


Explanation

  • url Match url literally
  • ( First capturing group

    • \( Match ( char
    • ( Second capturing group

      • (?:[^()]+|(?1))+ Match either 1+ chars other than ( and ), or recurse the first subpattern, and repeat 1+ times
    • ) Close second capturing group
    • \) Match ) char
  • ) Close first capturing group

This will also match the leading and trailing " and ' of a URL. When reading the matches, you could do a second check with a capturing group to verify that the opening quote is the same as the closing quote.

For example:

$re = '/url(\(((?:[^()]+|(?1))+)\))/m';
$str = 'background:url("/product/header/img1.png") and background:url("/product/header/img2.png\' and background:url(/product/header/img3.png"))';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    if (preg_match('/^([\'"]?)[^"]+\1$/', $match[2])) {
        echo trim($match[2], "'\"") . PHP_EOL;
    }
}

Result:

/product/header/img1.png
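
If nested parentheses are not a concern, a simpler sketch is to capture the optional quote in its own group and require the same quote before the closing parenthesis. This is a different, non-recursive pattern of my own, not the one from the answer, and the second sample string is invented.

```php
<?php
// Group 1: optional opening quote, group 2: the URL, \1: the same quote again.
$re = '~url\(\s*([\'"]?)([^\'"()]+)\1\s*\)~';

// First sample is the string from the answer, second one is invented.
$str = 'background:url("/product/header/img1.png") and background:url("/product/header/img2.png\' and background:url(/product/header/img3.png"))';
$css = "a{background:url('/img/a.png')} b{background:url(/img/b.png)}";

preg_match_all($re, $str, $fromAnswer);
print_r($fromAnswer[2]);
// only /product/header/img1.png, the mismatched quotes are rejected

preg_match_all($re, $css, $fromCss);
print_r($fromCss[2]);
// /img/a.png and /img/b.png
```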

