Regex & PHP - Isolate Src Attribute from Img Tag

If you don't wish to use regex (or any non-standard PHP components), a reasonable solution using the built-in DOMDocument class would be as follows:

<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');

foreach ($imageTags as $tag) {
    echo $tag->getAttribute('src');
}
?>

Find image src with regex in PHP

Try

$image = '<img class="foo bar test" title="test image" src=\'http://example.com/img/image.jpg\' alt="test image" width="100" height="100" />';
$array = array();
preg_match( "/src='([^']*)'/i", $image, $array ) ;
print_r( $array[1] ) ;
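The pattern above only handles a single-quoted value. A minimal sketch that accepts either quote style, assuming the attribute value is always quoted; the sample tags and variable names are illustrative, not from the original question:

```php
<?php
// Hypothetical sample tags: one single-quoted src, one double-quoted src.
$images = [
    "<img class=\"foo\" src='http://example.com/img/a.jpg' alt=\"a\" />",
    '<img class="foo" src="http://example.com/img/b.jpg" alt="b" />',
];

$sources = [];
foreach ($images as $image) {
    // (["']) captures the opening quote; \1 requires the same closing quote.
    if (preg_match('/\bsrc\s*=\s*(["\'])(.*?)\1/i', $image, $m)) {
        $sources[] = $m[2]; // group 2 holds the attribute value
    }
}

print_r($sources);
```

Because the quote character occupies group 1, the URL moves to group 2.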

PHP - parsing src attribute of img tag in string

The safest way is to use the built-in DOMDocument class (available since PHP 5). Use getElementsByTagName(), check that the length is greater than 0, and read the src value of the first item with getAttribute('src'):

$html = "YOUR_HTML_STRING";
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imgs = $dom->getElementsByTagName('img');
if ($imgs->length > 0) {
    echo $imgs->item(0)->getAttribute('src');
}
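
To collect every src rather than only the first, the same approach can loop over the node list. A sketch, assuming a hypothetical $html fragment; libxml_use_internal_errors() is used here so that imperfect markup does not emit warnings:

```php
<?php
// Hypothetical input: a fragment with two images.
$html = '<div><img src="/img/one.jpg" alt="one"><img src="/img/two.jpg" alt="two"></div>';

// Collect parser warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

$sources = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    // Skip images that have no src attribute at all.
    if ($img->hasAttribute('src')) {
        $sources[] = $img->getAttribute('src');
    }
}

print_r($sources);
```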


Extract SRC of IMG with specific ID in PHP

I figured out the answer myself. Here is the expression

/<img class="image" id="prdImage"(.*?)src="(.*?)"\/>/i

This returns an array with the URL in the second capture group, i.e. at index 2 of the matches array.

I am accepting this answer since I figured it out myself. If anybody else has a better solution, I will accept their answer.
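
A short usage sketch of that expression, assuming a hypothetical product tag. Note that the pattern only matches when class and id appear in exactly this order and the src value is immediately followed by "/>":

```php
<?php
// Hypothetical markup shaped to fit the pattern: src is the last
// attribute and the tag closes with "/>" right after the value.
$html = '<img class="image" id="prdImage" alt="Product" src="http://example.com/product.jpg"/>';

$pattern = '/<img class="image" id="prdImage"(.*?)src="(.*?)"\/>/i';

$url = null;
if (preg_match($pattern, $html, $matches)) {
    // $matches[0] is the whole tag, $matches[1] the text between the id
    // and src attributes, $matches[2] the src value.
    $url = $matches[2];
}

var_dump($url);
```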

Regex is not capturing the src from a unique img id

Note that your pattern uses a capture group, so the book title itself is in $title[1] and, likewise, the image src is in $imgURL[1]. Index [0] returns the full match.

One option is to add [^>]*> at the end of the pattern, in case more attributes follow the one you are matching.

If the attributes can appear in a different order, you might use a branch reset group (?| to match either order and still use group 1 to get the value.

<img[^>]*(?|id="bookImage"[^>]+src="([^"]+)"|src="([^"]+)"[^>]+id="bookImage")[^>]*>


For example

$s = '<img id="bookImage" src="test.jpg">';
preg_match('/<img[^>]*(?|id="bookImage"[^>]+src="([^"]+)"|src="([^"]+)"[^>]+id="bookImage")[^>]*>/', $s, $imgURL);
var_dump($imgURL[1]);

Output

string(8) "test.jpg"
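
The same branch reset pattern can be checked against the reversed attribute order. A sketch with a hypothetical tag where src comes before id:

```php
<?php
// Hypothetical tag: src appears before id, the reverse of the example above.
$s = '<img src="test.jpg" class="cover" id="bookImage">';

$pattern = '/<img[^>]*(?|id="bookImage"[^>]+src="([^"]+)"|src="([^"]+)"[^>]+id="bookImage")[^>]*>/';

// Both alternatives number their capture group as 1 because of (?| ... ).
$found = preg_match($pattern, $s, $imgURL);

var_dump($imgURL[1]);
```

Without the branch reset, the second alternative would store its value in group 2 and the calling code would have to check both indexes.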

Note that you might be better off using DOMDocument::getElementById, as you already know id="bookTitle" and id="bookImage".

For example

$s = '<span id="bookTitle">book title</span>';
$dom = new DOMDocument();
$dom->loadHTML($s);
$bookTitle = $dom->getElementById("bookTitle")->nodeValue;
var_dump($bookTitle);

$s = '<img id="bookImage" src="test.jpg">';
$dom = new DOMDocument();
$dom->loadHTML($s);
$imgSrc = $dom->getElementById("bookImage")->getAttribute("src");
var_dump($imgSrc);

Output

string(10) "book title"
string(8) "test.jpg"

Fetching img tag and its src value

Regular expression to match the first IMG tag and its src value:

$subject = '<img class="c1 c2 c3" title="Image Title 1" src="http://example.com/image-1.jpg" alt="Sample Image" width="620" height="521"><img class="c1 c2 c3" title="Image Title 2" src="http://example.com/image-2.jpg" alt="Sample Image" width="620" height="521">';
preg_match('/<img\s.*?\bsrc="(.*?)".*?>/si', $subject, $matches);
print_r($matches);

Output:

Array
(
    [0] => <img class="c1 c2 c3" title="Image Title 1" src="http://example.com/image-1.jpg" alt="Sample Image" width="620" height="521">
    [1] => http://example.com/image-1.jpg
)

There are many tools to test regular expressions online. Here are just a few of them:

  • http://regex.larsolavtorvik.com/
  • http://www.spaweditor.com/scripts/regex/

Regular Expression to extract src attribute from img tag

Your pattern should be (unescaped):

src\s*=\s*"(.+?)"

The important part is the added question mark, which makes the quantifier lazy: the group matches as few characters as possible, so it stops at the first closing quote.
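
The difference shows up as soon as another quoted attribute follows src. A small sketch comparing the greedy and lazy forms on a hypothetical tag:

```php
<?php
// Hypothetical tag with another quoted attribute after src.
$tag = '<img src="/picture.jpg" alt="image">';

// Greedy: .+ runs on to the last double quote in the string.
preg_match('/src\s*=\s*"(.+)"/', $tag, $greedy);

// Lazy: .+? stops at the first closing double quote.
preg_match('/src\s*=\s*"(.+?)"/', $tag, $lazy);

var_dump($greedy[1]); // includes the alt attribute text
var_dump($lazy[1]);   // only the src value
```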

Regex: Extracting img-tags from string

You shouldn't really use regex on HTML. What about this?

$string = '<span class="introduction"><img alt="image" src="/picture.jpg" /></span>';

echo strip_tags($string, '<img>');

Otherwise I would use an HTML/XML parser.
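
A sketch of the parser route using the built-in DOMDocument, which is one possible choice of parser rather than the only one; it collects each img element serialized back to markup:

```php
<?php
$string = '<span class="introduction"><img alt="image" src="/picture.jpg" /></span>';

$dom = new DOMDocument();
$dom->loadHTML($string);

$tags = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    // saveHTML($node) serializes only that element, not the whole document.
    $tags[] = $dom->saveHTML($img);
}

print_r($tags);
```

Unlike strip_tags(), this returns the img elements themselves and discards any text that surrounded them.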

Add domain to img src attribute value if a relative path

The following pattern will seek src attributes that do not start with http or https. Then for relative paths that begin with a forward slash, the leading slash will be removed before prepending the $base string to the src value.

Code:

$base = 'https://example.com/';
echo preg_replace('~ src="(?!http)\K/?~', $base, $html);

Output:

<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />
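
The $html input is not shown with the snippet above. A sketch that reproduces the listed output, assuming the input was four img tags like these (reconstructed from the output, not taken from the original demo):

```php
<?php
// Assumed input: two relative paths (with and without a leading slash)
// and two absolute URLs that should be left alone.
$html = <<<'HTML'
<img src="/docs/relative/url/img.jpg" />
<img src="docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />
HTML;

$base = 'https://example.com/';

// \K drops ` src="` from the match, so only the optional slash is replaced.
$result = preg_replace('~ src="(?!http)\K/?~', $base, $html);

echo $result;
```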

Breakdown:

~           #starting pattern delimiter
 src="      #match space, s, r, c, =, then "
(?!http)    #only continue matching if the next characters are not http (this also excludes https)
\K          #forget any previously matched characters so they are not destroyed by the replacement string
/?          #optionally match a forward slash
~           #ending pattern delimiter

As for your pattern, /<img src=\"[^http|https]([^\"]*)\"/:

  1. [^http|https] actually means "match a single character that is not from this list: |, h, t, p, and s". It could be simplified to [^|hpst] because the order of the listed characters in the "negated character class" is irrelevant and duplicating characters is meaningless. So you see, [^...] is not how you say "a string starts with something or somethingelse".
  2. Capturing all remaining characters in a substring until the next double quote with the intent to use it again in the replacement is unnecessary. This is why I use \K to pinpoint where $base should be injected instead of ([^\"]*).
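
The first point can be checked directly. A sketch comparing the negated character class with a negative lookahead on a few hypothetical paths:

```php
<?php
// [^http|https] is a negated character class: one character that is
// not h, t, p, s or |. It does not mean "does not start with http".
$class = '/^[^http|https]/';
$lookahead = '/^(?!http)/';

// Hypothetical src values.
$paths = ['docs/img.jpg', 'picture.jpg', 'http://example.com/img.jpg'];

$results = [];
foreach ($paths as $path) {
    $results[$path] = [
        'class'     => preg_match($class, $path),     // 1 = matched, 0 = not
        'lookahead' => preg_match($lookahead, $path),
    ];
}

print_r($results);
```

The relative path picture.jpg is rejected by the character class only because it happens to start with p, while the lookahead accepts it.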

Furthermore, I always recommend the stability of a DOM parser when dealing with a valid HTML document. You can use DOMDocument with XPath to target the qualifying elements and modify the src attributes without regex.

Code:

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//img[not(starts-with(@src, 'http'))]") as $node) {
    $node->setAttribute('src', $base . ltrim($node->getAttribute('src'), '/'));
}
echo $dom->saveHTML();

A related answer: https://stackoverflow.com/a/48837947/2943403

Get img src with PHP

Use an HTML parser like DOMDocument and then evaluate the value you're looking for with DOMXPath:

$html = '<img id="12" border="0" src="/images/image.jpg"
alt="Image" width="100" height="100" />';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg"

Or, for those who really need to save space (PHP 5 and 7 only: calling loadHTML() statically throws an Error as of PHP 8.0):

$xpath = new DOMXPath(@DOMDocument::loadHTML($html));
$src = $xpath->evaluate("string(//img/@src)");

And for the one-liners out there (the same static-call caveat applies):

$src = (string) reset(simplexml_import_dom(DOMDocument::loadHTML($html))->xpath("//img/@src"));
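
A non-static variant of the same idea, as a sketch that should also run on PHP 8; the function name firstImgSrc is hypothetical:

```php
<?php
// Hypothetical helper: returns the first img src in the markup,
// or an empty string when there is no img element.
function firstImgSrc(string $html): string
{
    // Keep libxml warnings about imperfect markup out of the output.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // string() yields '' when the node set is empty.
    return $xpath->evaluate('string(//img/@src)');
}

echo firstImgSrc('<img id="12" border="0" src="/images/image.jpg" alt="Image" width="100" height="100" />');
```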

