Regex & PHP - isolate src attribute from img tag
If you don't wish to use regex (or any non-standard PHP components), a reasonable solution using the built-in DOMDocument class would be as follows:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="http://example.com/img/image.jpg" ... />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
?>
Find image src with regex in PHP
Try
$image = '<img class="foo bar test" title="test image" src=\'http://example.com/img/image.jpg\' alt="test image" width="100" height="100" />';
$array = array();
preg_match( "/src='([^']*)'/i", $image, $array ) ;
print_r( $array[1] ) ;
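The pattern above only matches a single-quoted src value. As a minimal sketch (the backreference technique and the sample tags are additions, not part of the original answer), the opening quote can be captured and reused so that both quoting styles are handled:

```php
<?php
// Sample tags using each quoting style (illustrative data).
$tags = [
    "<img title=\"a\" src='http://example.com/img/one.jpg' alt=\"x\" />",
    '<img title="b" src="http://example.com/img/two.jpg" alt="y" />',
];

$sources = [];
foreach ($tags as $tag) {
    // Group 1 captures the opening quote, \1 requires the same closing quote,
    // and group 2 holds the attribute value.
    if (preg_match('/\bsrc\s*=\s*(["\'])(.*?)\1/i', $tag, $m)) {
        $sources[] = $m[2];
    }
}

print_r($sources);
```

Note that the URL ends up in group 2 here, because group 1 is spent on the quote character.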
PHP - parsing src attribute of img tag in string
The safest way is to use the built-in DOMDocument class (available since PHP 5). Use getElementsByTagName(), check that the length is greater than 0, and read the first item's src value with getAttribute('src'):
$html = "YOUR_HTML_STRING";
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imgs = $dom->getElementsByTagName('img');
if ($imgs->length > 0) {
echo $imgs->item(0)->getAttribute('src');
}
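Real-world markup often triggers libxml warnings while loading. A minimal sketch, assuming PHP 7.1 or later and a hypothetical helper name firstImageSrc(), that wraps the same steps and collects those warnings internally with libxml_use_internal_errors():

```php
<?php
// Hypothetical helper (the name is illustrative): returns the src of the
// first <img> in an HTML fragment, or null when there is none.
function firstImageSrc(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $imgs = $dom->getElementsByTagName('img');
    if ($imgs->length === 0) {
        return null;
    }

    return $imgs->item(0)->getAttribute('src');
}

var_dump(firstImageSrc('<p>Intro <img src="/images/a.png" alt="A"> text</p>'));
var_dump(firstImageSrc('<p>No images here</p>'));
```

Returning null for "no image" keeps the caller from having to distinguish an empty src from a missing tag.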
Extract SRC of IMG with specific ID in PHP
I figured out the answer myself. Here is the expression
/<img class="image" id="prdImage"(.*?)src="(.*?)"\/>/i
With preg_match(), this fills the matches array with the URL in the second capture group, i.e. at index 2.
I am accepting this answer since I figured it out myself. If anybody else has a better solution, I will accept their answer.
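For illustration, here is how that expression might be applied with preg_match(). The sample markup is an assumption, since the original page is not shown:

```php
<?php
// Assumed sample markup; the pattern requires this exact attribute order
// and a self-closing "/>" directly after the src value.
$html = '<img class="image" id="prdImage" alt="Product" src="http://example.com/img/product.jpg"/>';

$pattern = '/<img class="image" id="prdImage"(.*?)src="(.*?)"\/>/i';

if (preg_match($pattern, $html, $matches)) {
    // $matches[0] is the whole tag, $matches[1] the text between id and src,
    // and $matches[2] the URL.
    echo $matches[2], PHP_EOL;
}
```

Because the pattern hard-codes the attribute order and spacing, any change in the markup (for example src placed before id) would make it stop matching.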
Regex is not capturing the src from a unique img id
Note that you use a pattern with a capture group, which means the book title itself is in $title[1], and likewise the image src is in $imgURL[1]. Using [0] as the index will return the full match.
One option could be using [^>]*> at the end of the pattern, in case there is more after the attribute.
If the attributes can appear in a different order, you might use a branch reset group (?| to match either order and still use group 1 to get the value.
<img[^>]*(?|id="bookImage"[^>]+src="([^"]+)"|src="([^"]+)"[^>]+id="bookImage")[^>]*>
For example
$s = '<img id="bookImage" src="test.jpg">';
preg_match('/<img[^>]*(?|id="bookImage"[^>]+src="([^"]+)"|src="([^"]+)"[^>]+id="bookImage")[^>]*>/', $s, $imgURL);
var_dump($imgURL[1]);
Output
string(8) "test.jpg"
Note that you might be better off using DOMDocument::getElementById, as you already know id="bookTitle" and id="bookImage".
For example
$s = '<span id="bookTitle">book title</span>';
$dom = new DOMDocument();
$dom->loadHTML($s);
$bookTitle = $dom->getElementById("bookTitle")->nodeValue;
var_dump($bookTitle);
$s = '<img id="bookImage" src="test.jpg">';
$dom = new DOMDocument();
$dom->loadHTML($s);
$imgSrc = $dom->getElementById("bookImage")->getAttribute("src");
var_dump($imgSrc);
Output
string(10) "book title"
string(8) "test.jpg"
Fetching img tag and its src value
Regular expression to match the first IMG tag and its src value:
$subject = '<img class="c1 c2 c3" title="Image Title 1" src="http://example.com/image-1.jpg" alt="Sample Image" width="620" height="521"><img class="c1 c2 c3" title="Image Title 2" src="http://example.com/image-2.jpg" alt="Sample Image" width="620" height="521">';
preg_match('/<img\s.*?\bsrc="(.*?)".*?>/si', $subject, $matches);
print_r($matches);
Output:
Array
(
[0] => <img class="c1 c2 c3" title="Image Title 1" src="http://example.com/image-1.jpg" alt="Sample Image" width="620" height="521">
[1] => http://example.com/image-1.jpg
)
There are many tools to test regular expressions online. Here are just a few of them:
- http://regex.larsolavtorvik.com/
- http://www.spaweditor.com/scripts/regex/
Regular Expression to extract src attribute from img tag
Your pattern should be (unescaped):
src\s*=\s*"(.+?)"
The important part is the added question mark, which makes the quantifier lazy so that the group matches as few characters as possible and stops at the first closing double quote.
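To make the difference concrete, here is a small sketch comparing the greedy and lazy forms on an illustrative tag (the tag itself is an assumption):

```php
<?php
// Illustrative tag with further attributes after src.
$tag = '<img src="/picture.jpg" alt="image" title="demo" />';

// Greedy: .+ runs on to the LAST double quote in the string.
preg_match('/src\s*=\s*"(.+)"/', $tag, $greedy);

// Lazy: .+? stops at the FIRST closing double quote.
preg_match('/src\s*=\s*"(.+?)"/', $tag, $lazy);

var_dump($greedy[1]); // /picture.jpg" alt="image" title="demo
var_dump($lazy[1]);   // /picture.jpg
```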
Regex: Extracting img-tags from string
You shouldn't really use regex on HTML. What about this?
$string = '<span class="introduction"><img alt="image" src="/picture.jpg" /></span>';
echo strip_tags($string, '<img>');
Otherwise I would use an HTML/XML parser
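A parser-based version might look like the following sketch, which uses DOMDocument (one possible parser, not necessarily the one the answer had in mind) to collect each img element as a string:

```php
<?php
$string = '<span class="introduction"><img alt="image" src="/picture.jpg" /></span>';

$dom = new DOMDocument();
// The flags stop libxml from wrapping the fragment in <html><body> or adding a doctype.
$dom->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$tags = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    // saveHTML($node) serializes just that element.
    $tags[] = $dom->saveHTML($img);
}

print_r($tags);
```

The serialized tag is re-generated by the parser, so it may differ cosmetically from the input (for instance in how the tag is closed).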
Add domain to img src attribute value if a relative path
The following pattern will seek src attributes that do not start with http or https. Then, for relative paths that begin with a forward slash, the leading slash will be removed before prepending the $base string to the src value.
Code:
$base = 'https://example.com/';
echo preg_replace('~ src="(?!http)\K/?~', $base, $html);
Output:
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://example.com/docs/relative/url/img.jpg" />
<img src="https://docs/relative/url/img.jpg" />
<img src="http://docs/relative/url/img.jpg" />
Breakdown:
~ #starting pattern delimiter
src=" #match space, s, r, c, =, then "
(?!http) #only continue matching if not https or http
\K #forget any previously matched characters so they are not destroyed by the replacement string
/? #optionally match a forward slash
~ #ending pattern delimiter
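The snippet above does not define $html. As a runnable sketch, assuming an input reconstructed from the output shown (the four sample tags are an assumption):

```php
<?php
// Assumed input: two relative paths (with and without a leading slash)
// and two absolute URLs that should be left untouched.
$html = implode("\n", [
    '<img src="/docs/relative/url/img.jpg" />',
    '<img src="docs/relative/url/img.jpg" />',
    '<img src="https://docs/relative/url/img.jpg" />',
    '<img src="http://docs/relative/url/img.jpg" />',
]);

$base = 'https://example.com/';

// \K drops ' src="' from the match, so only the optional leading slash
// (or an empty string) is replaced by $base.
$result = preg_replace('~ src="(?!http)\K/?~', $base, $html);

echo $result, PHP_EOL;
```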
As for your pattern, /<img src=\"[^http|https]([^\"]*)\"/:
- [^http|https] actually means "match a single character that is not from this list: |, h, t, p, and s". It could be simplified to [^|hpst] because the order of the listed characters in the "negated character class" is irrelevant and duplicating characters is meaningless. So you see, [^...] is not how you say "a string starts with something or somethingelse".
- Capturing all remaining characters in a substring until the next double quote with the intent to use it again in the replacement is unnecessary. This is why I use \K to pinpoint where $base should be injected instead of ([^\"]*).
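The difference between the negated character class and a negative lookahead can be seen in a short sketch (the sample values are illustrative assumptions):

```php
<?php
// Illustrative src values.
$values = ['http://example.com/a.jpg', 'photos/a.jpg', 'docs/a.jpg', '/docs/a.jpg'];

$results = [];
foreach ($values as $value) {
    // Negated character class: only checks that the FIRST character
    // is not one of | h p s t.
    $class = preg_match('/^[^http|https]/', $value) === 1;

    // Negative lookahead: checks that the value does not start with "http".
    $lookahead = preg_match('/^(?!http)/', $value) === 1;

    $results[$value] = [$class, $lookahead];
    printf("%-26s class=%s lookahead=%s\n", $value, var_export($class, true), var_export($lookahead, true));
}
```

The relative path photos/a.jpg is rejected by the character class simply because it begins with p, whereas the lookahead accepts it.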
Furthermore, I always recommend the stability of a DOM parser when dealing with a valid HTML document. You can use DOMDocument with XPath to target the qualifying elements and modify the src attributes without regex.
Code:
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//img[not(starts-with(@src, 'http'))]") as $node) {
$node->setAttribute('src', $base . ltrim($node->getAttribute('src'), '/'));
}
echo $dom->saveHTML();
A related answer: https://stackoverflow.com/a/48837947/2943403
Get img src with PHP
Use an HTML parser like DOMDocument and then evaluate the value you're looking for with DOMXPath:
$html = '<img id="12" border="0" src="/images/image.jpg"
alt="Image" width="100" height="100" />';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)"); # "/images/image.jpg"
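Or for those who really need to save space (the static call DOMDocument::loadHTML($html) used in older PHP versions throws an Error in PHP 8, so the document is created inline instead):
($doc = new DOMDocument())->loadHTML($html);
$src = (new DOMXPath($doc))->evaluate("string(//img/@src)");
And for the one-liners out there, reusing $doc from above (indexing with [0] avoids passing a function result to reset() by reference):
$src = (string) simplexml_import_dom($doc)->xpath("//img/@src")[0];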