PHP/regex: How to get the string value of HTML tag?
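<?php
// Returns the text inside the first <$tagname>...</$tagname> pair, or null if none is found.
function getTextBetweenTags($string, $tagname) {
    $tag = preg_quote($tagname, '/');
    // [^>]* skips the tag's attributes; (.*?) is lazy, so it stops at the first closing tag.
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/s";
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null;
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt; // get me
?>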
That should do the trick
Get all text inside html tag with regex?
Use a DOM and never use regular expressions for parsing HTML.
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('strong') as $tag) {
echo $tag->nodeValue."<br>";
}
foreach ($dom->getElementsByTagName('span') as $tag) {
echo $tag->nodeValue."<br>";
}
OUTPUT :
this one
this two
this three
test one
test two
test three
Why shouldn't I use regular expressions to parse HTML content?
HTML is not a regular language and hence cannot be parsed by regular
expressions. Regex queries are not equipped to break down HTML into
its meaningful parts. Even enhanced irregular regular expressions as
used by Perl are not up to the task of parsing HTML.
That is from an article by Jeff Atwood.
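A small sketch of the kind of failure the quote describes, assuming a hypothetical input with one `<div>` nested inside another. The lazy pattern stops at the first closing tag, while the DOM parser returns the full text of the outer element:

```php
<?php
// Hypothetical input: a <div> nested inside another <div>.
$html = '<div>outer <div>inner</div> tail</div>';

// Lazy regex: stops at the first closing tag, so the outer element is cut short.
preg_match('/<div>(.*?)<\/div>/s', $html, $m);
$regexText = $m[1];

// DOM parser: understands nesting, so the outer element's full text is available.
$dom = new DOMDocument;
$dom->loadHTML($html);
$domText = $dom->getElementsByTagName('div')->item(0)->textContent;

echo $regexText, "\n"; // outer <div>inner
echo $domText, "\n";   // outer inner tail
```

Making the pattern greedy instead would break as soon as two sibling `<div>` elements appear, which is why neither variant is reliable on nested markup.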
Regex PHP find and match HTML tags with specific data-attributes
XPath is a fantastic and versatile tool. Your logic transfers seamlessly to an XPath query that is easy to construct, read, and maintain in the future.
Furthermore, XPath is superior to regex here because it will successfully match qualifying elements no matter the order of the attributes. Regex will struggle to do the same with just one preg_ call.
The following validates, extracts, and stores the results of a single query in one loop.
Code:
$dom = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings for malformed HTML
$dom->loadHTML($text, LIBXML_NOENT);
//libxml_clear_errors(); // optionally discard the collected warnings
$xpath = new DOMXPath($dom);
$results = []; // stays an empty array when nothing matches
foreach ($xpath->query("//*[@data-edit='true' and @data-type and @data-name]") as $node) {
    $results[] = [
        'type' => $node->getAttribute('data-type'),
        'name' => $node->getAttribute('data-name'),
        'text' => $node->textContent
    ];
}
var_export($results);
Output:
array (
0 =>
array (
'type' => 'wysiwyg',
'name' => 'Beoordeling',
'text' => 'We beoordelen uw aanvraag en berichten u over de acceptatie daarvan.',
),
1 =>
array (
'type' => 'text',
'name' => 'Bellen',
'text' => 'We bellen u voor een afspraak.',
),
2 =>
array (
'type' => 'text',
'name' => 'Technisch specialist',
'text' => 'Technisch specialist neemt bij u alles nog even door.',
),
)
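The attribute-order claim can be checked with a short sketch. The markup below is a made-up example, not the asker's input: two elements carry the same three attributes in different orders, and a third lacks `data-name` and should be skipped.

```php
<?php
// Hypothetical markup: same attributes in different order; the third element has no data-name.
$text = '<p data-edit="true" data-type="text" data-name="A">first</p>'
      . '<p data-name="B" data-type="text" data-edit="true">second</p>'
      . '<p data-edit="true" data-type="text">third</p>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($text);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$names = [];
foreach ($xpath->query("//*[@data-edit='true' and @data-type and @data-name]") as $node) {
    $names[] = $node->getAttribute('data-name');
}
print_r($names); // A and B; the third element is not matched
```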
PHP Regex find text between custom added HTML Tags
Assuming <PRODUCT_LIST> tags will never be nested:
preg_match_all('/<PRODUCT_LIST>(.*?)<\/PRODUCT_LIST>/s', $html, $matches);
//HTML array in $matches[1]
print_r($matches[1]);
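To illustrate why the `s` modifier matters in that pattern, here is a minimal sketch with a hypothetical `$html` value in which the first list spans several lines. Without `s`, the dot does not match newlines, so the multi-line block is missed:

```php
<?php
// Hypothetical input with two lists; the first one spans several lines.
$html = "<PRODUCT_LIST>\napple\npear\n</PRODUCT_LIST> text <PRODUCT_LIST>plum</PRODUCT_LIST>";

// With /s the dot also matches newlines, so multi-line blocks are captured.
preg_match_all('/<PRODUCT_LIST>(.*?)<\/PRODUCT_LIST>/s', $html, $withS);

// Without /s the dot stops at newlines, so the first block is not matched.
preg_match_all('/<PRODUCT_LIST>(.*?)<\/PRODUCT_LIST>/', $html, $withoutS);

$blocks = array_map('trim', $withS[1]); // strip the surrounding newlines
print_r($blocks);       // two entries: "apple\npear" and "plum"
print_r($withoutS[1]);  // one entry: "plum"
```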
Regex get text between the html tags - PHP
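Try this:
<?php
function teste() {
    $string = '<div>Hello, i am João</div><a href="test/test.com">testttttttttttt</a>';
    $matches = array();
    // Capture every run of text that sits between a ">" and the next "<"
    preg_match_all('/>([^<>]+)</', $string, $matches);
    echo '<pre>';
    print_r($matches[1]); // the text between the tags
    echo '</pre>';
}
teste();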
Regular expression to get string between tags with or without id attribute
You can simply use the regex below:
<li.*?>(.*?)<\/li>
Here, the `.*?` in `<li.*?>` lazily matches any attributes of the `li` tag, and because it may also match nothing, the pattern still works when no attributes (not even a space) are present. That covers both `li` structures, with or without an id.
Note: for HTML/XML parsing, don't reach for regex; you can simply use DOMDocument for the same job.
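As a rough sketch of the DOMDocument route, assuming a hypothetical list where one `li` has an id and the other does not, the parser treats both the same way:

```php
<?php
// Hypothetical input: one <li> with an id attribute and one without.
$html = '<ul><li id="first">one</li><li>two</li></ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$items = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    $items[] = $li->textContent; // attributes play no role in reading the text
}
print_r($items); // one, two
```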
Use regular expression to extract attribute value for custom tag
If the tag you're looking for is always going to be quote, then perhaps something a little simpler is possible:
$s = '[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';
$r = '/\[QUOTE="(.*?)"\](.*)\[\/QUOTE\]/';
$m = array();
$arr = array();
preg_match($r, $s, $m);
// m[0] = the initial string
// m[1] = the string of attributes
// m[2] = the quote itself
foreach(explode(',', $m[1]) as $valuepair) { // split the attributes on the comma
preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
// mm[0] = the attribute pairing
// mm[1] = the attribute name
// mm[2] = the attribute value
$arr[$mm[1]] = $mm[2];
}
print_r($arr);
print $m[2] . "\n";
This gives the following output:
Array
(
[name] => Max-Fischer
[post] => 486662533
[member] => 123
)
I don't so much dance as rhythmically convulse.
If you want to handle the case where there is more than one quote in the string, we can do this by modifying the regex to be slightly less greedy, and then using preg_match_all instead of preg_match:
$s ='[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';
$s .='[QUOTE="name: Some-Guy, post: 486562533, member: 1234"]Quidquid latine dictum sit, altum videtur[/QUOTE]';
$r = '/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/';
// the second group is now (.*?) rather than (.*), which makes it less greedy
$m = array();
$arr = array();
preg_match_all($r, $s, $m, PREG_SET_ORDER);
// with PREG_SET_ORDER, $m holds one element for each quote found in the string
// m[0] = the first quote
// m[1] = the second quote
// m[0][0] = the full text of the first quote
// m[0][1] = the string of attributes
// m[0][2] = the quote itself
foreach($m as $match) { // since there is more than one quote, we loop and operate on them individually
$quote = array();
foreach(explode(',', $match[1]) as $valuepair) { // split the attributes on the comma
preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
// mm[0] = the attribute pairing
// mm[1] = the attribute name
// mm[2] = the attribute value
$quote[$mm[1]] = $mm[2];
}
$arr[] = $quote; // we now build a parent array, to hold each individual quote
}
print_r($arr);
This gives output like:
Array
(
[0] => Array
(
[name] => Max-Fischer
[post] => 486662533
[member] => 123
)
[1] => Array
(
[name] => Some-Guy
[post] => 486562533
[member] => 1234
)
)
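As an alternative to splitting on commas and matching each pair separately, the attribute string could also be parsed with a single preg_match_all call. This is only a sketch of a different technique, not the approach used above, and it assumes keys are word characters and values contain no commas:

```php
<?php
// Attribute string in the same format as the examples above.
$attributes = 'name: Max-Fischer, post: 486662533, member: 123';

// Each match captures one "key: value" pair; a value runs up to the next comma.
preg_match_all('/(\w+):\s*([^,]+)/', $attributes, $pairs, PREG_SET_ORDER);

$quote = [];
foreach ($pairs as $pair) {
    $quote[$pair[1]] = trim($pair[2]);
}
print_r($quote); // name => Max-Fischer, post => 486662533, member => 123
```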
Php get string between tags
If you must use a regular expression, the following will do the trick.
$str = 'foo {Vimeo}123456789{/Vimeo} bar';
preg_match('~{Vimeo}([^{]*){/Vimeo}~i', $str, $match);
var_dump($match[1]); // string(9) "123456789"
This may be more than what you want to go through, but here is a way to avoid regex.
$str = 'foo {Vimeo}123456789{/Vimeo} bar';
$m = substr($str, strpos($str, '{Vimeo}')+7);
$m = substr($m, 0, strpos($m, '{/Vimeo}'));
var_dump($m); // string(9) "123456789"
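One caveat with the substr/strpos version: strpos returns false when the tag is missing, and false + 7 evaluates to 7, so the code silently returns the wrong slice. A minimal sketch of a guard follows; `textBetween` is a hypothetical helper name, not something from the answer above:

```php
<?php
// Hypothetical helper: returns the text between $open and $close, or null if either is missing.
function textBetween(string $str, string $open, string $close): ?string {
    $start = strpos($str, $open);
    if ($start === false) {
        return null; // opening tag not found
    }
    $start += strlen($open);
    $end = strpos($str, $close, $start); // search only after the opening tag
    if ($end === false) {
        return null; // closing tag not found
    }
    return substr($str, $start, $end - $start);
}

var_dump(textBetween('foo {Vimeo}123456789{/Vimeo} bar', '{Vimeo}', '{/Vimeo}')); // string(9) "123456789"
var_dump(textBetween('no tags here', '{Vimeo}', '{/Vimeo}'));                     // NULL
```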