PHP/Regex: How to Get the String Value of an HTML Tag

PHP/regex: How to get the string value of HTML tag?

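<?php
// Returns the text inside the first <$tagname>...</$tagname> pair, or null when nothing matches.
function getTextBetweenTags($string, $tagname) {
    $tag = preg_quote($tagname, '/'); // escape any regex metacharacters in the tag name
    // \b keeps "p" from matching "<pre>", [^>]* skips attributes, (.*?) stops at the first closing tag
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/s";
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null;
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt; // get me
?>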

That should do the trick for simple markup in which the tag is not nested inside itself.
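
When the same tag occurs more than once, preg_match only returns the first hit. A minimal sketch of a preg_match_all variant follows. The function name getAllTextBetweenTags and the sample string are assumptions for illustration, and the sketch still assumes the tag is never nested inside itself.

```php
<?php
// Hypothetical helper (not from the original answer): collect the text of every
// <$tagname>...</$tagname> pair, assuming the tag is never nested inside itself.
function getAllTextBetweenTags(string $string, string $tagname): array {
    $tag = preg_quote($tagname, '/');           // escape regex metacharacters
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/s"; // lazy group, dot also matches newlines
    preg_match_all($pattern, $string, $matches);
    return $matches[1];                         // empty array when nothing matches
}

// Assumed sample input with two <font> elements.
$sample = '<p><font size="10">first</font> and <font>second</font></p>';
print_r(getAllTextBetweenTags($sample, 'font'));
```

Returning an empty array instead of null keeps the result safe to pass straight into a foreach loop.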

Get all text inside html tag with regex?

Use a DOM and never use regular expressions for parsing HTML.

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('strong') as $tag) {
    echo $tag->nodeValue . "<br>";
}
foreach ($dom->getElementsByTagName('span') as $tag) {
    echo $tag->nodeValue . "<br>";
}

OUTPUT :

this one
this two
this three
test one
test two
test three


Why shouldn't I use regular expressions to parse HTML content?

HTML is not a regular language and hence cannot be parsed by regular
expressions. Regex queries are not equipped to break down HTML into
its meaningful parts. Even enhanced irregular regular expressions as
used by Perl are not up to the task of parsing HTML.

That quote is from an article by Jeff Atwood.
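
To make the point concrete, here is a small sketch comparing a regex with DOMDocument on nested markup. The sample string is an assumption chosen for illustration, not taken from the original question.

```php
<?php
// Assumed sample: a <div> nested inside another <div>.
$html = '<div>outer <div>inner</div> tail</div>';

// A lazy regex stops at the first closing tag and cuts the outer element short.
preg_match('/<div>(.*?)<\/div>/s', $html, $m);
$regexResult = $m[1];

// A DOM parser tracks nesting, so the outer element keeps all of its text.
$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$domResult = $dom->getElementsByTagName('div')->item(0)->textContent;

echo $regexResult, "\n";
echo $domResult, "\n";
```

Switching to a greedy group would rescue this particular input, but it then over-matches as soon as two sibling div elements appear next to each other.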

Regex PHP find and match HTML tags with specific data-attributes

XPath is a fantastic and versatile tool. Your logic transfers seamlessly to an XPath query, which is easy to construct, read, and maintain in the future.

Furthermore, XPath is superior to regex because it will successfully match qualifying elements no matter the order of the attributes. Regex will struggle to do the same with just one preg_ call.

The following loop validates, extracts, and stores the results of a single query.

Code:

$dom = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$dom->loadHTML($text, LIBXML_NOENT);
//libxml_clear_errors(); // optionally discard the collected warnings
$xpath = new DOMXPath($dom);

$results = []; // initialize so var_export() works even when nothing matches
foreach ($xpath->query("//*[@data-edit='true' and @data-type and @data-name]") as $node) {
    $results[] = [
        'type' => $node->getAttribute('data-type'),
        'name' => $node->getAttribute('data-name'),
        'text' => $node->textContent
    ];
}
var_export($results);

Output:

array (
  0 =>
  array (
    'type' => 'wysiwyg',
    'name' => 'Beoordeling',
    'text' => 'We beoordelen uw aanvraag en berichten u over de acceptatie daarvan.',
  ),
  1 =>
  array (
    'type' => 'text',
    'name' => 'Bellen',
    'text' => 'We bellen u voor een afspraak.',
  ),
  2 =>
  array (
    'type' => 'text',
    'name' => 'Technisch specialist',
    'text' => 'Technisch specialist neemt bij u alles nog even door.',
  ),
)
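
The attribute-order claim can be checked with a short, self-contained sketch. The markup below is hypothetical (the original input is not shown here); each element lists the same three attributes in a different order, and one element has data-edit set to false.

```php
<?php
// Hypothetical markup: the attributes appear in a different order on each element.
$text = '<p data-edit="true" data-type="text" data-name="A">first</p>'
      . '<span data-name="B" data-edit="true" data-type="wysiwyg">second</span>'
      . '<p data-edit="false" data-type="text" data-name="C">skipped</p>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($text);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Same query as above: attribute order inside the tag does not matter to XPath.
$names = [];
foreach ($xpath->query("//*[@data-edit='true' and @data-type and @data-name]") as $node) {
    $names[] = $node->getAttribute('data-name');
}
print_r($names);
```

Only the elements named A and B qualify; C is filtered out by the data-edit='true' condition.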

PHP Regex find text between custom added HTML Tags

Assuming <PRODUCT_LIST> tags will never be nested:

preg_match_all('/<PRODUCT_LIST>(.*?)<\/PRODUCT_LIST>/s', $html, $matches);

//HTML array in $matches[1]
print_r($matches[1]);
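
A runnable sketch of the same call, using an assumed sample string, shows why the s modifier matters: it lets the dot match newlines, so a block that spans several lines is still captured.

```php
<?php
// Assumed sample: two non-nested blocks, the first spanning several lines.
$html = "<PRODUCT_LIST>\n<li>one</li>\n</PRODUCT_LIST> text <PRODUCT_LIST>two</PRODUCT_LIST>";

// (.*?) is lazy, so each match ends at the nearest closing tag.
preg_match_all('/<PRODUCT_LIST>(.*?)<\/PRODUCT_LIST>/s', $html, $matches);

print_r($matches[1]);
```

Without the s modifier the first block would be skipped, because the dot would stop at the first newline.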

Regex get text between the html tags - PHP

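Try this. Note that the pattern matches the tags themselves rather than the text between them:

<?php
function teste() {
    $string = '<div>Hello, i am João</div><a href="test/test.com">testttttttttttt</a>';
    $matches = array();

    // <[^>]*> matches every opening and closing tag in the string
    preg_match_all('/<[^>]*>/', $string, $matches);
    echo '<pre>';
    print_r($matches);
    echo '</pre>';
}

teste();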
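
To capture the text between the tags rather than the tags themselves, one option is to match the characters that sit between a closing > and the next <. This is a sketch on the same sample string, not the original answer's code, and it assumes no attribute value contains a > character.

```php
<?php
$string = '<div>Hello, i am João</div><a href="test/test.com">testttttttttttt</a>';

// >([^<>]+)< captures any non-empty run of text between one tag and the next.
preg_match_all('/>([^<>]+)</', $string, $m);

print_r($m[1]);
```

For real-world HTML, strip_tags() or DOMDocument is usually a safer way to get at the text.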

Regular expression to get string between tags with or without id attribute

You can simply use the regex below:

<li.*?>(.*?)<\/li>

Here, the `.*?` in `<li.*?>` lazily matches any attributes of the `li` tag. Because it can match zero characters, the pattern also works when no attributes are defined, not even a space. This matters because the two `li` elements have different structures. The capture group `(.*?)` then holds the text between the tags.

Note: for HTML/XML parsing, prefer DOMDocument over regex.
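
A quick way to try the pattern, assuming an input with one `li` that has an id attribute and one that does not (the sample markup is made up for illustration):

```php
<?php
// Assumed input: one <li> with an id attribute and one without.
$html = '<ul><li id="first">with id</li><li>without id</li></ul>';

// The first .*? absorbs optional attributes, the second captures the text.
preg_match_all('/<li.*?>(.*?)<\/li>/s', $html, $matches);

print_r($matches[1]);
```

One caveat: `<li.*?>` also matches a tag such as `<link>`, so `<li\b[^>]*>` is a stricter alternative when other tags starting with "li" may appear.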

Use regular expression to extract attribute value for custom tag

If the tag you're looking for is always going to be QUOTE, then perhaps something a little simpler is possible:

$s = '[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';

$r = '/\[QUOTE="(.*?)"\](.*)\[\/QUOTE\]/';

$m = array();
$arr = array();
preg_match($r, $s, $m);
// $m[0] = the whole matched string
// $m[1] = the string of attributes
// $m[2] = the quote itself
foreach (explode(',', $m[1]) as $valuepair) { // split the attributes on the comma
    preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
    // $mm[0] = the attribute pairing
    // $mm[1] = the attribute name
    // $mm[2] = the attribute value
    $arr[$mm[1]] = $mm[2];
}
print_r($arr);
print $m[2] . "\n";

This gives the following output:

Array
(
    [name] => Max-Fischer
    [post] => 486662533
    [member] => 123
)
I don't so much dance as rhythmically convulse.

To handle more than one quote in the string, make the second group of the regex lazy and use preg_match_all instead of preg_match:

$s = '[QUOTE="name: Max-Fischer, post: 486662533, member: 123"]I don\'t so much dance as rhythmically convulse.[/QUOTE]';
$s .= '[QUOTE="name: Some-Guy, post: 486562533, member: 1234"]Quidquid latine dictum sit, altum videtur[/QUOTE]';

$r = '/\[QUOTE="(.*?)"\](.*?)\[\/QUOTE\]/';
// the ? after the second .* makes that group lazy, so each match stops at the first [/QUOTE]
$m = array();
$arr = array();
preg_match_all($r, $s, $m, PREG_SET_ORDER);
// $m[0] = the first quote
// $m[1] = the second quote
// $m[0][0] = the whole matched string
// $m[0][1] = the string of attributes
// $m[0][2] = the quote itself
// (one element for each quote found in the string)
foreach ($m as $match) { // since there is more than one quote, we loop and operate on them individually
    $quote = array();
    foreach (explode(',', $match[1]) as $valuepair) { // split the attributes on the comma
        preg_match('/\s*(.*): (.*)/', $valuepair, $mm);
        // $mm[0] = the attribute pairing
        // $mm[1] = the attribute name
        // $mm[2] = the attribute value
        $quote[$mm[1]] = $mm[2];
    }
    $arr[] = $quote; // build a parent array to hold each individual quote
}
print_r($arr);

This gives output like:

Array
(
    [0] => Array
        (
            [name] => Max-Fischer
            [post] => 486662533
            [member] => 123
        )

    [1] => Array
        (
            [name] => Some-Guy
            [post] => 486562533
            [member] => 1234
        )

)
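
As an alternative to explode plus a per-pair preg_match, the attribute string can be split into keys and values with a single preg_match_all and array_combine. This is only a sketch; it assumes keys consist of word characters and values contain no commas, which holds for the sample data above but may not hold in general.

```php
<?php
// Attribute string as captured by the first group of the QUOTE regex.
$attributes = 'name: Max-Fischer, post: 486662533, member: 123';

// (\w+) captures the key, ([^,]+) captures everything up to the next comma.
preg_match_all('/(\w+):\s*([^,]+)/', $attributes, $pairs);

// $pairs[1] holds the keys and $pairs[2] the values, in matching order.
$quote = array_combine($pairs[1], $pairs[2]);

print_r($quote);
```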

PHP get string between tags

If you must use a regular expression, the following will do the trick.

$str = 'foo {Vimeo}123456789{/Vimeo} bar';
preg_match('~{Vimeo}([^{]*){/Vimeo}~i', $str, $match);
var_dump($match[1]); // string(9) "123456789"

This may be more than what you want to go through, but here is a way to avoid regex.

$str = 'foo {Vimeo}123456789{/Vimeo} bar';
$m = substr($str, strpos($str, '{Vimeo}')+7);
$m = substr($m, 0, strpos($m, '{/Vimeo}'));
var_dump($m); // string(9) "123456789"
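
The substr/strpos approach can be wrapped in a small helper that also copes with missing delimiters, since strpos returns false when the needle is not found. The function name stringBetween is hypothetical, and this is a sketch rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: return the text between $open and $close, or null if either is missing.
function stringBetween(string $haystack, string $open, string $close): ?string {
    $start = strpos($haystack, $open);
    if ($start === false) {
        return null;                         // opening delimiter not found
    }
    $start += strlen($open);                 // skip past the opening delimiter
    $end = strpos($haystack, $close, $start);
    if ($end === false) {
        return null;                         // closing delimiter not found
    }
    return substr($haystack, $start, $end - $start);
}

$str = 'foo {Vimeo}123456789{/Vimeo} bar';
var_dump(stringBetween($str, '{Vimeo}', '{/Vimeo}'));
```

Using strlen($open) instead of a hard-coded offset keeps the helper correct for delimiters of any length.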

