php regex to match outside of html tags
You can use a lookahead assertion for that: you only have to ensure that the searched words occur somewhere after a > or before any <. The latter test is easier to accomplish, as lookahead assertions can be variable length:
/(asf|foo|barr)(?=[^>]*(<|$))/
See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
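As a minimal sketch of how that pattern behaves with preg_replace (the sample string and the [$1] marker are made up for illustration):

```php
<?php
// Pattern from the answer above; the input string is an invented example.
$pattern = '/(asf|foo|barr)(?=[^>]*(<|$))/';
$html = 'foo <a title="foo">barr</a> asf';

// Wrap every keyword that is not inside a tag in square brackets.
$result = preg_replace($pattern, '[$1]', $html);

echo $result, "\n";
// [foo] <a title="foo">[barr]</a> [asf]
```

The "foo" inside the title attribute is left alone because a > follows it before any <.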
Regular expression to match text outside html tags and not between specific tag
Use this pattern to skip (fail) everything between <h1> and </h1>:
<h1>[^<>]*<\/h1>(*SKIP)(*F)|\b(sample|text)\b(?=[^>]*(?:<|$))
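A small sketch of the (*SKIP)(*F) idea in PHP. It assumes whole-word matching is wanted, so the word boundaries are placed around the entire alternation; the sample HTML is invented:

```php
<?php
// Assumption: \b wraps the whole alternation so only whole words match.
$pattern = '~<h1>[^<>]*</h1>(*SKIP)(*F)|\b(sample|text)\b(?=[^>]*(?:<|$))~';
$html = '<h1>sample text</h1><p class="text">some sample text</p>';

// The <h1> element is consumed and discarded by (*SKIP)(*F);
// the lookahead keeps the match out of attribute values.
$result = preg_replace($pattern, '[$1]', $html);

echo $result, "\n";
// <h1>sample text</h1><p class="text">some [sample] [text]</p>
```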
PHP Regular expression to match keyword outside HTML tag <a>
I managed to do what I wanted (without using regex) by:
- parsing each character of my string
- removing all <a> tags (copying them to a temporary array and keeping a placeholder in the string)
- running str_replace on the new string to replace all the keywords
- repopulating the placeholders with their original <a> tags
Here's the code I used, in case someone else needs it:
$str = <<<STRA
Moses supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
STRA;
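$arr1 = str_split($str);
$arr_links = array();     // collected <a>...</a> elements
$phrase_holder = '';      // text with each link replaced by {%n%}
$current_a = 0;           // index of the link being collected
$goto_arr_links = false;  // true while inside an <a> element
$close_a = false;         // true while skipping the rest of "</a>"
foreach ($arr1 as $k => $v)
{
    if ($close_a == true)
    {
        // "/a>" was already stored below; skip up to and including ">"
        if ($v == '>') {
            $close_a = false;
        }
        continue;
    }
    if ($goto_arr_links == true)
    {
        $arr_links[$current_a] .= $v;
    }
    // ?? '' avoids "undefined array key" warnings near the end of the string
    if ($v == '<' && ($arr1[$k+1] ?? '') == 'a') { /* <a (note: also matches tags such as <abbr) */
        // start this link's entry and keep collecting every char until </a>
        $arr_links[$current_a] = $v;
        $goto_arr_links = true;
    } elseif ($v == '<' && ($arr1[$k+1] ?? '') == '/' && ($arr1[$k+2] ?? '') == 'a' && ($arr1[$k+3] ?? '') == '>') { /* </a> */
        $arr_links[$current_a] .= "/a>";
        $goto_arr_links = false;
        $close_a = true;
        $phrase_holder .= "{%$current_a%}"; /* put a parameter holder on the phrase */
        $current_a++;
    }
    elseif ($goto_arr_links == false) {
        $phrase_holder .= $v;
    }
}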
echo "Links Array:\n";
print_r($arr_links);
echo "\n\n\nPhrase Holder:\n";
echo $phrase_holder;
echo "\n\n\n(pre) Final Phrase (with my keyword replaced):\n";
$final_phrase = str_replace("Moses", "<a href=\"novo-mega-link.php\">Moses</a>", $phrase_holder);
echo $final_phrase;
echo "\n\n\nFinal Phrase:\n";
foreach($arr_links as $k => $v)
{
$final_phrase = str_replace("{%$k%}", $v, $final_phrase);
}
echo $final_phrase;
The output:
Links Array:
Array
(
[0] => <a href="original-moses1.html">Moses</a>
[1] => <a href="original-moses2.html" target="_blank">Moses</a>
)
Phrase Holder:
Moses supposes his toeses are roses,
but {%0%} supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas">{%1%}</span>!
(pre) Final Phrase (with my keyword replaced):
<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses,
but {%0%} supposes erroneously;
for nobody's toeses are posies of roses,
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be.
Ganda <span class="cenas">{%1%}</span>!
Final Phrase:
<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
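For comparison, the same placeholder idea can be sketched more compactly by splitting the string on <a> elements. This is a different technique from the character loop above (it does use a regex for the split), and replace_outside_links is a hypothetical helper name:

```php
<?php
// Hypothetical helper: replace $keyword everywhere except inside <a>...</a>.
function replace_outside_links(string $html, string $keyword, string $replacement): string
{
    // With PREG_SPLIT_DELIM_CAPTURE and one capture group, even indexes hold
    // the text between links and odd indexes hold the captured <a> elements.
    $parts = preg_split('~(<a\b[^>]*>.*?</a>)~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = str_replace($keyword, $replacement, $part);
        }
    }
    return implode('', $parts);
}

$input = 'Moses supposes, but <a href="original-moses1.html">Moses</a> supposes erroneously.';
$output = replace_outside_links($input, 'Moses', '<a href="novo-mega-link.php">Moses</a>');

echo $output, "\n";
```

Like the loop above, this only protects <a> elements; a keyword inside another tag's attribute would still be replaced.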
PHP regex to match HTML tag names except some tags
<(?:(?!input)[^>])*>(?:<\/[^>]*>)?
Try this. See the demo:
https://www.regex101.com/r/fG5pZ8/13
$re = "/<(?:(?!input)[^>])*>(?:<\\/[^>]*>)?/im";
$str = "<input type=\"text\">\n<img src=\">\n<a href=\"\">\n<button type=\"button\"></button>\n<div id=\"some\"></div>\n<p></p>";
preg_match_all($re, $str, $matches);
Edit: if you want to capture the tag name separately, use
(?!<input)<([A-Z0-9a-z]+)([^>]*>)?
https://www.regex101.com/r/fG5pZ8/16
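A short sketch of the second pattern with preg_match_all, collecting only the tag names from capture group 1 (the sample string is a trimmed-down variant of the one above):

```php
<?php
// Pattern from the edit above; group 1 holds the tag name.
$re = '~(?!<input)<([A-Z0-9a-z]+)([^>]*>)?~';
$str = '<input type="text"><a href=""><button type="button"></button><div id="some"></div><p></p>';

preg_match_all($re, $str, $matches);
$tagNames = $matches[1];

print_r($tagNames);
// a, button, div, p
```

Closing tags are not listed because the character after < must be alphanumeric, and <input is rejected by the lookahead.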
Extract text outside html tags
You can use PHP's DOMDocument and DOMXPath to get the values that you want. The trick is to wrap the HTML from your database in a (for example) <div> tag; you can then load it into a DOMDocument and use DOMXPath to search for children of the <div> tag which are purely text, using the text() path:
$html = 'This should be extracted <p>I do not want this</p> This should also be extracted <a>This may appear after other tags and I do not want this</a>';
$doc = new DOMDocument();
$doc->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($doc);
$texts = array();
foreach ($xpath->query('/div/text()') as $text) {
$texts[] = $text->nodeValue;
}
print_r($texts);
Output:
Array (
[0] => This should be extracted
[1] => This should also be extracted
)
Regex replace text outside script tag
My pattern uses (*SKIP)(*FAIL) to disqualify matched script tags and their contents. text and simple will be matched on every qualifying occurrence.
Regex Pattern: ~<script.*?/script>(*SKIP)(*FAIL)|text|simple~
Code:
$strings=['This has no replacements',
'This simple text has no script tag',
'This simple text ends with a script tag <script language="javascript">simple simple text text</script>',
'This is simple html text is split by a script tag <script language="javascript">simple simple text text</script> text',
'<script language="javascript">simple simple text text</script> this text starts with a script tag'
];
$strings=preg_replace('~<script.*?/script>(*SKIP)(*FAIL)|text|simple~','***replaced***',$strings);
var_export($strings);
Output:
array (
0 => 'This has no replacements',
1 => 'This ***replaced*** ***replaced*** has no script tag',
2 => 'This ***replaced*** ***replaced*** ends with a script tag <script language="javascript">simple simple text text</script>',
3 => 'This is ***replaced*** html ***replaced*** is split by a script tag <script language="javascript">simple simple text text</script> ***replaced***',
4 => '<script language="javascript">simple simple text text</script> this ***replaced*** starts with a script tag',
)
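If the script body can span several lines, or you only want whole words, the pattern needs a few extra pieces. The following is a sketch under those assumptions (the s and i flags and the \b anchors are additions, not part of the answer above):

```php
<?php
// Assumptions: "s" lets .*? cross newlines inside the script body,
// "i" tolerates <SCRIPT>, and \b restricts matches to whole words.
$pattern = '~<script\b.*?</script>(*SKIP)(*FAIL)|\b(?:text|simple)\b~is';
$html = "simple <script>\nvar text = 'simple';\n</script> context text";

$result = preg_replace($pattern, '***', $html);

echo $result, "\n";
```

Here "context" is untouched because of the word boundary, and nothing inside the multi-line script block is replaced.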
Match text both inside and outside html tags, with grouping
I would replace the .*? everywhere with what you are really looking for.
The regular expression could be this:
(?=.+)((<([^>]+)>)?([^<]+)?(<\/([^>]+)>)?)
- (?=.+) will make sure the match starts with something. All our capture groups are optional here, so to avoid an extra null match at the end we'll use this lookahead
- When finding the tagname: [^>]+
- When finding text in tags: [^<]+
- ([^<]+)? makes text within spans optional
Regex101 playground:
https://regex101.com/r/1caMOA/2
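To see which groups carry what, here is a minimal sketch that runs the pattern with preg_match_all and keeps the tag name (group 3) and the text (group 4) of each match. The sample string is invented:

```php
<?php
$pattern = '~(?=.+)((<([^>]+)>)?([^<]+)?(</([^>]+)>)?)~';
$html = 'plain <span>inside</span> tail';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // Groups that did not take part in a match are empty or missing, hence ?? ''.
    $pairs[] = [$m[3] ?? '', $m[4] ?? ''];
}

print_r($pairs);
// ['', 'plain '], ['span', 'inside'], ['', ' tail']
```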
Regex replace text outside html tags
Okay, try using this regex:
(text|simple)(?![^<]*>|[^<>]*</)
Example worked on regex101.
Breakdown:
( # Open capture group
text # Match 'text'
| # Or
simple # Match 'simple'
) # End capture group
(?! # Negative lookahead start (will cause match to fail if contents match)
[^<]* # Any number of non-'<' characters
> # A > character
| # Or
[^<>]* # Any number of non-'<' and non-'>' characters
</ # The characters < and /
) # End negative lookahead.
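The negative lookahead prevents a match if text or simple sits inside a tag (a > follows before any <) or is directly followed by a closing tag (a </ follows with no other tag in between).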
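A quick sketch of this pattern in PHP, with an invented sample string, shows both halves of the lookahead at work:

```php
<?php
$pattern = '~(text|simple)(?![^<]*>|[^<>]*</)~';
$html = 'simple text <b>simple</b> <i title="text">x</i> text';

$result = preg_replace($pattern, '***', $html);

echo $result, "\n";
// *** *** <b>simple</b> <i title="text">x</i> ***
```

Note that the word inside <b>...</b> is skipped as well, so this pattern is stricter than one that only avoids tag attributes.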
PHP Regex to remove HTML-Tags inside <pre></pre> code blocks
You will need to use preg_replace_callback and call strip_tags in the callback body:
preg_replace_callback('~(<pre[^>]*>)([\s\S]*?)(</pre>)~',
function ($m) { return $m[1] . strip_tags($m[2], ['p', 'b', 'strong']) . $m[3]; },
$s);
Some text.
<pre>
a = 5
b = 3
</pre>
More text
<pre>
a2 = "text"
b = 3
</pre>
final text
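Note that the strip_tags call above strips all tags except p, b and strong. Passing the allowed tags as an array requires PHP 7.4 or later; older versions take a string such as '<p><b><strong>'.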
RegEx Details:
- (<pre[^>]*>): Match <pre...> and capture in group #1
- ([\s\S]*?): Match 0 or more of any character including newline (lazy), capture this in group #2. [\s\S] matches any character including newline.
- (</pre>): Match </pre> and capture in group #3
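A self-contained sketch of the same callback on an invented input. Here the allowed tags are passed as a string ('<b>'), which older PHP versions also accept:

```php
<?php
// Invented sample: tags inside <pre> are stripped (except <b>), tags outside are kept.
$s = '<pre><span class="k">a</span> = <b>5</b></pre><p><span>keep</span></p>';

$result = preg_replace_callback(
    '~(<pre[^>]*>)([\s\S]*?)(</pre>)~',
    function ($m) {
        // $m[1] and $m[3] are the <pre> and </pre> tags, $m[2] is the content.
        return $m[1] . strip_tags($m[2], '<b>') . $m[3];
    },
    $s
);

echo $result, "\n";
// <pre>a = <b>5</b></pre><p><span>keep</span></p>
```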