PHP: Regex to ignore escaped quotes within quotes
For most strings, you need to allow escaped anything (not just escaped quotes). For example, you most likely need to allow escaped characters like "\n" and "\t" and, of course, the escaped escape: "\\".
This is a frequently asked question, and one which was solved (and optimized) long ago. Jeffrey Friedl covers this question in depth (as an example) in his classic work: Mastering Regular Expressions (3rd Edition). Here is the regex you are looking for:
Good:
"([^"\\]|\\.)*"
Version 1: Works correctly but is not terribly efficient.
Better:
"([^"\\]++|\\.)*"
or "((?>[^"\\]+)|\\.)*"
Version 2: More efficient if you have possessive quantifiers or atomic groups (See: sin's correct answer which uses the atomic group method).
Best:
"[^"\\]*(?:\\.[^"\\]*)*"
Version 3: More efficient still. Implements Friedl's "unrolling-the-loop" technique. Does not require possessive quantifiers or atomic groups (i.e. this can be used in JavaScript and other less-featured regex engines).
Here are the recommended regexes in PHP syntax for both double and single quoted sub-strings:
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";
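As a quick check, the sketch below runs the two patterns over a sample string with preg_match_all. The sample text and the names $subject, $dq and $sq are made up for illustration; only $re_dq and $re_sq come from the answer.

```php
<?php
// Patterns copied from the answer above.
$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";

// Hypothetical sample input (nowdoc, so backslashes are taken literally).
$subject = <<<'TXT'
say("He said \"hi\"") and echo 'it\'s ok' then "tail\\"
TXT;

// Collect every double-quoted and every single-quoted sub-string.
preg_match_all($re_dq, $subject, $dq);
preg_match_all($re_sq, $subject, $sq);

print_r($dq[0]);
print_r($sq[0]);
```

Note that the last double-quoted string ends with an escaped backslash, and its closing quote is still recognised as the end of the string.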
regex - ignore escaped chars within quotation marks
You want to find an opening quote and its closing quote, neither of which is escaped.
(?<!\\)'.*?(?<!\\)'
will do so.
Explanation:
- (?<! - negative lookbehind
- \\) - escaped backslash and closing of the lookbehind
- ' - the quote, which has not been escaped (the negative lookbehind has checked it)
- .*? - any char (.*) in lazy mode (?), so the next quote will be evaluated
- (?<!\\) - again a negative lookbehind, to check whether the quote has been escaped
- ' - the final unescaped quote
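A minimal sketch of this pattern in use, with hypothetical sample strings. The second sample ends with an escaped backslash, a case where the lookbehind treats the real closing quote as escaped and the match runs on to the next quote. The pattern '[^'\\]*(?:\\.[^'\\]*)*' is included only as a point of comparison.

```php
<?php
// Lookbehind pattern from the answer above: a quote, lazy content, a quote,
// where neither quote is directly preceded by a backslash.
$lookbehind = "/(?<!\\\\)'.*?(?<!\\\\)'/";
// Comparison pattern that consumes backslash + any char as a unit.
$unrolled = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/";

// Hypothetical inputs (nowdoc keeps backslashes literal).
$escaped_quote = <<<'TXT'
x = 'it\'s fine'; y = 'next';
TXT;
$trailing_backslash = <<<'TXT'
x = 'ends with backslash\\'; y = 'next';
TXT;

preg_match($lookbehind, $escaped_quote, $m1);
preg_match($lookbehind, $trailing_backslash, $m2);
preg_match($unrolled, $trailing_backslash, $m3);

echo $m1[0], "\n"; // the escaped quote is skipped as intended
echo $m2[0], "\n"; // runs past the real closing quote
echo $m3[0], "\n"; // stops at the real closing quote
```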
Regex - Get strings in Quotes ignore escaped Quotes and Comments
We can use a negative lookbehind if you know exactly how many characters separate the comment marker from the string, because a negative lookbehind cannot contain a variable-length quantifier. Something like this:
(?<!\/\/.)".*?[^\\]"
Or do it in two steps. First remove all comments that start with // using this regex:
\/\/.*
Then use this to get all strings:
".*?[^\\]"
Regex pattern for matching single quoted words in a string and ignore the escaped single quotes
Without this condition, simple...
/('[^']*')/
...would suffice, of course: match all sequences of "single quote, followed by any number of non-single-quote symbols, followed by a single quote again".
But here we need to be ready for two kinds of symbols, both "normal" and "escaped" ones. So we should add some spice to our pattern:
/('[^'\\]*(?:\\.[^'\\]*)*')/
It might look odd (and it is), but it's actually pretty simple too: match sequences of...
- a single quote symbol...
- ...followed by zero or more "normal" characters (not ' or \),
- ...followed by a subexpression of ("escaped" symbol, then zero or more "normal" ones), repeated 0 or more times...
- ...followed by a single quote symbol.
Example:
$input = "City.name = 'New \\' York (And Some Backslash Fun)\\\\'\\'";
# ...as \' in any string literal will be parsed as a _single_ quote
$pattern = "/('[^'\\\\]*(?:\\\\.[^'\\\\]*)*')/";
# ... a choice: escape either slashes or single quotes; I choose the former
preg_match($pattern, $input, $token);
echo $token[0]; // 'New \' York (And Some Backslash Fun)\\'
Extracting double quoted strings with escape sequences
If you echo your pattern, you'll see it is indeed passed as %"(?:\"|.)*?"% to the regex parser. The single backslash is then treated as an escape character by the regex parser as well.
So, if the pattern is inside single quotes, you need to add at least one more backslash so that two backslashes reach the parser (one to escape the other). The pattern the parser receives will then be %"(?:\\"|.)*?"%
preg_match_all('%"(?:\\\"|.)*?"%', $msg, $matches);
Still this isn't a very efficient pattern. The question seems actually a duplicate of this one.
There is a better pattern available in this answer (what some would call unrolled).
preg_match_all('%"[^"\\\]*(?:\\\.[^"\\\]*)*"%', $msg, $matches);
See demo at eval.in or compare steps with other patterns in regex101.
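To make the escaping point concrete, this sketch echoes two single-quoted pattern literals to show what the regex parser actually receives, then runs both against a sample message. The message and the names $too_few, $enough, $m1 and $m2 are assumptions for illustration.

```php
<?php
// Hypothetical message containing escaped quotes (nowdoc keeps backslashes literal).
$msg = <<<'TXT'
say "a \"quoted\" word" now
TXT;

$too_few = '%"(?:\\"|.)*?"%';   // PHP hands over one backslash: the regex sees an escaped quote
$enough  = '%"(?:\\\"|.)*?"%';  // PHP hands over two: the regex sees a literal backslash, then a quote

// Show the patterns as the regex parser receives them.
echo $too_few, "\n";
echo $enough, "\n";

preg_match($too_few, $msg, $m1);
preg_match($enough, $msg, $m2);

echo $m1[0], "\n"; // stops at the first escaped quote
echo $m2[0], "\n"; // reaches the real closing quote
```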
Regex (PHP) Remove all horizontal whitespace except between quotes ( and '') (include escaped quotes)
You may use
'~(?<!\\\\)(?:\\\\{2})*(?:"[^\\\\"]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(*F)|\h+~s'
See the regex demo
Details
- (?<!\\)(?:\\{2})*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')(*SKIP)(*F) - a '...' or "...." substring where the first quotation mark is not itself escaped, which is skipped once matched (so nothing inside it gets removed):
  - (?<!\\) - no \ char allowed immediately to the left of the current location
  - (?:\\{2})* - zero or more repetitions of double backslashes
  - (?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*') - either of the two alternatives:
    - "[^\\"]*(?:\\.[^"\\]*)*" - a string literal inside double quotation marks:
      - " - a double quote
      - [^\\"]* - 0 or more chars other than \ and "
      - (?:\\.[^"\\]*)* - zero or more repetitions of a \ followed with any char (\\.) and then any 0 or more chars other than " and \ ([^"\\]*)
      - " - a double quote
    - | - or
    - '[^\\']*(?:\\.[^'\\]*)*' - a string literal inside single quotation marks
  - (*SKIP)(*F) - PCRE verbs that omit the found match and make the regex engine go on searching for a next match starting at the current regex index
- |\h+ - or 1 or more horizontal whitespaces
PHP demo:
$strs = ['2 + 2', 'f( " ")', 'f("Test \\"mystring\\" .")', 'f("\' ", " ")'];
$rx = '~(?<!\\\\)(?:\\\\{2})*(?:"[^\\\\"]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(*F)|\h+~s';
print_r( preg_replace($rx, '', $strs) );
Output:
Array
(
    [0] => 2+2
    [1] => f(" ")
    [2] => f("Test \"mystring\" .")
    [3] => f("' "," ")
)
How can regex ignore escaped-quotes when matching strings?
<?php
$backslash = '\\';
$pattern = <<< PATTERN
#(["'])(?:{$backslash}{$backslash}?+.)*?{$backslash}1#
PATTERN;
foreach(array(
"<?php \$s = 'Hi everyone, we\\'re ready now.'; ?>",
'<?php $s = "Hi everyone, we\\"re ready now."; ?>',
"xyz'a\\'bc\\d'123",
"x = 'My string ends with with a backslash\\\\';"
) as $subject) {
preg_match($pattern, $subject, $matches);
echo $subject , ' => ', $matches[0], "\n\n";
}
prints
<?php $s = 'Hi everyone, we\'re ready now.'; ?> => 'Hi everyone, we\'re ready now.'
<?php $s = "Hi everyone, we\"re ready now."; ?> => "Hi everyone, we\"re ready now."
xyz'a\'bc\d'123 => 'a\'bc\d'
x = 'My string ends with with a backslash\\'; => 'My string ends with with a backslash\\'
Considering escaped quotes in an all characters except type regex
I would use DOMDocument to do this, as it won't care about the actual contents of the attribute as long as they are already valid:
function wrap_js($js) {
$confirm_text = "It will not be possible to modify your responses anymore if you continue.\\n\\nAre you sure you want to continue?";
$new_js_start = 'if( window.confirm("' . $confirm_text . '") ) { ';
$new_js_end = ' } else { event.preventDefault(); }';
return $new_js_start . $js . $new_js_end;
}
$html = "<input type='submit' id='gform_submit_button_4' class='gform_button button' value='Envoyer' onclick='/* Lots of JS */' onkeypress='/* Lots of JS */' />";
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
foreach ($xpath->query("//input[@type='submit']") as $submit_input) {
foreach (['onclick', 'onkeypress'] as $attribute) {
if (($js = $submit_input->getAttribute($attribute)) != '') {
$submit_input->setAttribute($attribute, wrap_js($js));
}
}
}
echo $doc->saveHTML();
Output:
<input type="submit"
id="gform_submit_button_4"
class="gform_button button"
value="Envoyer"
onclick='if( window.confirm("It will not be possible to modify your responses anymore if you continue.\n\nAre you sure you want to continue?") ) { /* Lots of JS */ } else { event.preventDefault(); }'
onkeypress='if( window.confirm("It will not be possible to modify your responses anymore if you continue.\n\nAre you sure you want to continue?") ) { /* Lots of JS */ } else { event.preventDefault(); }'
>
Demo on 3v4l.org
Get html or text from inside quotes including escape quotes with RegEx
You can use negative lookbehind to avoid matching escaped quotes:
(?<!\\)"(.+?)(?<!\\)"
Here (?<!\\) is a negative lookbehind that will avoid matching \".
However, I would caution you against using regex to parse HTML; it is better to use DOM for that.
PHP Code:
$value_regex = '~(?<!\\\\)"(.+?)(?<!\\\\)"~';
if (preg_match($value_regex, $line, $matches))
$result = $matches[1];
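A small usage sketch of the same regex, here with preg_match_all so that every quoted value on the line is captured. The input line is a hypothetical example, not data from the question.

```php
<?php
$value_regex = '~(?<!\\\\)"(.+?)(?<!\\\\)"~';

// Hypothetical input line (nowdoc keeps backslashes literal).
$line = <<<'TXT'
"html": "<a href=\"/home\">Home</a>"
TXT;

$result = [];
if (preg_match_all($value_regex, $line, $matches)) {
    // Group 1 holds the text between each pair of unescaped quotes.
    $result = $matches[1];
}

print_r($result);
```

The captured values keep their backslashes; removing them is a separate step that this sketch leaves out.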