How to Properly Escape a Backslash to Match a Literal Backslash in Single-Quoted and Double-Quoted PHP Regex Patterns

The backslash (\) is an escape character for both PHP's string parser and the regular expression engine (PCRE), so a backslash in a pattern is unescaped twice. PHP's parser reads two backslashes in a string literal as one literal backslash. The regex engine then treats that single backslash as an escape character again, so it needs two backslashes of its own to match one literal backslash. To get two backslashes through to PCRE, you write four backslashes in the PHP source (three also work in some cases, depending on how the pattern is quoted and what follows the backslashes).

To understand the difference between the two types of quoting patterns, consider the following two var_dump() statements:

var_dump('~\\\~');
var_dump("~\\\\~");

Output:

string(4) "~\\~"
string(4) "~\\~"

The sequence \~ has no special meaning in a single-quoted PHP string, so three backslashes also work: the PHP parser turns \\ into \ and leaves \~ as \~. Either way, the regex engine receives ~\\~.

Which one should you use:

For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other form works too, but I think ~\\\\~ is clearer.
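As a quick check of the above, here is a minimal sketch that compares the two literals and runs both against a subject containing one backslash. The subject string is a made-up example, not from the original question.

```php
<?php
// Both literals produce the same 4-character pattern: ~\\~
$threeSingle = '~\\\~';  // single-quoted, three backslashes
$fourDouble  = "~\\\\~"; // double-quoted, four backslashes

// Hypothetical subject: a path with exactly one backslash.
$subject = 'C:\temp';

$same = ($threeSingle === $fourDouble);
$m1   = preg_match($threeSingle, $subject);
$m2   = preg_match($fourDouble, $subject);

var_dump($same); // bool(true)
var_dump($m1);   // int(1)
var_dump($m2);   // int(1)
```

Echoing the pattern before passing it to preg_match() is usually the fastest way to see what the regex engine actually receives.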

Right way to escape backslash [ \ ] in PHP regex?

The thing is, you're using a character class, [], so it doesn't matter how many literal backslashes are embedded in it: they'll be treated as a single backslash.

e.g. the following two regexes:

/[a]/
/[aa]/

are for all intents and purposes identical as far as the regex engine is concerned. Character classes take a list of characters and "collapse" them down to match a single character, along the lines of "for the current character being considered, is it any of the characters listed inside the []?". If you list two backslashes in the class, then it'll be "is the char a backslash or is it a backslash?".
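A small sketch of that collapsing behaviour, assuming a made-up subject with one backslash. One escaped backslash in the class and two escaped backslashes in the class should give the same number of matches.

```php
<?php
// Hypothetical subject containing exactly one backslash: a, \, b
$subject = 'a\b';

// After PHP string parsing the regex engine sees /[\\]/ and /[\\\\]/.
$one = preg_match_all('/[\\\\]/', $subject);
$two = preg_match_all('/[\\\\\\\\]/', $subject);

var_dump($one); // int(1)
var_dump($two); // int(1)
```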

Find the occurrence of backslash in a string

For something simple as this, you don't need a regular expression. A string function like strpos() should be enough:

if (strpos('aud\ios', '\\') !== FALSE) {
    // String contains '\'
}

Note that you need to escape the backslash here. If you write just '\', PHP treats the backslash as escaping the closing quote, so the string literal never terminates. To avoid this, escape the backslash with another backslash: \\.

As for matching a literal backslash using a preg_* function, you'll need to use \\\\ instead of a single \.

From the PHP manual documentation on Escape Sequences:

Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.

So your code would look like:

preg_match('/\\\\/', $string); // Don't use this though

where:

  • / - starting delimiter
  • \\\\ - matches a single literal \
  • / - ending delimiter
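To see both approaches side by side, here is a minimal sketch that reuses the subject from the strpos() example. The variable names are just for illustration.

```php
<?php
$string = 'aud\ios'; // subject taken from the strpos() example

// Plain string search: one literal backslash, written as '\\'.
$viaStrpos = strpos($string, '\\') !== false;

// Regex search: the engine receives /\\/, which matches one literal backslash.
$viaRegex = preg_match('/\\\\/', $string) === 1;

var_dump($viaStrpos); // bool(true)
var_dump($viaRegex);  // bool(true)
```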

For additional information about this, see:

  • How to properly escape a backslash to match a literal backslash in single-quoted and double-quoted PHP regex patterns

Extracting double quoted strings with escape sequences

If you echo your pattern, you'll see it's indeed passed as %"(?:\"|.)*?"% to the regex parser. The single backslash will be treated as an escape character even by the regex parser.

So if the pattern is inside single quotes, you need to add at least one more backslash so that two backslashes reach the regex parser (one to escape the other). The parser then receives %"(?:\\"|.)*?"%:

preg_match_all('%"(?:\\\"|.)*?"%', $msg, $matches);

Still, this isn't a very efficient pattern. The question actually seems to be a duplicate of this one.

There is a better pattern available in this answer (what some would call unrolled).

preg_match_all('%"[^"\\\]*(?:\\\.[^"\\\]*)*"%', $msg, $matches);

See demo at eval.in or compare steps with other patterns in regex101.
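For a quick local check of the unrolled pattern, here is a minimal sketch. The input in $msg is a made-up example, not the data from the original question.

```php
<?php
// Hypothetical input: two quoted strings, the first containing escaped quotes.
// In a single-quoted PHP string, \" stays as backslash + double quote.
$msg = 'say "hello \"world\"" and "bye"';

// The regex engine receives: "[^"\\]*(?:\\.[^"\\]*)*"
preg_match_all('%"[^"\\\]*(?:\\\.[^"\\\]*)*"%', $msg, $matches);

// Expect two matches: "hello \"world\"" and "bye"
print_r($matches[0]);
```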

Is it really required to escape backslashes in regex patterns?

Not necessarily, because the rules for single-quoted string literals say that if \ is followed by anything other than another \ or a ', it is treated like any other character. The same general rule applies to double-quoted strings, although they recognize more escape sequences than just these two.

You could escape it if you wanted to, but personally I think the world has enough backslashes already.
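A minimal sketch of what that means in practice, using \d as an example of a sequence that PHP's single-quoted strings do not treat specially:

```php
<?php
// \d is not a recognized escape in a single-quoted string,
// so the backslash survives with or without doubling.
$plain   = '/\d+/';
$escaped = '/\\d+/';

$identical = ($plain === $escaped);
$found     = preg_match($plain, 'abc123');

var_dump($identical); // bool(true)
var_dump($found);     // int(1)
```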

Add backslash before single and double quote

If perl is okay:

perl -pe 's/"{3}(*SKIP)(*F)|[\x27"]/\\$&/g'
  • "{3}(*SKIP)(*F) don't change triple double quotes
    • use (\x27{3}|"{3})(*SKIP)(*F) if you shouldn't change triple single/double quotes
  • |[\x27"] match single or double quotes
  • \\$& prefix \ to the matched portion

With sed, you can replace the triple quotes with a newline character (a newline cannot be present in the pattern space with default line-by-line processing), then escape the single/double quote characters, and finally change the newline characters back to triple quotes.

# assuming only triple double quotes are present
sed 's/"""/\n/g; s/[\x27"]/\\&/g; s/\n/"""/g'

Why does a single backslash and a double backslash perform the same escaping in regex?

Because \\ in a PHP string means "a literal backslash". Since \/ isn't an escape sequence, the backslash in front of the slash doesn't need to be escaped (even though it can be), so both forms evaluate to the same string.

In other words, both of these will print the same thing:

echo '/008\\//i'; // prints /008\//i
echo '/008\//i'; // prints /008\//i

The backslash is one of the few characters that can be escaped in a single-quoted string (aside from the obvious \'), which ensures that you can write a string such as 'test\\' without the trailing backslash escaping the closing quote.
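Both points can be checked with a short sketch. The variable names are hypothetical.

```php
<?php
// The two literals from the echo example are the same string.
$doubled = '/008\\//i';
$single  = '/008\//i';
$equal   = ($doubled === $single);

// A string ending in a backslash must double it, or the
// backslash would escape the closing quote.
$trailing = 'test\\';
$length   = strlen($trailing);

var_dump($equal);  // bool(true)
var_dump($length); // int(5)
```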

How to escape single quotes within single quoted strings

If you really want to use single quotes in the outermost layer, remember that you can glue both kinds of quotation. Example:

 alias rxvt='urxvt -fg '"'"'#111111'"'"' -bg '"'"'#111111'"'"
 #                     ^^^^^       ^^^^^     ^^^^^       ^^^^
 #                     12345       12345     12345       1234

Explanation of how '"'"' is interpreted as just ':

  1. ' End first quotation which uses single quotes.
  2. " Start second quotation, using double-quotes.
  3. ' Quoted character.
  4. " End second quotation, using double-quotes.
  5. ' Start third quotation, using single quotes.

If you do not place any whitespace between (1) and (2), or between (4) and (5), the shell will interpret that string as one long word.

PHP preg_match function not working as expected

There are several issues with your code.

  1. If you're using single quotes for the pattern and want to match a literal backslash, you need to use at least \\\ or even \\\\ to produce an escaped backslash \\. Just echo your pattern if unsure.

  2. Instead of the global flag g, which is not available in PHP, use preg_match_all. It returns the number of matches, so you can check for a match with preg_match_all(...) > 0.

  3. I'm unsure about the ^ in [\\^]. If you don't need it, drop it. Further, [0-9] can be reduced to \d. I would also add a word boundary \b after \d{4} if something like \u12345 should not be matched.

See this PHP demo at tio.run

$pattern = '/\\\u\d{4}\b/i';
# echo $pattern;

if (preg_match_all($pattern, $data['title'], $matches, PREG_OFFSET_CAPTURE) > 0) {
    print_r($matches[0]);
} else {
    echo "Not Found";
}
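To try the pattern without the original data, here is a self-contained sketch. The title string is a made-up stand-in for $data['title'], chosen to show the effect of the word boundary.

```php
<?php
// Hypothetical stand-in for $data['title'] from the question.
$data = ['title' => 'Test \u0041 and \u12345'];

// The regex engine receives: /\\u\d{4}\b/i
$pattern = '/\\\u\d{4}\b/i';

$count = preg_match_all($pattern, $data['title'], $matches, PREG_OFFSET_CAPTURE);

// \u0041 matches at byte offset 5. \u12345 is rejected by \b
// because a fifth digit follows the first four.
print_r($matches[0]);
```

Note that \d{4} only accepts decimal digits, so a sequence with hex letters such as \u00e9 would not match this pattern as written.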

