Extra backslash needed in PHP regexp pattern
You need 4 backslashes to represent 1 in regex because:
- 2 backslashes are used for unescaping in a string ("\\\\" -> \\)
- 1 backslash is used for unescaping in the regex engine (\\ -> \)
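The two layers can be checked one at a time. This is a minimal sketch: the subject 'C:\temp' is an arbitrary example chosen because it contains exactly one backslash.

```php
<?php
// Layer 1: the PHP string parser turns each "\\" into one backslash.
$pattern = "/\\\\/";          // source code: four backslashes
var_dump(strlen($pattern));   // int(4): "/", "\", "\", "/"
echo $pattern, PHP_EOL;       // /\\/

// Layer 2: PCRE reads the remaining "\\" as "one literal backslash".
$subject = 'C:\temp';         // \t is not an escape in single quotes
var_dump(preg_match($pattern, $subject)); // int(1)
```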
From the PHP docs: "escaping any other character will result in the backslash being printed too".
Hence for \\\[:
- 1 backslash is used for unescaping the \; one stays because \[ is not a valid escape sequence ("\\\[" -> \\[)
- 1 backslash is used for unescaping in the regex engine (\\[ -> \[)
Yes it works, but not a good practice.
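The PHP-string half of that reasoning can be checked directly. This sketch covers only the string layer and does not assert what PCRE does with the result.

```php
<?php
// "\[" is not a PHP escape sequence, so the backslash is kept as-is.
var_dump('\[');             // string(2) "\["

// Three and four backslashes before "[" give the same PHP string:
// "\\" -> "\", and the leftover "\[" stays "\[".
$three = '\\\[';
$four  = '\\\\[';
var_dump($three === $four); // bool(true)
var_dump($three);           // string(3) "\\["
```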
Backslash in Regex- PHP
The backslash has a special meaning in both regexen and PHP. In both cases it is used as an escape character. For example, if you want to write a literal quote character inside a PHP string literal, this won't work:
$str = ''';
PHP would get "confused" about which ' ends the string and which is part of the string. That's where \ comes in:
$str = '\'';
It escapes the special meaning of ', so instead of terminating the string literal, it is now just a normal character in the string. There are more escape sequences, such as \n in double-quoted strings.
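A small sketch of how the quote style changes which escapes PHP recognises:

```php
<?php
// '\'' is a one-character string: the escaped quote.
var_dump('\'');             // string(1) "'"

// In single quotes, \n is NOT an escape: you get a backslash and an "n".
var_dump('\n');             // string(2) "\n"

// In double quotes, \n is the escape sequence for a newline character.
var_dump("\n" === chr(10)); // bool(true)
```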
This now means that \ is a special character with a special meaning. To escape this conundrum when you want to write a literal \, you'll have to escape literal backslashes as \\:
$str = '\\'; // string literal representing one backslash
This works the same in both PHP and regexen. If you want to write a literal backslash in a regex, you have to write /\\/. Now, since you're writing your regexen as PHP strings, you need to double escape them:
$regex = '/\\\\/';
Each pair of \\ is first reduced to one \ by the PHP string escaping mechanism, so the actual regex is /\\/, which is a regex that means "one backslash".
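To see the difference between a regex meaning "one backslash" and one meaning "two backslashes", count matches against a subject that contains two adjacent backslashes. The subject string is an arbitrary example.

```php
<?php
$subject = 'a\\\\b';          // PHP literal for a\\b: two backslashes in a row
var_dump(strlen($subject));   // int(4)

// Regex /\\/ (written '/\\\\/') means "one backslash": it matches each one.
var_dump(preg_match_all('/\\\\/', $subject));     // int(2)

// Regex /\\\\/ (written '/\\\\\\\\/') means "two backslashes in a row".
var_dump(preg_match_all('/\\\\\\\\/', $subject)); // int(1)
```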
Right way to escape backslash [ \ ] in PHP regex?
The thing is, you're using a character class, [], so it doesn't matter how many literal backslashes are embedded in it; it'll be treated as a single backslash.
For example, the following two regexes:
/[a]/
/[aa]/
are for all intents and purposes identical as far as the regex engine is concerned. Character classes take a list of characters and "collapse" them down to match a single character, along the lines of "for the current character being considered, is it any of the characters listed inside the []?". If you list two backslashes in the class, then it'll be "is the char a backslash or is it a backslash?".
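A quick sketch of the "collapse" behaviour, using arbitrary subject strings:

```php
<?php
$subject = 'aaa';

// Each class match consumes exactly ONE character,
// so /[a]/ and /[aa]/ behave identically.
var_dump(preg_match_all('/[a]/', $subject));  // int(3)
var_dump(preg_match_all('/[aa]/', $subject)); // int(3)

// Same for backslashes: listing an escaped backslash once or twice
// in a class still means "one backslash".
$once  = '/[\\\\]/';          // regex /[\\]/
$twice = '/[\\\\\\\\]/';      // regex /[\\\\]/
$s = 'x\\\\y';               // x\\y: two backslashes
var_dump(preg_match_all($once, $s));  // int(2)
var_dump(preg_match_all($twice, $s)); // int(2)
```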
Use preg_replace() to add two backslashes before each match
Welcome to the joys of "leaning toothpick syndrome" - backslash is such a commonly used escape character that it frequently requires escaping multiple times. Let's have a look at your case:
- Required output (presumably because of some other escaping context): \\
- Escape each \ with an additional \ for use in the PCRE regex engine: \\\\
- Escape each \ there for use in a PHP string: \\\\\\\\
$value = 'mercedes-benz';
$pattern = '/(\+|-|\/|&&|\|\||!|\(|\)|\{|}|\[|]|\^|"|~|\*|\?|:|\\\)/';
$replace = '\\\\\\\\${1}';
echo preg_replace($pattern, $replace, $value);
As mickmackusa points out, you can get away with six rather than eight backslashes in some cases, such as a replacement of '\\\\\\'; this works because the regex engine sees \\\, which is an escaped backslash (\\) followed by a single backslash (\) that can't be escaping anything because it's the end of the string. Simply doubling for each "layer" of escaping is probably safer than learning when this shortcut is and isn't valid, though.
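Here is a sketch that compares the eight- and six-backslash replacements on a trivial input. The subject 'a-b' and the pattern /-/ are illustrative assumptions, and there is no ${1} here, so the trailing lone backslash really is at the end of the replacement.

```php
<?php
$subject = 'a-b';

// Eight backslashes in source -> four at runtime -> two in the output.
$eight = preg_replace('/-/', '\\\\\\\\', $subject);

// Six in source -> three at runtime; the trailing lone "\" has nothing
// to escape, so it is copied as-is -> also two in the output.
$six = preg_replace('/-/', '\\\\\\', $subject);

var_dump($eight); // string(4) "a\\b"
var_dump($six);   // string(4) "a\\b"
```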
Which symbols should be escaped with a backslash in php regex?
There is a list of special Regex characters in the PHP documentation here: http://php.net/manual/en/function.preg-quote.php
The special regular expression characters are:
. \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
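Rather than escaping these by hand, the linked preg_quote() function can do it. A minimal sketch with made-up input strings:

```php
<?php
// preg_quote() backslash-escapes regex metacharacters.
var_dump(preg_quote('1.5*2'));    // string(7) "1\.5\*2"

// The optional second argument also escapes your delimiter.
var_dump(preg_quote('a/b', '/')); // string(4) "a\/b"

// A literal backslash in the input is escaped too, so the quoted
// text can be dropped straight into a pattern.
$pattern = '/' . preg_quote('C:\temp', '/') . '/';
var_dump(preg_match($pattern, 'path C:\temp here')); // int(1)
```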
why 3 backslash equal 4 backslash in php?
$b='/\\\\/';
php parses the string literal (more or less) character by character. The first input symbol is the forward slash. The result is a forward slash in the result (of the parsing step) and the input symbol (one character, the /) is taken away from the input.
The next input symbol is a backslash. It's taken from the input and the next character/symbol is inspected. It's also a backslash. That's a valid combination, so the second symbol is also taken from the input and the result is a single backslash (for both input symbols).
The same with the third and fourth backslash.
The last input symbol (within the literal) is the forwardslash -> forwardslash in the result.
-> /\\/
Now for the string with three backslashes:
$a='/\\\/';
php "finds" the first backslash, the next character is a backslash - that's a valid combination resulting in one single backslash in the result and both characters in the input literal taken.
php then "finds" the third backslash, the next character is a forward slash, this is not a valid combination. So the result is a single backslash (because php loves and forgives you....) and only one character taken from the input.
The next input character is the forward-slash, resulting in a forwardslash in the result.
-> /\\/
=> both literals encode the same string.
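The walkthrough can be confirmed by comparing the two literals directly. In this sketch the subject 'x\y' is an arbitrary string containing one backslash.

```php
<?php
$b = '/\\\\/';   // four backslashes
$a = '/\\\/';    // three backslashes

var_dump($a === $b); // bool(true): both are /\\/
var_dump($b);        // string(4) "/\\/"

// Both patterns therefore match a single literal backslash.
var_dump(preg_match($a, 'x\y')); // int(1)
var_dump(preg_match($b, 'x\y')); // int(1)
```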
php replace group of double backslash
A literal backslash in PHP single-quoted strings must be declared with 2 backslashes: to print 1|2\|2|3\\|4\\\|4 you need $str = '1|2\\|2|3\\\\|4\\\\\\|4';
In a regex, the literal backslash can be matched with 4 backslashes.
Here is an updated PHP code:
$str = '1|2\\|2|3\\\\|4\\\\\\|4';
// echo $str . PHP_EOL; => 1|2\|2|3\\|4\\\|4
$r = preg_split('~\\\\.(*SKIP)(*FAIL)|\\|~s', $str);
var_dump($r);
Result:
array(4) {
[0]=>
string(1) "1"
[1]=>
string(4) "2\|2"
[2]=>
string(3) "3\\"
[3]=>
string(6) "4\\\|4"
}
And to obtain **a from \\a you can thus use:
$str = '\\\\a';
$r = preg_replace('~\\\\~s', '*', $str);
How to properly escape a backslash to match a literal backslash in single-quoted and double-quoted PHP regex patterns
A backslash character (\) is considered to be an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash character, it will be considered an escape character by the PHP parser. If you write two backslashes, PHP's parser will interpret them as one literal backslash. But when that backslash reaches the regular expression engine, the engine picks it up as an escape character. To avoid this, you need to write four backslash characters; depending on the pattern, three may also work.
To understand the difference between the two types of quoting patterns, consider the following two var_dump() statements:
var_dump('~\\\~');
var_dump("~\\\\~");
Output:
string(4) "~\\~"
string(4) "~\\~"
The escape sequence \~ has no special meaning in PHP when it's used in a single-quoted string. Three backslashes also work because the PHP parser doesn't know about the escape sequence \~. So \\ will become \ but \~ will remain as \~.
Which one should you use:
For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other one works too, but I think ~\\\\~ is clearer.
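The three-backslash form only works when the character after the backslashes does not start a PHP escape sequence. Here is a sketch of where it breaks: the pattern is double-quoted and the next character is "n", chosen purely for illustration.

```php
<?php
$subject = 'a\nb';   // single quotes: a, backslash, n, b (no newline)

// Four backslashes: PHP gives /\\n/, PCRE matches a backslash then "n".
$four = "/\\\\n/";
// Three backslashes: "\\" -> "\", but "\n" IS a PHP escape in double
// quotes, so PCRE receives "\" followed by a real newline character.
$three = "/\\\n/";

var_dump(preg_match($four, $subject));  // int(1)
var_dump(preg_match($three, $subject)); // int(0): looks for a newline
var_dump(preg_match($three, "a\nb"));   // int(1)
```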