What does [\S\s]* mean in regex in PHP?
By default .
doesn't match new lines - [\s\S]
is a hack around that problem.
This is common in JavaScript, but in PHP you can use the /s
flag to to make the dot match all characters.
Is [\s\S] same as . (dot)?
"No" it is not the same. It has an important difference if you are not using the single line
flag (meaning that .
does not match all).
The [\s\S]
comes handy when you want to do mix of matches when the .
does not match all.
It is easier to explain it with an example. Suppose you want to capture whatever is between a
and b
, so you can use pattern a(.*?)b
(?
is for ungreedy matches and parentheses for capturing the content), but if there are new lines suppose you don't want to capture this in the same group, so you can have another regex like a([\s\S]*?)b
.
Therefore if we create one pattern using both approaches it results in:
a(.*)b|a([\s\S]*?)b
In this case, if you see the scenario in regex101, then you will have a colorful and easy way to differentiate the scenarios (in green capturing group #1 and in red capturing group #2):
So, in conclusion, the [\s\S]
is a regex trick when you want to match multiple lines and the .
does not suit your needs. It basically depends on your use case.
However, if you use the single line
flag where .
matches new lines, then you don't need the regex trick, below you can see that all is green and group 2 (red above) is not matched:
Have also created a javascript performance test and it impacts in the performance around 25%:
https://jsperf.com/ss-vs-dot
PHP Regex: Difference between \s and \\s
Trying to make sense of the text from the manual.
http://php.net/manual/en/regexp.reference.escape.php
Single and double quoted PHP strings have special meaning of
backslash.
Thus if \ has to be matched with a regular expression \\, then "\\\\"
or '\\\\' must be used in PHP code.
I might not be correct but here I go.
When you use something like
preg_match("/\\s/", $myString)
What it is doing is converting \\ to \, which in turns make the string to be \s thus it behaves normally i.e its meaning doesn't change and the created regex is '/\s/' internally which matches "spaces"
To match \s in a string you would have to do something as follows
preg_match("/\\\\s/", $myString)
So the answer is \s or \\s in the regex string doesn't make any difference, personally I think using \s is simpler and easy to understand.
Regex to match a word, even is there are spaces between letters
This is the only way to match such words. You have to consume these spaces, otherwise you won't have a match. Actually, your pattern is the same as
t\s*e\s*s\s*t
If the word appears inside a larger string, you can consider a word boundary version:
\bt\s*e\s*s\s*t\b
NOTE: If only one whitespace is allowed in between each letter, you can use ?
quantifier instead of *
:
t\s?e\s?s\s?t
What is the meaning of [\w\-] regular expression in PHP
Regex 101
\w explained
\w match any word character [a-zA-Z0-9_]
\w\- explained
\w\-
\w match any word character [a-zA-Z0-9_]
\- matches the character - literally
Matching Email Addresses Simple, not future proof
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b
How to understand the output of the PHP regex matching below?
The reason is that group 12 (^(\{)
) matches successfully, and therefore the regex engine stops. It never even gets to try to match group 16 (^([\S\s]+)
).
If you put group 16 before group 12, then it will match the entire string. However, since [\s\S]
matches any character at all (and can be abbreviated to .
when using the s
modifier), then none of the other groups will ever match.
All in all, your regex looks quite weird. I wouldn't be surprised if it contained a few bugs aside from the obvious warts like trying to put several spaces into a single character class.
Reference - What does this regex mean?
The Stack Overflow Regular Expressions FAQ
See also a lot of general hints and useful links at the regex tag details page.
Online tutorials
- RegexOne ↪
- Regular Expressions Info ↪
Quantifiers
- Zero-or-more:
*
:greedy,*?
:reluctant,*+
:possessive - One-or-more:
+
:greedy,+?
:reluctant,++
:possessive ?
:optional (zero-or-one)- Min/max ranges (all inclusive):
{n,m}
:between n & m,{n,}
:n-or-more,{n}
:exactly n - Differences between greedy, reluctant (a.k.a. "lazy", "ungreedy") and possessive quantifier:
- Greedy vs. Reluctant vs. Possessive Quantifiers
- In-depth discussion on the differences between greedy versus non-greedy
- What's the difference between
{n}
and{n}?
- Can someone explain Possessive Quantifiers to me? php, perl, java, ruby
- Emulating possessive quantifiers .net
- Non-Stack Overflow references: From Oracle, regular-expressions.info
Character Classes
- What is the difference between square brackets and parentheses?
[...]
: any one character,[^...]
: negated/any character but[^]
matches any one character including newlines javascript[\w-[\d]]
/[a-z-[qz]]
: set subtraction .net, xml-schema, xpath, JGSoft[\w&&[^\d]]
: set intersection java, ruby 1.9+[[:alpha:]]
:POSIX character classes[[:<:]]
and[[:>:]]
Word boundaries- Why do
[^\\D2]
,[^[^0-9]2]
,[^2[^0-9]]
get different results in Java? java - Shorthand:
- Digit:
\d
:digit,\D
:non-digit - Word character (Letter, digit, underscore):
\w
:word character,\W
:non-word character - Whitespace:
\s
:whitespace,\S
:non-whitespace
- Digit:
- Unicode categories (
\p{L}, \P{L}
, etc.)
Escape Sequences
- Horizontal whitespace:
\h
:space-or-tab,\t
:tab - Newlines:
\r
,\n
:carriage return and line feed\R
:generic newline php java-8
- Negated whitespace sequences:
\H
:Non horizontal whitespace character,\V
:Non vertical whitespace character,\N
:Non line feed character pcre php5 java-8 - Other:
\v
:vertical tab,\e
:the escape character
Anchors
anchor | matches | flavors |
---|---|---|
^ | Start of string | Common* |
^ | Start of line | Commonm |
$ | End of line | Commonm |
$ | End of text | Common* except javascript |
$ | Very end of string | javascript*, phpD |
\A | Start of string | Common except javascript |
\Z | End of text | Common except javascript python |
\Z | Very end of string | python |
\z | Very end of string | Common except javascript python |
\b | Word boundary | Common |
\B | Not a word boundary | Common |
\G | End of previous match | Common except javascript, python |
preg_match: and-not expression
My suggested regex does not require !
with preg_match
, and matches only a multiline string without both "lorem" and "ipsum" in any upper or lower case and as whole words:
^(?si)(?!.*?\bipsum\b.*$)(?!.*\blorem\b.*\bipsum\b|\bipsum\b.*\blorem\b.*$).*$
(?si)
sets case-insensitive and singleline modes so that .
could match a newline and match both "Lorem" and "lorem". \b
s are used to match whole words only. It will also fail a string that has ipsum
(a second condition (?!.*?\bipsum\b.*$)
is set to handle that).
See demo here
$re = "/^(?si)(?!.*?\\bipsum\\b.*$)(?!.*\\blorem\\b.*\\bipsum\\b|\\bipsum\\b.*\\blorem\\b.*$).*$/";
$str = "dolor lorem sit amet, consectetur adipisicing elit, \nsed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";
if (preg_match($re, $str, $matches)) {
...
}
PHP preg_replace() does not match white spaces and new lines: why?
Add +
after <!--[\s\r\n]*
in your pattern :
$pattern = '<!--[\s\r\n]*+[^![].*-->';
$string = '<!-- one line comment -->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!--
multiple line comment
-->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!-- ! one line comment -->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!--! one line comment -->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!--!
multiple line comment
-->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!-- !
multiple line comment
-->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!--
!multiple line comment
-->';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
$string = '<!--[if lt IE 9]>';
var_dump(preg_replace('/' . $pattern . '/Uis', '', $string));
Output :
string '' (length=0)
string '' (length=0)
string '<!-- ! one line comment -->' (length=27)
string '<!--! one line comment -->' (length=26)
string '<!--!
multiple line comment
-->' (length=33)
string '<!-- !
multiple line comment
-->' (length=34)
string '<!--
!multiple line comment
-->' (length=33)
string '<!--[if lt IE 9]>' (length=17)
RegEx for capturing a repeating pattern
You can make this work with only minor changes, but the issue is that last part. The general form of a tempered greedy token is this:
(.(?!notAllowed))+
so, using this pattern for your case, plus adding named groups for clarity:
(?<hours>[0-9]{1,2}h)[ ]*(?<minutes>[0-9]{1,2}min):\s*(?<description>(?:.(?!\dh\s\d{1,2}min))+)
PS: if you cannot turn on a "dot matches newline" mode, you may be able to use [\s\S]
to simulate.
regex101 demo
Related Topics
Create Nested List from Multidimensional Array
Google API How to Connect to Receive Values from Spreadsheet
How to Use PHPunit to Test a Function If That Function Is Supposed to Kill PHP
Search and Replace Value in PHP Array
How to Execute PHP Code in a Sandbox from Within PHP
Backup a MySQL Database and Download as a File
Curl - Load a Site with Cloudflare Protection
Does Perl Have PHP-Like Dynamic Variables
How to Find Error Log Files for PHP
Displaying All Table Names in PHP from MySQL Database
How to Return Custom Error Message from Controller Method Validation
How Does {} Affect a MySQL Query in PHP
How to Use Composer to Autoload Classes from Outside the Vendor