Regex for existence of some words whose order doesn't matter
See this regex:
/^(?=.*Tim)(?=.*stupid).+/
Regex explanation:
- ^ Asserts position at start of string.
- (?=.*Tim) Asserts that "Tim" is present in the string.
- (?=.*stupid) Asserts that "stupid" is present in the string.
- .+ Now that our phrases are present, this string is valid. Go ahead and use .+ (or the possessive .++, where the engine supports it) to match the entire string.
The last (?=.*<to_assert>) group is not strictly needed. The entire regex can be simplified as /^(?=.*Tim).*stupid/.
>>> import re
>>> text = """
... Tim is so stupid.
... stupid Tim!
... Tim foobar barfoo.
... Where is Tim?"""
>>> m = re.findall(r'^(?=.*Tim)(?=.*stupid).+$', text, re.MULTILINE)
>>> m
['Tim is so stupid.', 'stupid Tim!']
>>> # The simplified pattern stops matching at the last "stupid" on each line.
>>> m = re.findall(r'^(?=.*Tim).*stupid', text, re.MULTILINE)
>>> m
['Tim is so stupid', 'stupid']
Read more: Regex with exclusion chars and another regex
Regex for matching adjacent words where order doesn't matter
There is no strictly better solution, but there's an alternative.
Now, if you have two normal words like "fat" and "cat", then (fat cat|cat fat) is undoubtedly the best solution. But what if you have 5 words? Or if you have more complex patterns than just fat and cat that you don't want to type twice?
Say instead of fat and cat you have 3 regex patterns A, B and C, and instead of the space between fat and cat you have the regex pattern S. In that case, you could use this recipe:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B)|(?!\4)()(?:C))){3}
If you don't have an S, this can be simplified to
(?:(?!\1)()(?:A)|(?!\2)()(?:B)|(?!\3)()(?:C)){3}
(Note: (?:X) can be simplified to X if X doesn't contain an alternation |.)
Example
If we set A = fat, B = cat and S = space, we get:
(?:(?:(?!\1)()|\1 )(?:(?!\2)()fat|(?!\3)()cat)){2}
Explanation
In essence, we're using capture groups to "remember" which patterns have already matched. To do so, we use this little pattern here:
(?!\1)()some_pattern
What does this do? It's a regex that matches exactly once. Once it has matched, it won't ever match again. If you try to add a quantifier around that pattern like (?:(?!\1)()some_pattern)* it'll match either once or won't match at all.
The trick there is the usage of a backreference to a capture group before that group has even been defined. Because capture groups are initialized with a "failed to match" state, the negative lookahead (?!\1) will match successfully - but only the first time. Because right afterwards, the capture group () matches and captures the empty string. From this point forward, the negative lookahead (?!\1) will never match again.
With this as a building block, we can create a regex that matches fatcat and catfat while only containing the words fat and cat once:
(?:(?!\1)()fat|(?!\2)()cat){2}
Because of the negative lookaheads, each word can only match at most once. Adding a {2} quantifier at the end guarantees that each of the two words matches exactly once, or the entire match fails.
Now we just need to find a way to match a space between fat and cat. Well, that's just a slight variation of the same pattern:
(?:(?!\1)()|\1 )
This pattern will match the empty string on its first match, and on each subsequent match it'll match a space.
Put it all together, and voilà:
(?:(?:(?!\1)()|\1 )(?:(?!\2)()fat|(?!\3)()cat)){2}
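The trick above depends on the engine accepting a reference to a group that is only defined later in the pattern. Python's built-in re module rejects such a reference when compiling, so for comparison here is a minimal sketch of the plain alternation approach mentioned at the start, (fat cat|cat fat), generated with itertools.permutations. The helper name any_order_alternation is made up for illustration, the words are treated as literal text, and the number of alternatives grows factorially with the number of words.

```python
import re
from itertools import permutations


def any_order_alternation(words, sep=" "):
    """Build an alternation listing every ordering of the literal words.

    `sep` is inserted as a regex fragment between the words (a space here).
    """
    alternatives = (
        sep.join(re.escape(word) for word in order)
        for order in permutations(words)
    )
    return "(?:" + "|".join(alternatives) + ")"


pattern = any_order_alternation(["fat", "cat"])
print(pattern)
print(bool(re.fullmatch(pattern, "cat fat")))  # both words, swapped order
print(bool(re.fullmatch(pattern, "fat fat")))  # same word twice
```

With two words this is short and readable; with five words it already lists 120 orderings, which is the situation the backreference recipe is meant to avoid.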
Templates (for the lazy)
2 patterns A and B, with separator S:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B))){2}
3 patterns A, B and C, with separator S:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B)|(?!\4)()(?:C))){3}
4 patterns A, B, C and D, with separator S:
(?:(?:(?!\1)()|\1(?:S))(?:(?!\2)()(?:A)|(?!\3)()(?:B)|(?!\4)()(?:C)|(?!\5)()(?:D))){4}
2 patterns A and B, without S:
(?:(?!\1)()(?:A)|(?!\2)()(?:B)){2}
3 patterns A, B and C, without S:
(?:(?!\1)()(?:A)|(?!\2)()(?:B)|(?!\3)()(?:C)){3}
4 patterns A, B, C and D, without S:
(?:(?!\1)()(?:A)|(?!\2)()(?:B)|(?!\3)()(?:C)|(?!\4)()(?:D)){4}
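The templates follow a regular shape, so they can also be generated. The following is a rough sketch of a generator that only builds the pattern string; the function name unordered_template is hypothetical, and it assumes the supplied patterns contain no capturing groups of their own (those would shift the group numbers). The resulting string is meant for an engine that accepts forward references, such as PCRE; Python's own re module will refuse to compile it.

```python
def unordered_template(patterns, separator=None):
    """Return the 'each pattern exactly once, any order' regex as a string."""
    # With a separator, group 1 marks "something has matched already",
    # so the per-pattern marker groups start at 2 instead of 1.
    first_group = 1 if separator is None else 2
    branches = []
    for offset, pattern in enumerate(patterns):
        ref = "\\" + str(first_group + offset)
        branches.append(f"(?!{ref})()(?:{pattern})")
    body = "(?:" + "|".join(branches) + ")"
    if separator is not None:
        # Empty on the first iteration, the separator on every later one.
        body = "(?:(?:(?!\\1)()|\\1(?:" + separator + "))" + body + ")"
    return body + "{" + str(len(patterns)) + "}"


print(unordered_template(["A", "B"]))
print(unordered_template(["fat", "cat"], separator=" "))
```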
Match a string between two or more words regardless of order
You may use a backreference + a subroutine:
\b(longword1|longword2)\b.*?\b(?!\1\b)(?1)\b
Expanding it for three alternatives:
\b(longword1|longword2|longword3)\b.*?\b(?!\1\b)((?1))\b.*?\b(?!(?:\1|\2)\b)(?1)\b
So, the list of words will be in Group 1, and you will only need to add backreferences before the subsequent subroutines.
Details
- \b(longword1|longword2)\b - a whole word, either longword1 or longword2
- .*? - any 0 or more chars other than line break chars, as few as possible
- \b - a word boundary
- (?!\1\b) - there cannot be the same text as matched in Group 1 followed with a word boundary
- (?1) - a subroutine that matches the same pattern as in Group 1
- \b - a word boundary
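Python's built-in re module has no (?1) subroutine syntax, so the pattern above cannot be used there as written. A possible workaround, sketched below under that assumption, is to repeat the word list textually when building the pattern instead of calling the group as a subroutine. The function name two_distinct_words is made up for the example.

```python
import re


def two_distinct_words(words):
    """Compile a pattern matching two different words from `words`, in any order."""
    alternation = "|".join(re.escape(word) for word in words)
    # The second occurrence of the alternation stands in for the (?1) subroutine.
    return re.compile(
        rf"\b({alternation})\b.*?\b(?!\1\b)(?:{alternation})\b"
    )


pattern = two_distinct_words(["longword1", "longword2"])
print(pattern.search("longword2 then longword1") is not None)
print(pattern.search("longword1 then longword1") is not None)
```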
JavaScript - regex order doesn't matter but existence required
Just use two sequential RegExps, like this:
var body = '<link rel="stylesheet" href="my.css"/> <link href="https://support.google.com/recaptcha/?hl=en" rel="canonical"/> <a href="https://www.google.com/accounts/TOS"/>'
var linkRegexp = /(<link[^>]*rel=['"]canonical['"][^>]*>)/;
var hrefRegexp = /href=['"](.*?)['"]/;
var linkBody = linkRegexp.exec(body)[1];
console.log(hrefRegexp.exec(linkBody)[1]);
- linkRegexp - get the link with rel='canonical'
- hrefRegexp - extract href from it
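For readers working in Python rather than JavaScript, the same two-step idea might look roughly like this. It is only a sketch: the function name canonical_href is hypothetical, and it returns None when either step finds nothing instead of raising.

```python
import re

body = (
    '<link rel="stylesheet" href="my.css"/> '
    '<link href="https://support.google.com/recaptcha/?hl=en" rel="canonical"/> '
    '<a href="https://www.google.com/accounts/TOS"/>'
)

# Step 1: find the <link> tag that carries rel="canonical".
link_re = re.compile(r"""<link[^>]*rel=['"]canonical['"][^>]*>""")
# Step 2: pull the href value out of that tag only.
href_re = re.compile(r"""href=['"](.*?)['"]""")


def canonical_href(html):
    link = link_re.search(html)
    if link is None:
        return None
    href = href_re.search(link.group(0))
    return href.group(1) if href else None


print(canonical_href(body))
```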
The same can be done with a single RegExp:
var regexp = /<link[^>]*(?=href=['"]([^'"]*)['"][^>]*?rel=['"]canonical['"]|rel=['"]canonical[^>]*?href=['"]([^'"]*)['"])[^>]*>/;
console.log( regexp.exec(body).splice(1).join(""));
(but IMHO this is much less readable)
Find unordered words with RegEx
I think this task is best done with some programming logic; a regex for it is neither easy nor efficient. But here is a regex that seems to do the job, whether or not the words (hello my world) are repeated in the input:
\b(hello|my|world)\b.*?((?!\1)\b(?:hello|my|world)\b).*?(?:(?!\1)(?!\2)\b(?:hello|my|world)\b)
The idea here is:
- Make an alternation group \b(hello|my|world)\b and put it in group 1.
- Then it can optionally be followed by zero or more of any characters.
- Then it must be followed by one of the remaining two words, not the one that got matched in the first group, which is why I have used ((?!\1)\b(?:hello|my|world)\b); this second match is put in group 2.
- Then again it can optionally be followed by zero or more of any characters.
- Then we apply the same logic again: the third word must be the one that wasn't captured in either group 1 or group 2, hence (?:(?!\1)(?!\2)\b(?:hello|my|world)\b)
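This pattern uses only backreferences and lookaheads, so it should also work with Python's re module. A small sketch to try it out follows; the helper name contains_all_three and the sample sentences are made up for the example.

```python
import re

WORDS = r"hello|my|world"
pattern = re.compile(
    rf"\b({WORDS})\b.*?((?!\1)\b(?:{WORDS})\b).*?(?:(?!\1)(?!\2)\b(?:{WORDS})\b)"
)


def contains_all_three(text):
    """True if hello, my and world each occur as whole words, in any order."""
    return pattern.search(text) is not None


print(contains_all_three("world, say hello to my friend"))
print(contains_all_three("hello my hello"))
```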
Regex unordered matches
You don't really want to use a regex for this unless the text is very small, which from your description I doubt.
A simple solution would be to dump all the words into a HashSet, at which point checking to see if a word is present becomes a very quick and easy operation.
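In Python, the counterpart of a HashSet is the built-in set. A minimal sketch of the idea, assuming that words are runs of \w characters and that matching should ignore case (both are choices made for this example, not something the question specifies):

```python
import re


def contains_all_words(text, required):
    """Check that every word in `required` occurs somewhere in `text`."""
    words = set(re.findall(r"\w+", text.lower()))
    # Subset test: each lookup in the set is a constant-time hash probe on average.
    return {word.lower() for word in required} <= words


print(contains_all_words("Hello, my dear world!", ["hello", "my", "world"]))
print(contains_all_words("hello world", ["hello", "my", "world"]))
```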
Regex: I want this AND that AND that... in any order
You can use (?=…) positive lookahead; it asserts that a given pattern can be matched. You'd anchor at the beginning of the string, and one by one, in any order, look for a match of each of your patterns.
It'll look something like this:
^(?=.*one)(?=.*two)(?=.*three).*$
This will match a string that contains "one", "two", "three", in any order (as seen on rubular.com).
Depending on the context, you may want to anchor on \A and \Z, and use single-line mode so the dot matches everything.
This is not the most efficient solution to the problem. The best solution would be to parse out the words in your input and put them into an efficient set representation, etc.
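To illustrate the remark about anchors and single-line mode: in Python, single-line mode is the re.DOTALL flag. The sketch below compares the two variants on a made-up multi-line input.

```python
import re

text = "three\ntwo\none"

# Default mode: the dot stops at a newline, so the lookaheads cannot
# see past the first line.
line_pattern = re.compile(r"^(?=.*one)(?=.*two)(?=.*three).*$")

# \A and \Z anchor to the whole string; DOTALL lets the dot cross newlines.
string_pattern = re.compile(r"\A(?=.*one)(?=.*two)(?=.*three).*\Z", re.DOTALL)

print(line_pattern.match(text) is not None)
print(string_pattern.match(text) is not None)
```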
Related questions
- How does the regular expression (?<=#)[^#]+(?=#) work?
More practical example: password validation
Let's say that we want our password to:
- Contain between 8 and 15 characters
- Contain an uppercase letter
- Contain a lowercase letter
- Contain a digit
- Contain one of the special symbols !@#$%^&*
^(?=.{8,15}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!@#$%^&*]).*$
 \__________/\_________/\_________/\_________/\______________/
    length      upper      lower      digit        symbol
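A short sketch of how this pattern could be used from Python. The function name is_valid_password and the sample passwords are invented for the example.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.{8,15}$)"      # length: 8 to 15 characters
    r"(?=.*[A-Z])"        # at least one uppercase letter
    r"(?=.*[a-z])"        # at least one lowercase letter
    r"(?=.*[0-9])"        # at least one digit
    r"(?=.*[!@#$%^&*])"   # at least one special symbol
    r".*$"
)


def is_valid_password(candidate):
    return PASSWORD_RE.match(candidate) is not None


for candidate in ("Passw0rd!", "passw0rd!", "Password!", "Pw0!"):
    print(candidate, is_valid_password(candidate))
```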
Building a RegEx, 12 letters without order, fixed number of individual letters
One possibility, although I still think regex is inappropriate for this. Checks that all letters appear the desired amount and that it's 12 letters total (so there's no room left for any more/other letters):
import re
for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(re.fullmatch('(?=.*W)(?=.*S)(?=.*I)(?=.*O)'
                       '(?=.*B)(?=.*A)(?=.*R)(?=.*H)'
                       '(?=.*T.*T)(?=.*E.*E).{12}', s))
Another, checking that none other than T and E appear twice, that none appear thrice, and that we have only the desired letters, 12 total:
import re
for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(re.fullmatch(r'(?!.*([^TE]).*\1)'
                       r'(?!.*(.).*\1.*\1)'
                       r'[WSIOBARHTE]{12}', s))
A simpler way:
for s in 'WSITTOBAEERH', 'HREEABOTTISW', 'WSITOTBAEREH':
    print(sorted(s) == sorted('WSIOBARHTTEE'))