Python regex match OR operator
Use a non-capturing group, (?:...), and reference the match group. Use re.I for case-insensitive matching.
import re
def find_t(text):
    return re.search(r'\d{2}:\d{2}(?:am|pm)', text, re.I).group()
You can also use re.findall() to return every match in the text instead of only the first one.
def find_t(text):
    return re.findall(r'\d{2}:\d{2}(?:am|pm)', text, re.I)
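As a rough sketch of the two approaches above, the following wraps the same pattern in two helpers. The helper names and the sample sentence are made up for illustration, and the None check is an addition, since re.search() returns None when nothing matches.

```python
import re

# (?:am|pm) groups the alternatives without creating a capture group.
TIME_RE = re.compile(r'\d{2}:\d{2}(?:am|pm)', re.I)

def find_first_time(text):
    """Return the first time-like substring, or None when nothing matches."""
    match = TIME_RE.search(text)
    return match.group() if match else None

def find_all_times(text):
    """Return every non-overlapping time-like substring."""
    return TIME_RE.findall(text)

sample = "Open 09:30AM, lunch 12:15pm, closed 18:00."  # hypothetical input
print(find_first_time(sample))            # 09:30AM
print(find_all_times(sample))             # ['09:30AM', '12:15pm']
print(find_first_time("no times here"))   # None

# With a capturing group, findall() returns only the group, not the whole match.
print(re.findall(r'\d{2}:\d{2}(am|pm)', sample, re.I))  # ['AM', 'pm']
```

The last line shows why the non-capturing form matters with re.findall(): a capturing group changes what the function returns.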
Using the Regex OR operator to accommodate user input of A or An
You can use:
regex_a = re.compile("(a|an)$")
That way you are telling the regex that the string needs to end right there for a match.
The regex ("(a\s|an\s)") would never work, because it expects the substrings 'a ' and 'an ' (each with a trailing space) to match, and the split() in secondword = secrets['text'].split()[1] returns whitespace-trimmed strings.
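A small sketch of that behaviour, assuming the pattern is applied with match(); the sample sentence is a hypothetical stand-in for secrets['text']:

```python
import re

regex_a = re.compile("(a|an)$")   # pattern from the answer above

text = "such an example"          # hypothetical stand-in for secrets['text']
secondword = text.split()[1]      # 'an': split() drops the surrounding whitespace

print(bool(regex_a.match(secondword)))             # True
print(bool(regex_a.match("and")))                  # False: '$' rejects the trailing 'd'
print(bool(re.match(r"(a\s|an\s)", secondword)))   # False: no trailing space is left
```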
How does the OR operator (|) work in regular expressions?
You are mixing Python syntax (the or boolean and | bitwise OR operators) with regex syntax.
While regular expressions do use | to separate alternate patterns, the syntax used in regular expressions is distinct and separate from Python operators. You can't arbitrarily combine the two. Regular expression syntax is passed to the re module functions via strings, not as Python expressions.
This works:
either = r"({}|{})\.(\d+)".format(date28, date29)
res = re.search(either, date)
because the regular expression pattern is combined into a single string using regular expression syntax first.
Note that there is no point in using date28 here, because everything that date28 can match can also be matched by date29. Moreover, date28 won't match 02.19., a valid date in February.
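To make the distinction concrete, here is a minimal sketch. The definitions of date28 and date29 are hypothetical stand-ins (the question's own patterns are not reproduced here); only the way they are combined follows the answer.

```python
import re

# Hypothetical component patterns, written as plain strings.
date28 = r"02\.(?:0[1-9]|1[0-9]|2[0-8])"
date29 = r"02\.(?:0[1-9]|[12][0-9])"

# The alternation lives inside the pattern string, not in Python code.
either = r"({}|{})\.(\d+)".format(date28, date29)
res = re.search(either, "02.19.2024")
print(res.groups())                    # ('02.19', '2024')

# Python's `or` simply returns the first truthy operand, so date29 is ignored.
print((date28 or date29) == date28)    # True

# Python's `|` is not defined for two strings at all.
try:
    date28 | date29
except TypeError as exc:
    print("TypeError:", exc)
```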
If you want to construct a regex from 'labelled' components, I recommend you use the re.VERBOSE flag, which causes whitespace in a regex (including newlines) to be ignored, and adds support for using # ... comments. To match whitespace, use explicit classes such as [ ], [\n], \s, etc. I often combine this with explicit group names too.
E.g. your expression could be written out as:
february_date = re.compile(
    r"""
    (
        02\.                # month, always February
        (                   # Leap year
            0[1-9]          # first 9 days
            |
            [12][0-9]       # remainder from 10 to 29
        )
    |
        02\.
        (                   # regular year
            0[1-9]          # first 9 days
            |
            [12][0-8]       # remainder 10-18, 20-28
        )
    )
    \.(\d+)                 # The year
    """, flags=re.VERBOSE)
res = february_date.search(date)
This format also makes it much easier to see that you are matching 02\. at the start of either pattern, which is rather redundant. The above pattern of course still has the issue that [12][0-8] is both redundant against [12][0-9] and does not actually match the 19th of February.
Personally, I'd just use \d{2}\.\d{2}\.\d{4} and then use datetime.strptime() to validate that the matched text is actually a valid date. Building a regex to validate dates is a mammoth task, and simply not worth the effort.
For example, the pattern you tried to construct doesn't tell you that 2001 was not a leap year, so 02.29.2001 is not a valid date. But trying to parse it using datetime.strptime() throws an exception, telling you this isn't a valid date:
>>> from datetime import datetime
>>> date = '02.29.2001'
>>> datetime.strptime(date, "%m.%d.%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/_strptime.py", line 458, in _strptime
    datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month
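Putting the two steps together, a minimal sketch could look like the following. The function name and the sample text are assumptions for illustration; the loose pattern and the strptime() format string are the ones suggested above.

```python
import re
from datetime import datetime

# Loose pattern: only checks the shape MM.DD.YYYY, not the calendar.
DATE_RE = re.compile(r"\d{2}\.\d{2}\.\d{4}")

def find_valid_dates(text):
    """Return datetime objects for every MM.DD.YYYY substring that is a real date."""
    dates = []
    for candidate in DATE_RE.findall(text):
        try:
            dates.append(datetime.strptime(candidate, "%m.%d.%Y"))
        except ValueError:
            # Shape matched, but the calendar disagrees (e.g. 02.29.2001).
            continue
    return dates

found = find_valid_dates("02.29.2001 and 02.29.2004")
print(found)   # only the 2004 date survives; 2001 was not a leap year
```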
Python regex with an OR condition
You can just add the search for £ into your existing regex:
for item in soup.find_all("td", {"width": "10%"}, string=re.compile(r'^(\d{4}|.*£.*)$')):
I've assumed there can be characters other than £ in the string. If that is not the case (the string is simply £), then remove the .* parts of the alternation, i.e.
for item in soup.find_all("td", {"width": "10%"}, string=re.compile(r'^(\d{4}|£)$')):
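To see how the two patterns differ without needing a parsed page, here is a small sketch that applies them to plain strings. The sample cell values are invented for illustration.

```python
import re

with_extra = re.compile(r'^(\d{4}|.*£.*)$')   # four digits, or any text containing £
exact = re.compile(r'^(\d{4}|£)$')            # four digits, or exactly "£"

samples = ["2019", "£", "£12.50", "12345", "abc"]   # hypothetical cell contents

print([s for s in samples if with_extra.search(s)])   # ['2019', '£', '£12.50']
print([s for s in samples if exact.search(s)])        # ['2019', '£']
```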
Python: How to use RegEx in an if statement?
import re
if re.match(regex, content):
    ...  # handle the match here
You could also use re.search, depending on how you want it to match: re.match only matches at the start of the string, while re.search finds a match anywhere in it.
You can run this example:
"""
very nice interface to try regexes: https://regex101.com/
"""
# %%
"""Simple if statement with a regex"""
import re
regex = r"\s*Proof\.\s*"  # escape the dot so it only matches a literal '.'
contents = ['Proof.\n', '\nProof.\n']
for content in contents:
    assert re.match(regex, content), f'Failed on {content=} with {regex=}'
    if re.match(regex, content):
        print(content)
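The choice between re.match and re.search in an if statement comes down to where the pattern is allowed to start. A short sketch, using an invented sample string:

```python
import re

regex = r"Proof\."
content = "Lemma 1.\nProof.\n"   # hypothetical text; "Proof." is not at the start

if re.match(regex, content):
    print("match: found at the start")
else:
    print("match: nothing at position 0")

if re.search(regex, content):
    print("search: found somewhere in the string")

# fullmatch() is stricter still: the whole string has to fit the pattern.
print(bool(re.fullmatch(regex, "Proof.")))   # True
print(bool(re.fullmatch(regex, content)))    # False
```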
Python regular expression using the OR operator
Python regular expressions use the | operator for alternation.
def series2string(myserie):
    myserie2 = '|'.join(serie for serie in myserie)
    myserie2 = '(' + myserie2 + ')'
    return myserie2
More information: https://docs.python.org/3/library/re.html
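A possible variant of series2string, sketched under two assumptions that go beyond the original: the group is made non-capturing, and an optional flag escapes the inputs with re.escape() for the case where they are literal strings rather than regex fragments. The function name is hypothetical.

```python
import re

def alternation(patterns, escape=False):
    """Join patterns with | inside a non-capturing group.

    With escape=True the inputs are treated as literal text.
    """
    parts = [re.escape(p) if escape else p for p in patterns]
    return '(?:' + '|'.join(parts) + ')'

print(alternation(['cat', 'dog']))                 # (?:cat|dog)
print(alternation(['a.b', 'c|d'], escape=True))    # (?:a\.b|c\|d)
print(re.findall(alternation(['cat', 'dog']), 'cat dog bird'))   # ['cat', 'dog']
```

Wrapping the alternation in a group keeps the | from swallowing whatever is placed before or after the combined pattern.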
The individual patterns look really messy, so I don't know what is a mistake, and what is intentional. I am guessing you are looking for the word "vu" in a few different contexts.
- Always use Python raw strings for regular expressions, prefixed with r (r'pattern here'). This lets you use \ in a pattern without Python trying to interpret it as a string escape; it is passed directly to the regex engine.
- Use \s to match white-space (spaces and line-breaks).
- Since you already have several alternative patterns, don't make ( and ) optional. It can result in catastrophic backtracking, which can make matching large strings really slow. \(? → \( and \)? → \)
- {1} doesn't do anything. It just repeats the previous sub-pattern once, which is the same as not specifying anything.
- \br is invalid. Without a raw string it is interpreted as \b (the ASCII backspace character) + the letter r.
- You have a quote character (') at the beginning of your text-string. Either you intend ^ to match the start of any line, or the ' is a copy/paste error.

Some errors when combining the patterns:
pattern = [pattern1, pattern2, pattern3, pattern4]
pattern = series2string(pattern)
expression(re.compile(pattern), text)
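The raw-string and {1} points from the list above can be checked directly. This is only a sketch; the sample sentence is invented around the word "vu" mentioned earlier.

```python
import re

# In a normal string, \b is a single backspace character (0x08).
print(len('\br'))        # 2: backspace + 'r'
print(len(r'\br'))       # 3: backslash, 'b', 'r'
print('\b' == '\x08')    # True

text = "j'ai vu le film, vue d'ensemble"   # hypothetical sample text

# Raw string: the regex engine sees \b and treats it as a word boundary.
print(re.findall(r'\bvu\b', text))   # ['vu']

# Non-raw string: the pattern looks for literal backspace characters instead.
print(re.findall('\bvu\b', text))    # []

# {1} changes nothing: both patterns accept exactly the same input.
print(bool(re.fullmatch(r'a{1}b', 'ab')), bool(re.fullmatch(r'ab', 'ab')))   # True True
```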
How to find logical 'or' operator (||) from a sequence using Regex Python
You need to escape it with a backslash (\) here, since | is a special character in regular expressions.
Python docs:
'|'
A|B, where A and B can be arbitrary REs, creates a regular
expression that will match either A or B. An arbitrary number of REs
can be separated by the '|' in this way. This can be used inside
groups (see below) as well. As the target string is scanned, REs
separated by '|' are tried from left to right.
So you need to do:
expr = re.sub(r"\|\|","_or_",sequence)
Or, using re.escape() (thanks to @Steven for pointing this out):
expr = re.sub(re.escape("||"),"_or_",sequence)
And you will get:
IN : sequence = "if(ab||2) + H) then a*10"
OUT : 'if(ab_or_2) + H) then a*10'
Edit:
If you are not required to use regex, you can call replace directly on the string, i.e.
sequence.replace('||','_or_')
Here you won't have to worry about the special character.
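A quick side-by-side of the three options, as a sketch using the sample sequence from above:

```python
import re

sequence = "if(ab||2) + H) then a*10"

by_hand = re.sub(r"\|\|", "_or_", sequence)              # escaped manually
by_escape = re.sub(re.escape("||"), "_or_", sequence)    # escaped by re.escape()
plain = sequence.replace("||", "_or_")                   # no regex involved

print(by_hand)                           # if(ab_or_2) + H) then a*10
print(by_hand == by_escape == plain)     # True

# Unescaped, "||" is an alternation of empty patterns, so it matches the
# empty string between characters rather than the literal "||".
print(re.sub("||", "_or_", sequence) == plain)   # False
```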
Regex for comparison operators
You can use [<>]=?|== to match the operator, and \d+ to match the number. Enclosing each of those patterns in a capture group will let you access the matched values:
>>> re.match(r'([<>]=?|==)(\d+)', '>2:').groups()
('>', '2')
>>> re.match(r'([<>]=?|==)(\d+)', '<=0:').groups()
('<=', '0')
You can also unpack the matched groups into individual variables:
match = re.match(r'([<>]=?|==)(\d+)', your_input)
operator, number = match.groups()
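Going one step further, the captured operator can be mapped to a function from the standard operator module. This is only a sketch: the OPS table, the parse_condition name, and the error handling are assumptions added for illustration.

```python
import operator
import re

COMPARISON_RE = re.compile(r'([<>]=?|==)(\d+)')

# Hypothetical lookup table from operator text to a comparison function.
OPS = {
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    '==': operator.eq,
}

def parse_condition(text):
    """Return (comparison function, integer threshold) parsed from text."""
    match = COMPARISON_RE.match(text)
    if match is None:
        raise ValueError(f"not a comparison: {text!r}")
    symbol, number = match.groups()
    return OPS[symbol], int(number)

compare, threshold = parse_condition('<=0:')
print(compare(-1, threshold))   # True
print(compare(3, threshold))    # False
```

The module is imported under its full name so that it does not clash with a local variable called operator, as used in the unpacking example above.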