Python Regex Match or Operator

Python regex match OR operator

Use a non-capturing group, (?:...), for the alternation, and call .group() on the match object to get the matched text.

Use re.I for case insensitive matching.

import re

def find_t(text):
    return re.search(r'\d{2}:\d{2}(?:am|pm)', text, re.I).group()

You can also use re.findall() to return every non-overlapping match as a list.

def find_t(text):
    return re.findall(r'\d{2}:\d{2}(?:am|pm)', text, re.I)
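Note that re.search() returns None when nothing matches, so calling .group() on the result directly raises AttributeError. A minimal sketch of a guarded version follows; the names TIME_RE, find_first_time and find_all_times are made up for this illustration and are not part of the original answer.

```python
import re

# Same pattern as above, compiled once; (?:am|pm) groups the alternation
# without creating a capture group, so findall() returns whole matches.
TIME_RE = re.compile(r'\d{2}:\d{2}(?:am|pm)', re.I)

def find_first_time(text):
    """Return the first time-like substring, or None if there is none."""
    match = TIME_RE.search(text)
    return match.group() if match else None

def find_all_times(text):
    """Return every non-overlapping time-like substring as a list."""
    return TIME_RE.findall(text)

print(find_first_time("Meet at 09:30AM or 10:15pm"))  # 09:30AM
print(find_all_times("Meet at 09:30AM or 10:15pm"))   # ['09:30AM', '10:15pm']
print(find_first_time("no time here"))                # None
```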


Using the Regex OR operator to accommodate user input of A or An

You can use:

regex_a = re.compile("(a|an)$")

That way you are telling the regex that the string needs to end right there for a match.

The regex ("(a\s|an\s)") would never work, because it expects the substrings 'a ' and 'an ' (each with trailing whitespace), while the split() in secondword = secrets['text'].split()[1] returns whitespace-trimmed strings.
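A small sketch of the difference, assuming the word is tested with match(). The sample sentence is invented for illustration; the original code took its text from secrets['text'].

```python
import re

regex_a = re.compile(r"(a|an)$")       # word must end right after 'a' or 'an'
regex_ws = re.compile(r"(a\s|an\s)")   # requires trailing whitespace

# split() with no arguments strips all surrounding whitespace from each word
words = "Pick an apple".split()        # ['Pick', 'an', 'apple']
second = words[1]

print(bool(regex_a.match(second)))     # True
print(bool(regex_ws.match(second)))    # False: there is no whitespace left
print(bool(regex_a.match("and")))      # False: $ rejects the extra 'd'
```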

How does the OR operator (|) work in regular expressions?

You are mixing Python syntax (the or boolean and | bitwise OR operators) with regex syntax.

While regular expressions do use | to separate alternate patterns, the syntax used in regular expressions is distinct and separate from Python operators. You can't arbitrarily combine the two. Regular expression syntax is passed to the re module functions via strings, not as Python expressions.

This works:

either = r"({}|{})\.(\d+)".format(date28, date29)
res = re.search(either, date)

because the regular expression pattern is combined into a single string using regular expression syntax first.
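To make the distinction concrete, here is a rough sketch with two placeholder sub-patterns (part_a and part_b are hypothetical and much simpler than the date patterns in the question):

```python
import re

# Hypothetical sub-patterns, used only for illustration
part_a = r"cat"
part_b = r"dog"

# Python's `or` returns the first truthy operand, so this is just "cat".
python_or = part_a or part_b

# Regex alternation has to be written inside the pattern string itself.
regex_or = r"(?:{}|{})".format(part_a, part_b)

print(python_or)                               # cat
print(regex_or)                                # (?:cat|dog)
print(re.search(python_or, "hot dog"))         # None
print(re.search(regex_or, "hot dog").group())  # dog
```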

Note that there is no point in using date28 here, because everything that date28 can match, can also be matched by date29. Moreover, date28 won't match 02.19., a valid date in February.

If you want to construct a regex from 'labelled' components, I recommend you use the re.VERBOSE flag, which causes whitespace in a regex (including newlines) to be ignored, and adds support for using # ... comments. To match whitespace, use explicit classes such as [ ], [\n], \s, etc. I often combine this with explicit group names too.

E.g. your expression could be written out as:

february_date = re.compile(
    r"""
    (
      02\.                 # month, always February
      (                    # Leap year
        0[1-9]             # first 9 days
      |
        [12][0-9]          # remainder from 10 to 29
      )
    |
      02\.
      (                    # regular year
        0[1-9]             # first 9 days
      |
        [12][0-8]          # remainder 10-18, 20-28
      )
    )
    \.(\d+)                # The year
    """, flags=re.VERBOSE)
res = february_date.search(date)

This format also makes it much easier to see that both alternatives start with 02\., which is redundant. The pattern above also still has the original problems: [12][0-8] is redundant next to [12][0-9], and it doesn't match the 19th of February.

Personally, I'd just use \d{2}\.\d{2}\.\d{4} and then use datetime.strptime() to validate that the matched text is actually a valid date. Building a regex to validate dates is a mammoth task, and simply not worth the effort.

For example, the pattern you tried to construct doesn't tell you that 2001 was not a leap year, so 02.29.2001 is not a valid date. But trying to parse it using datetime.strptime() throws an exception, telling you this isn't a valid date:

>>> from datetime import datetime
>>> date = '02.29.2001'
>>> datetime.strptime(date, "%m.%d.%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/_strptime.py", line 458, in _strptime
    datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month
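Putting the two steps together, a minimal sketch might look like the following. The helper name find_valid_dates and the sample sentence are assumptions made for this example; the approach (a loose regex to find candidates, then datetime.strptime() to validate them) is the one described above.

```python
import re
from datetime import datetime

# Loose shape check only: two digits, two digits, four digits
DATE_RE = re.compile(r"\d{2}\.\d{2}\.\d{4}")

def find_valid_dates(text):
    """Return datetime objects for every MM.DD.YYYY substring that is a real date."""
    valid = []
    for candidate in DATE_RE.findall(text):
        try:
            valid.append(datetime.strptime(candidate, "%m.%d.%Y"))
        except ValueError:
            # Right shape, but not a real calendar date (e.g. 02.29.2001)
            continue
    return valid

dates = find_valid_dates("Due 02.29.2001, moved to 02.28.2001, then 02.29.2004.")
print([d.strftime("%m.%d.%Y") for d in dates])  # ['02.28.2001', '02.29.2004']
```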

Python regex with an OR condition

You can just add the search for £ into your existing regex:

for item in soup.find_all("td", {"width": "10%"}, string=re.compile(r'^(\d{4}|.*£.*)$')):

I've assumed there can be characters other than £ in the string. If that is not the case (the string is simply £), remove the .* parts of the alternation, i.e.

for item in soup.find_all("td", {"width": "10%"}, string=re.compile(r'^(\d{4}|£)$')):
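To see how the two patterns differ without needing a parsed page, they can be tried on plain strings. The sample cell texts below are invented for illustration.

```python
import re

with_extras = re.compile(r'^(\d{4}|.*£.*)$')  # four digits, or anything containing £
exact_only = re.compile(r'^(\d{4}|£)$')       # four digits, or exactly £

samples = ["2019", "£", "£1,200", "12345", "n/a"]  # hypothetical cell texts

for text in samples:
    print(repr(text), bool(with_extras.search(text)), bool(exact_only.search(text)))
```

Only '£1,200' is treated differently: the first pattern accepts it and the second does not.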

Python: How to use RegEx in an if statement?

import re
if re.match(regex, content):
    ...  # code to run when the pattern matches

You could also use re.search, depending on how you want it to match: re.match only matches at the start of the string, while re.search finds a match anywhere in it.

You can run this example:

"""
A very nice interface for trying out regexes: https://regex101.com/
"""
# %%
"""Simple if statement with a regex"""
import re

# \. matches a literal dot; an unescaped . would match any character
regex = r"\s*Proof\.\s*"
contents = ['Proof.\n', '\nProof.\n']
for content in contents:
    assert re.match(regex, content), f'Failed on {content=} with {regex=}'
    if re.match(regex, content):
        print(content)
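The difference between re.match and re.search shows up as soon as the text of interest is not at the very start of the string. A short sketch, with a made-up sample string:

```python
import re

regex = r"Proof\."
content = "Lemma 1.\nProof.\n"   # hypothetical input; 'Proof.' is on the second line

# re.match only tries to match at position 0
print(re.match(regex, content))            # None

# re.search scans the whole string for the first match
print(re.search(regex, content).group())   # Proof.

if re.search(regex, content):
    print("found a proof")
```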

Python regular expression using the OR operator

Python regular expressions use the | operator for alternation.

def series2string(myserie):
    myserie2 = '|'.join(myserie)
    myserie2 = '(' + myserie2 + ')'
    return myserie2

More information: https://docs.python.org/3/library/re.html
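As a rough usage sketch, here is a variant of that helper applied to a few sub-patterns. Both the name join_alternatives and the sample patterns are hypothetical, and the variant uses a non-capturing group (?:...) instead of the plain parentheses above, which is a swapped-in choice rather than what the original function does.

```python
import re

def join_alternatives(patterns):
    """Join sub-patterns with | inside a non-capturing group."""
    return '(?:' + '|'.join(patterns) + ')'

# Hypothetical sub-patterns, purely for illustration
patterns = [r'\bvu\b', r'\bvus\b', r'\bvue\b']
combined = re.compile(join_alternatives(patterns), re.I)

print(combined.pattern)   # (?:\bvu\b|\bvus\b|\bvue\b)
print(combined.findall("Vu le code, vue la loi, et les revues"))  # ['Vu', 'vue']
```

The surrounding group matters once the combined pattern is embedded in a larger one, because | has the lowest precedence of all regex operators.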


The individual patterns look really messy, so I don't know what is a mistake, and what is intentional. I am guessing you are looking for the word "vu" in a few different contexts.

  1. Always use Python raw strings for regular expressions, prefixed with r (r'pattern here'). This lets you use \ in a pattern without Python trying to interpret it as a string escape, so it is passed to the regex engine unchanged.
  2. Use \s to match white-space (spaces and line-breaks).
  3. Since you already have several alternative patterns, don't make ( and ) optional. It can result in catastrophic backtracking, which can make matching large strings really slow.

    \(?\(
    \)?\)
  4. {1} doesn't do anything. It just repeats the previous sub-pattern once, which is the same as not specifying anything.
  5. \br is a problem in a non-raw string. Python interprets it as \b (the ASCII backspace character) + the letter r, so the regex engine never sees a word boundary.
  6. You have a quote character (') at the beginning of your text-string. Either you intend ^ to match the start of any line, or the ' is a copy/paste error.
  7. Some errors when combining the patterns:

    pattern = [pattern1, pattern2, pattern3, pattern4]
    pattern = series2string(pattern)

    expression(re.compile(pattern), text)
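Points 1, 4 and 5 in the list above can be checked directly in the interpreter. This is a small sketch with an invented sample sentence, not the text from the question:

```python
import re

# In a normal string literal, \b is the backspace character (0x08).
print('\br' == '\x08r')      # True
print(len('\bvu\b'))         # 4: backspace, v, u, backspace

# In a raw string, the backslash survives and the regex engine sees \b.
print(len(r'\bvu\b'))        # 6

text = "j'ai vu le film"     # hypothetical sample text
print(re.search(r'\bvu\b', text).group())   # vu
print(re.search('\bvu\b', text))            # None: looks for literal backspaces

# {1} is a no-op: both patterns find the same matches.
print(re.findall(r'\s{1}vu', text) == re.findall(r'\svu', text))  # True
```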

How to find logical 'or' operator (||) from a sequence using Regex Python

You need to escape it with a backslash ('\') here, since '|' is a special character in regular expressions.

Python docs:

'|'

A|B, where A and B can be arbitrary REs, creates a regular
expression that will match either A or B. An arbitrary number of REs
can be separated by the '|' in this way. This can be used inside
groups (see below) as well. As the target string is scanned, REs
separated by '|' are tried from left to right.

So you need to do :

expr = re.sub(r"\|\|","_or_",sequence)

Or, using re.escape() (thanks to @Steven for pointing this out):

expr = re.sub(re.escape("||"),"_or_",sequence)

And you will get :

IN : sequence = "if(ab||2) + H) then a*10"
OUT : 'if(ab_or_2) + H) then a*10'

Edit :

If you are not required to use regex, you can call str.replace() on the string directly, i.e.

sequence.replace('||','_or_')

Here you won't have to worry about the special character.
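A quick side-by-side check of the three approaches, plus the unescaped pattern for contrast. The variable names are chosen for this sketch only.

```python
import re

sequence = "if(ab||2) + H) then a*10"

escaped_by_hand = re.sub(r"\|\|", "_or_", sequence)
escaped_by_helper = re.sub(re.escape("||"), "_or_", sequence)
plain_replace = sequence.replace("||", "_or_")

print(escaped_by_hand)                                       # if(ab_or_2) + H) then a*10
print(escaped_by_hand == escaped_by_helper == plain_replace)  # True

# Unescaped, "||" means "empty OR empty OR empty", so it matches the empty
# string at every position instead of the two literal bars.
unescaped = re.sub("||", "_or_", sequence)
print(unescaped == plain_replace)                            # False
```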

Regex for comparison operators

You can use [<>]=?|== to match the operator, and \d+ to match the number. Enclosing each of those patterns in a capture group will let you access the matched values:

>>> re.match(r'([<>]=?|==)(\d+)', '>2:').groups()
('>', '2')
>>> re.match(r'([<>]=?|==)(\d+)', '<=0:').groups()
('<=', '0')

You can also unpack the matched groups into individual variables:

match = re.match(r'([<>]=?|==)(\d+)', your_input)
operator, number = match.groups()
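Because re.match() returns None when the input is not a comparison, unpacking match.groups() directly can fail. The sketch below adds that check and, as one possible next step, maps the operator string to a function from the standard operator module. The names COMPARISON_RE, OPERATORS, parse_condition and check are assumptions made for this example.

```python
import operator
import re

COMPARISON_RE = re.compile(r'([<>]=?|==)(\d+)')

# Map each operator string to the matching function from the operator module.
OPERATORS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    '==': operator.eq,
}

def parse_condition(text):
    """Return (operator_string, number) or None if text is not a comparison."""
    match = COMPARISON_RE.match(text)
    if match is None:
        return None
    op, number = match.groups()
    return op, int(number)

def check(value, text):
    """Evaluate e.g. check(3, '>2:') as 3 > 2."""
    parsed = parse_condition(text)
    if parsed is None:
        raise ValueError(f"not a comparison: {text!r}")
    op, number = parsed
    return OPERATORS[op](value, number)

print(parse_condition('>2:'))   # ('>', 2)
print(parse_condition('<=0:'))  # ('<=', 0)
print(parse_condition('!=3'))   # None
print(check(3, '>2:'))          # True
```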

