Python Regular Expression Re.Match, Why This Code Does Not Work

re.match is not working as I expect in python3

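As @snakecharmerb said, in a regular (non-raw) string literal Python interprets some backslash sequences before the regex engine ever sees them. Here \1 becomes the single character U+0001 instead of a backreference to group 1, while \d survives only because it is not a recognized string escape. Use a raw string (the r prefix) to keep the backslashes intact: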

import re
a = r"^(?!.*(\d)(-?\1){3})[456]{1}\d{3}[-]?\d{4}[-]?\d{4}[-]?\d{4}$"
b = "^(?!.*(\d)(-?\1){3})[456]{1}\d{3}[-]?\d{4}[-]?\d{4}[-]?\d{4}$"
print(a)
print(b)

test_case = "5133-3367-8912-3456"
print('Valid' if re.match(a,test_case) is not None else "Invalid")
print('Valid' if re.match(b,test_case) is not None else "Invalid")

Output:

^(?!.*(\d)(-?\1){3})[456]{1}\d{3}[-]?\d{4}[-]?\d{4}[-]?\d{4}$
^(?!.*(\d)(-?){3})[456]{1}\d{3}[-]?\d{4}[-]?\d{4}[-]?\d{4}$
Invalid
Valid
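
To make the difference easier to see, here is a minimal sketch that isolates the \1 escape. The variable names and the short test string are illustrative and not taken from the original question.

```python
import re

raw = r"(\d)\1"      # backslash followed by "1": a backreference to group 1
cooked = "(\\d)\1"   # "\1" is an octal string escape: the single character U+0001

print(len(r"\1"), len("\1"))   # 2 1
print("\1" == "\x01")          # True

# The raw pattern finds a doubled digit; the cooked pattern looks for a digit
# followed by U+0001, which the test string does not contain.
raw_hit = re.search(raw, "5133")
cooked_hit = re.search(cooked, "5133")
print(raw_hit.group(0))        # 33
print(cooked_hit)              # None
```

This is why the non-raw pattern above reports Valid: its negative lookahead can never find the repeated digit it was meant to reject.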

Matching strings with re.match doesn't work

This is not a regular sentence: the words are joined with underscores. Since _ is a word character, \b does not match between an underscore and a letter. If you are just checking whether the word is present, you may either remove \b or add _ as an alternative boundary:

import re
my_other_string = 'the_boat_has_sunk'
my_list = ['car', 'boat', 'truck']
my_list = re.compile(r'(?:\b|_)(?:%s)(?=\b|_)' % '|'.join(my_list))
if re.search(my_list, my_other_string):
    print('yay')

EDIT:

Since you say it has to be true if one of the words in the list is in the string, not only as a separate word, but it mustn't match if, for example, boathouse is in the string, I suggest first replacing non-word characters and _ with a space, and then using the regex you had with \b:

import re
my_other_string = 'the_boathouse_has_sunk'
my_list = ['car', 'boat', 'truck']
my_other_string = re.sub(r'[\W_]', ' ', my_other_string)
my_list = re.compile(r'\b(?:%s)\b' % '|'.join(my_list))
if re.search(my_list, my_other_string):
    print('yay')

This will not print yay, but if you remove house, it will.
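
The same two steps can be wrapped in a small helper. This is only a sketch: the function name contains_word is made up for illustration, and the re.escape call is an addition that guards against list entries containing regex metacharacters.

```python
import re

def contains_word(words, text):
    """Return True if any entry of `words` occurs in `text` as a whole word,
    treating underscores and other non-alphanumerics as separators."""
    # Step 1: turn underscores and non-word characters into spaces.
    normalized = re.sub(r'[\W_]', ' ', text)
    # Step 2: search for any of the words between word boundaries.
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))
    return pattern.search(normalized) is not None

words = ['car', 'boat', 'truck']
print(contains_word(words, 'the_boat_has_sunk'))       # True
print(contains_word(words, 'the_boathouse_has_sunk'))  # False
```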

Regex match but re.match() doesn't return anything

First of all, you assign your regex pattern to a variable named str (which shadows the built-in str), but you use featureStr afterwards. Your resulting match object has no groups because you told the regex to ignore what it matched. You can name the groups with (?P<name>...) and access them later. Here is a working example:

import re

# Each (?P<name>...) group captures one section under that name.
featureStr = (
    r'##(?P<title>.*)\n+##(?P<title_2>.*)\n+###(?P<first>.*)###(?P<second>.*)##(?P<third>.*)##(.*)')
file_regexp = re.compile(featureStr, re.S)

with open("markdown.md") as f:
    fileContent = f.read()

m = file_regexp.match(fileContent)

print(m.groupdict())

Which prints:

{'title': ' title', 'title_2': ' title 2', 'first': ' first paragraph\n[lines]\n...\n\n', 'second': ' second\n[lines]\n...\n\n', 'third': ' third \n[lines]\n...\n\n'}
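
For a version that runs without a file, here is a small sketch on an inline string. It assumes the original pattern used non-capturing groups, (?:...), which is what the wording above suggests; the sample text and the group name subtitle are invented for illustration.

```python
import re

text = "## Title\n## Subtitle\n"

# Non-capturing groups take part in the match but record nothing.
ignored = re.match(r'##(?:.*)\n##(?:.*)', text)
print(ignored.groups())     # ()

# Named groups record what they matched under the given name.
named = re.match(r'##(?P<title>.*)\n##(?P<subtitle>.*)', text)
print(named.groupdict())    # {'title': ' Title', 'subtitle': ' Subtitle'}
```

In both cases the match itself succeeds; the only difference is whether anything is stored for later retrieval.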

I hope this helps you. Let me know if there are any questions left. Have a nice day!

Regex re.match() not returning match when it should

re.match only matches at the beginning of the string. Because the string does not start with a digit, the pattern cannot match there. You can instead use something like this:

Assuming s="[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR4 3600mhz $69.99"

In [1]: regex = re.compile(r'\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')

In [2]: regex.findall(s)
Out[2]: ['69.99']

Alternatively, account for whatever precedes the price and capture the price itself in a named group:

In [1]: regex = re.compile(r'.*?(?P<price>\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2}))')

In [2]: match = regex.match(s)

In [3]: match
Out[3]: <re.Match object; span=(0, 65), match='[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR>

In [4]: match.group('price')
Out[4]: '69.99'

My regex works on regex101 but doesn't work in python?

re.match only matches starting at the beginning of the string. In your case, you just need the matching element, correct? In that case you can use something like re.search or re.findall, which will find the match anywhere in the string:

>>> re.search(pattern, "  |test|").group(0)
'|test|'

>>> re.findall(pattern, " |test|")
['test']
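
The pattern itself is not shown above. As a sketch, assume a hypothetical pattern with one capturing group between the pipes, which would reproduce both results and also explains why findall returns only the inner text:

```python
import re

pattern = r'\|(\w+)\|'   # hypothetical; the original pattern is not shown

at_start = re.match(pattern, "  |test|")
anywhere = re.search(pattern, "  |test|")
found = re.findall(pattern, " |test|")

print(at_start)            # None: the string starts with spaces
print(anywhere.group(0))   # |test|  (the whole match)
print(anywhere.group(1))   # test    (the first capturing group)
print(found)               # ['test']
```

When a pattern has exactly one capturing group, re.findall returns the group contents rather than the whole match.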

Regex match returning 'none' while findall & search work

Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string.

https://docs.python.org/3/library/re.html#search-vs-match
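
A short sketch of the difference, using made-up sample strings. It also includes re.fullmatch for comparison and shows that re.MULTILINE does not change where re.match is allowed to start:

```python
import re

s = "abc123"

m_digits = re.match(r'\d+', s)        # None: the string starts with letters
s_digits = re.search(r'\d+', s)       # finds "123" further in
m_letters = re.match(r'[a-z]+', s)    # matches "abc" at the start
f_letters = re.fullmatch(r'[a-z]+', s)  # None: the whole string must match

print(m_digits, s_digits.group(), m_letters.group(), f_letters)

# Even with re.MULTILINE, match() only tries position 0 of the string,
# while search() with ^ can match at the start of any line.
ml_match = re.match(r'^\d+', "abc\n123", re.M)
ml_search = re.search(r'^\d+', "abc\n123", re.M)
print(ml_match, ml_search.group())
```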

Regular Expression in python doesn't work

The mistake is converting bytes to a string with str(), which gives you the repr of the bytes object rather than its decoded text:

>>> str(b'foo')
"b'foo'"

You would have needed to decode the line instead:

line = line.decode()

But the best way is to pass a bytes pattern to re.search, which is supported:

for line in fhand:
    if re.search(b'^From', line) is not None:
        sumFind += 1

Now I get 54 matches.

Note that you could simplify the whole loop to:

sum_find = sum(bool(re.match(b'From',line)) for line in fhand)
  • re.match already anchors at the start of the string, so the ^ needed with re.search can be dropped
  • no explicit loop is needed: sum counts how often re.match returns a truthy value (converted to bool so that each line contributes 0 or 1)

or even simpler without regex:

sum_find = sum(line.startswith(b"From") for line in fhand)
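
The points above can be tried without a real file. In this sketch io.BytesIO stands in for fhand and the sample lines are invented; it assumes the file was opened in binary mode, as in the question.

```python
import io
import re

# Stand-in for a file opened in binary mode (sample content is made up).
fhand = io.BytesIO(b"From alice@example.com\nSubject: hi\nFrom bob@example.com\n")

print(str(b'From'))        # b'From'  (the repr, not the text)
print(b'From'.decode())    # From

# Mixing a str pattern with bytes input is rejected by the re module.
mixed_error = None
try:
    re.search('^From', b'From alice')
except TypeError as exc:
    mixed_error = exc
print(type(mixed_error).__name__)   # TypeError

# Bytes pattern (or bytes prefix) on bytes lines works as expected.
sum_find = sum(line.startswith(b"From") for line in fhand)
print(sum_find)            # 2
```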

