Re.Findall Behaves Weird

re.findall behaves weird

import re
s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))

You don't need to escape backslashes twice when you use a raw string (r'...').

Output: ['123', '3.1415926']
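To make the raw-string remark concrete: in a raw string the backslash is kept literally, so r'\.' already contains the two characters the regex engine needs, while a normal string needs '\\.' for the same result. Doubling the backslash inside a raw string changes the pattern. The '3.14' test input below is only illustrative.

```python
import re

# In a raw string the backslash is kept as-is, so both literals hold the same characters.
raw = r'-?[0-9]+(?:\.[0-9]*)?'
doubled = '-?[0-9]+(?:\\.[0-9]*)?'
print(raw == doubled)                 # True
print(re.findall(raw, '3.14'))        # ['3.14']

# Escaping twice inside a raw string means "literal backslash, then any character",
# so the decimal point is no longer matched and the number is split in two.
over_escaped = r'-?[0-9]+(?:\\.[0-9]*)?'
print(re.findall(over_escaped, '3.14'))  # ['3', '14']
```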

Also note that findall returns a list of strings. If you want integers and floats instead, map ast.literal_eval over the result:

import re, ast
s = r'abc123d, hello 3.1415926, this is my book'
# map() returns an iterator in Python 3, so wrap it in list() before printing
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))

Output: [123, 3.1415926]
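For contrast, here is a sketch of what happens if the optional fractional part is written with a capturing group, (\.[0-9]*)?, instead of the non-capturing (?:\.[0-9]*)?. This capturing variant is an assumption about the kind of pattern that produces the "weird" output, not necessarily the exact pattern from the question: findall then returns only what the group captured.

```python
import re

s = 'abc123d, hello 3.1415926, this is my book'

# Capturing group: findall returns the group's text, '' when the group did not take part.
capturing = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
print(re.findall(capturing, s))      # ['', '.1415926']

# Non-capturing group: findall returns the whole match.
non_capturing = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
print(re.findall(non_capturing, s))  # ['123', '3.1415926']
```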

Not able to understand behavior of pattern.findall() in python

This is because your pattern contains a capturing group, (wo)?, so findall returns what this group matches:

  • '' for batman
  • 'wo' for batwoman

You can use a non-capturing group instead: pattern = re.compile(r'bat(?:wo)?man')


re.findall(): return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
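The rules quoted above can be checked one by one. This is a small sketch; the sample text 'batman and batwoman' is hypothetical and chosen only to exercise each case.

```python
import re

text = 'batman and batwoman'  # hypothetical sample text

# One capturing group: findall returns what the group matched ('' if it did not participate).
print(re.findall(r'bat(wo)?man', text))      # ['', 'wo']

# Non-capturing group: there are no groups, so findall returns the whole matches.
print(re.findall(r'bat(?:wo)?man', text))    # ['batman', 'batwoman']

# Two capturing groups: findall returns one tuple per match.
print(re.findall(r'(bat)(wo)?man', text))    # [('bat', ''), ('bat', 'wo')]

# Empty matches are included in the result.
print(re.findall(r'x*', 'ab'))               # ['', '', '']
```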

Regex behaving weird when finding floating point strings

As you guessed correctly, this has to do with capturing groups. According to the documentation for re.findall:

If one or more groups are present in the pattern, return a list of groups

Therefore, you need to make all your groups non-capturing by writing (?:...) instead of (...). If there are no capturing groups, findall returns the entire match:

>>> from re import findall
>>> pattern = r'(?:\d*\.)?\d+'
>>> findall(pattern, s)
['7.95', '10']
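The question's input string isn't shown here, so the sketch below uses a hypothetical s that contains the same two numbers. It compares the capturing and non-capturing patterns, and also shows re.finditer as an alternative when you want to keep a capturing group but still get the whole match.

```python
import re

s = 'Total: 7.95 for 10 items'  # hypothetical input; the original string is not shown

# Capturing group: findall reports only the group, '7.' and '' (group unused for '10').
print(re.findall(r'(\d*\.)?\d+', s))    # ['7.', '']

# Non-capturing group: the whole matches come back.
print(re.findall(r'(?:\d*\.)?\d+', s))  # ['7.95', '10']

# Keeping the capturing group: read group(0), the whole match, from each match object.
print([m.group(0) for m in re.finditer(r'(\d*\.)?\d+', s)])  # ['7.95', '10']
```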

Strange regex issue using findall() and search()

From the findall docs

If one or more groups are present in the pattern, return a list of
groups; this will be a list of tuples if the pattern has more than one
group.

In your regex you have a capturing group, (/\d{1,2})?, so findall returns only what that group matched.

You could make it a non-capturing group instead: (?:/\d{1,2})?

Your regex would look like:

\w{2}\d/\d{1,2}(?:/\d{1,2})?

import re
port = "Gi1/0/1 Fa0/1"
search = re.findall(r'\w{2}\d/\d{1,2}(?:/\d{1,2})?', port)
print(search)
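The difference between the two versions of the regex, and why search() seemed to work, can be checked directly with the port string from the answer. This is a minimal sketch; the printed values are what Python's re module returns for this input.

```python
import re

port = "Gi1/0/1 Fa0/1"

# Capturing group: findall returns only the optional third part ('' when it is absent).
print(re.findall(r'\w{2}\d/\d{1,2}(/\d{1,2})?', port))    # ['/1', '']

# Non-capturing group: findall returns the full interface names.
print(re.findall(r'\w{2}\d/\d{1,2}(?:/\d{1,2})?', port))  # ['Gi1/0/1', 'Fa0/1']

# search() returns a match object; group() is the whole first match either way.
print(re.search(r'\w{2}\d/\d{1,2}(/\d{1,2})?', port).group())  # Gi1/0/1
```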


python - regex why does `findall` find nothing, but `search` works?

When your regex contains capture groups (parts wrapped in parentheses), findall returns what the captured groups matched rather than the whole match, and in your case the captured group matches an empty string. If you want findall to return the whole match, make the group non-capturing with ?:. re.search, on the other hand, returns a match object, and its group() method gives the whole match regardless of any capture groups. Both behaviors are described in the documentation:

re.findall:

Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group.

re.search:

Scan through string looking for the first location where the regular
expression pattern produces a match, and return a corresponding
MatchObject instance. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length
match at some point in the string.

import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
print(reg.search(s).group())
# 42

print(reg.findall(s))
# ['42']
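For comparison, here is the same regex with a capturing group, (,\d{3})*. This capturing version is an assumption about what the question's original pattern looked like. For '42' the group never participates, so findall yields an empty string, while search().group() still returns the whole match. When the group repeats, it keeps only its last repetition.

```python
import re

s = '42'

# Capturing group that never participates for '42': findall yields [''].
capturing = re.compile(r'^\d{1,3}(,\d{3})*$')
print(capturing.findall(s))                # ['']
print(capturing.search(s).group())         # 42

# A repeated group holds only its last repetition.
print(capturing.findall('1,234,567'))      # [',567']

# The non-capturing version returns the whole match.
non_capturing = re.compile(r'^\d{1,3}(?:,\d{3})*$')
print(non_capturing.findall('1,234,567'))  # ['1,234,567']
```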

Unexpected re.findall output

re.findall: If one or more groups are present in the pattern, return a list of groups.

If you want to keep the structure of the regex, replace Agent (\w)\w* with (Agent \w)\w*. Otherwise, just use Agent \w.

I tested both versions in Python:

import re

# Case 1: the group wraps "Agent" plus the first letter of the name
print("Case 1")
string = "Agent Alice gave the secret documents to Agent Bob."
regex = r'(Agent \w)\w*'

match = re.findall(regex, string)
print(match)

# Case 2: no groups, so findall returns the whole match
print("Case 2")
string = "Agent Alice gave the secret documents to Agent Bob."
regex = r'Agent \w'

match = re.findall(regex, string)
print(match)

Result

Case 1
['Agent A', 'Agent B']
Case 2
['Agent A', 'Agent B']
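For reference, a sketch of the unexpected output itself, assuming the original pattern was Agent (\w)\w* as the answer implies. If you need both the full name and the captured letter, re.finditer gives you match objects with group(0) and group(1).

```python
import re

string = "Agent Alice gave the secret documents to Agent Bob."

# The group wraps only the first letter, so findall returns just that letter.
print(re.findall(r'Agent (\w)\w*', string))   # ['A', 'B']

# finditer keeps both: the whole match via group(0) and the captured letter via group(1).
for m in re.finditer(r'Agent (\w)\w*', string):
    print(m.group(0), m.group(1))
# Agent Alice A
# Agent Bob B
```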

Understanding findall() regex result

The result from findall has one entry per capturing group (pair of parentheses) in your regular expression. The longer string is what the first (outer) parentheses matched, and the second is whatever the inner parentheses matched in the last iteration of the repetition.

If you don't want that, use non-capturing parentheses (?:F|B) - or in the case where you just match one out of a set of single characters, a character class [FB].

You can exploit this to check your conditions and partition the string in one go:

matches = re.findall(r'^([BF]{7})([LR]{3})$', your_string)
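A short sketch of how that one-liner behaves. The helper name split_code and the sample strings are hypothetical, chosen only for illustration. Because the pattern has two groups, findall returns a list holding one 2-tuple when the whole string matches, and an empty list when it does not.

```python
import re

def split_code(code):
    """Hypothetical helper: return (first7, last3) if code is 7 x F/B then 3 x L/R, else None."""
    matches = re.findall(r'^([BF]{7})([LR]{3})$', code)
    # Two groups -> a list of 2-tuples; an empty list means the string did not match.
    return matches[0] if matches else None

print(split_code('FBFBBFFRLR'))  # ('FBFBBFF', 'RLR')
print(split_code('FBFBBFFRLX'))  # None
```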

