re.findall behaves weird
import re
s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))
You don't need to escape twice when you are using a raw string.
Output: ['123', '3.1415926']
Also, the return value is a list of strings. If you want integers and floats instead, use map:
import re, ast
s = r'abc123d, hello 3.1415926, this is my book'
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))
Output: [123, 3.1415926]
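As an alternative to ast.literal_eval, a small helper can choose int or float depending on whether the matched token contains a dot. This is a minimal sketch; the helper name to_number and the constant NUMBER are hypothetical, not part of the original answer:

```python
import re

# Same pattern as above: optional sign, digits with an optional fraction,
# or a fraction that starts with a dot (e.g. '.5').
NUMBER = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'

def to_number(token):
    # Hypothetical helper: tokens containing '.' become floats, the rest ints.
    return float(token) if '.' in token else int(token)

s = 'abc123d, hello 3.1415926, this is my book'
numbers = [to_number(t) for t in re.findall(NUMBER, s)]
print(numbers)  # [123, 3.1415926]
```

Unlike literal_eval, this never evaluates arbitrary literals, so it only ever yields int or float.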
Not able to understand behavior of pattern.findall() in python
This is because you're using a capturing group, (wo)?, so findall returns what this group matched:
'' for batman
'wo' for batwoman
You can use a non-capturing group instead: pattern = re.compile(r'bat(?:wo)?man')
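A quick sketch comparing both patterns; the sample text 'batman and batwoman' is an assumption chosen to contain both words:

```python
import re

text = 'batman and batwoman'

# Capturing group: findall returns only what (wo)? captured ('' when it did not participate).
capturing = re.findall(r'bat(wo)?man', text)
print(capturing)      # ['', 'wo']

# Non-capturing group: no groups left, so findall returns the whole matches.
non_capturing = re.findall(r'bat(?:wo)?man', text)
print(non_capturing)  # ['batman', 'batwoman']
```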
re.findall(): Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
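The last two sentences of that quote are easy to miss. Here is a small sketch of both, using made-up sample strings: with two groups each match becomes a tuple, and a pattern that can match the empty string produces empty matches (the 'baa' result assumes Python 3.7 or later):

```python
import re

# Two groups: each match becomes a tuple of the two captured strings.
pairs = re.findall(r'(\w+)=(\w+)', 'key1=val1;key2=val2')
print(pairs)  # [('key1', 'val1'), ('key2', 'val2')]

# a* can match zero characters, so empty matches show up in the result.
runs = re.findall(r'a*', 'baa')
print(runs)   # ['', 'aa', '']
```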
Regex behaving weird when finding floating point strings
As you guessed correctly, this has to do with capturing groups. According to the documentation for re.findall:
If one or more groups are present in the pattern, return a list of groups
Therefore, you need to make all your groups (...) non-capturing using the (?:...) syntax. If there are no capturing groups, findall returns the entire match:
>>> import re
>>> pattern = r'(?:\d*\.)?\d+'
>>> re.findall(pattern, s)  # s is the string from the question
['7.95', '10']
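Since the question's string is not shown here, the sketch below uses a hypothetical string, 'price 7.95, qty 10', to show what the capturing version of the same pattern returns:

```python
import re

text = 'price 7.95, qty 10'  # hypothetical input

# Capturing group: findall returns the captured integer part ('7.') or '' when absent.
groups_only = re.findall(r'(\d*\.)?\d+', text)
print(groups_only)  # ['7.', '']

# Non-capturing group: findall returns the full numbers.
whole = re.findall(r'(?:\d*\.)?\d+', text)
print(whole)        # ['7.95', '10']
```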
Strange regex issue using findall() and search()
From the findall docs:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
In your regex you have a capturing group, (/\d{1,2})?
You could make it a non-capturing group instead: (?:/\d{1,2})?
Your regex would look like:
\w{2}\d/\d{1,2}(?:/\d{1,2})?
import re
port = "Gi1/0/1 Fa0/1"
search = re.findall(r'\w{2}\d/\d{1,2}(?:/\d{1,2})?', port)
print(search)  # ['Gi1/0/1', 'Fa0/1']
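For comparison, this sketch runs the original capturing pattern on the same string, and also shows a finditer alternative (not from the original answer) that returns whole matches even if you keep the capturing group:

```python
import re

port = 'Gi1/0/1 Fa0/1'

# Capturing group: only the optional '/n' suffix is returned ('' when absent).
with_group = re.findall(r'\w{2}\d/\d{1,2}(/\d{1,2})?', port)
print(with_group)     # ['/1', '']

# Non-capturing group: the full interface names are returned.
without_group = re.findall(r'\w{2}\d/\d{1,2}(?:/\d{1,2})?', port)
print(without_group)  # ['Gi1/0/1', 'Fa0/1']

# finditer yields match objects; group(0) is always the whole match.
whole = [m.group(0) for m in re.finditer(r'\w{2}\d/\d{1,2}(/\d{1,2})?', port)]
print(whole)          # ['Gi1/0/1', 'Fa0/1']
```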
python - regex why does `findall` find nothing, but `search` works?
When your regex has capture groups (wrapped in parentheses), findall returns what the captured groups matched, and in your case the captured group matches an empty string. If you want the whole match back, make the group non-capturing with ?:. re.search, on the other hand, returns a match object whose group() is the whole match regardless of capture groups. This is reflected in the documentation:
re.findall:
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
re.search:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
import re
reg = re.compile(r'^\d{1,3}(?:,\d{3})*$')
s = '42'
reg.search(s).group()
# '42'
reg.findall(s)
# ['42']
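To see the original problem directly, here is a sketch of the same pattern with a capturing group instead. search().group() still returns the whole match, while findall reports the group, which is empty for '42' and holds only the last repetition for a longer number:

```python
import re

capturing = re.compile(r'^\d{1,3}(,\d{3})*$')

# group() with no argument is group(0): the whole match.
whole = capturing.search('42').group()
print(whole)   # 42

# findall returns the group; (,\d{3}) never participated, so it is ''.
groups = capturing.findall('42')
print(groups)  # ['']

# A repeated group keeps only its last repetition.
last = capturing.findall('1,234,567')
print(last)    # [',567']
```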
Unexpected re.findall output
re.findall: If one or more groups are present in the pattern, return a list of groups.
You should replace Agent (\w)\w* with (Agent \w)\w* if you want to keep the structure of the regex. Otherwise, just use Agent \w.
I also tested both versions in Python:
import re
#case1
print("Case 1")
string = "Agent Alice gave the secret documents to Agent Bob."
regex = r'(Agent \w)\w*'  # raw string so \w is not treated as a string escape
match = re.findall(regex, string)
print(match)
#case2
print("Case 2")
string = "Agent Alice gave the secret documents to Agent Bob."
regex = r'Agent \w'
match = re.findall(regex, string)
print(match)
Result
Case 1
['Agent A', 'Agent B']
Case 2
['Agent A', 'Agent B']
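For reference, the original pattern returns only the captured initials. If you want full names while keeping the group, finditer is one option (an alternative not mentioned in the answer above):

```python
import re

string = 'Agent Alice gave the secret documents to Agent Bob.'

# Original pattern: findall returns just what (\w) captured.
initials = re.findall(r'Agent (\w)\w*', string)
print(initials)    # ['A', 'B']

# Same pattern, but read the whole match from each match object.
full_names = [m.group(0) for m in re.finditer(r'Agent (\w)\w*', string)]
print(full_names)  # ['Agent Alice', 'Agent Bob']
```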
Understanding findall() regex result
The result from findall corresponds to the parentheses in your regular expression. The longer result string corresponds to the first (outer) parentheses, and the second to whatever matched the inner parentheses in the last iteration.
If you don't want that, use non-capturing parentheses (?:F|B), or, in the case where you just match one out of a set of single characters, a character class [FB].
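The question's exact regex is not reproduced here, so this sketch uses a hypothetical nested pattern and the string 'FBFBBFF' to show the outer group capturing the whole run, the inner group keeping only its last iteration, and the two fixes:

```python
import re

code = 'FBFBBFF'  # hypothetical input

# Outer group: whole run. Inner group: only its last iteration ('F').
nested = re.findall(r'((F|B)+)', code)
print(nested)  # [('FBFBBFF', 'F')]

# Non-capturing inner group: one group left, so plain strings come back.
flat = re.findall(r'((?:F|B)+)', code)
print(flat)    # ['FBFBBFF']

# Character class instead of an alternation.
klass = re.findall(r'([FB]+)', code)
print(klass)   # ['FBFBBFF']
```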
You can exploit this to check your conditions and partition the string in one go:
matches = re.findall(r'^([BF]{7})([LR]{3})$', your_string)
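A usage sketch of that one-liner, wrapped in a hypothetical helper named partition and run on made-up inputs. With two groups, a valid string gives one (first part, second part) tuple and an invalid one gives an empty list:

```python
import re

def partition(code):
    # Hypothetical helper: two groups, so findall returns [(row_part, col_part)] or [].
    return re.findall(r'^([BF]{7})([LR]{3})$', code)

valid = partition('FBFBBFFRLR')
print(valid)    # [('FBFBBFF', 'RLR')]

invalid = partition('FBFBBFFRLX')
print(invalid)  # []
```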