How can I find all matches to a regular expression in Python?
Use re.findall or re.finditer instead of re.search, which returns only the first match.
re.findall(pattern, string) returns a list of matching strings (or of the captured groups, if the pattern contains any).
re.finditer(pattern, string) returns an iterator over match objects.
Example:
import re
re.findall(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']
[x.group() for x in re.finditer(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']
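The difference between the two outputs above comes from the capturing group. As a small sketch (the two-group pattern is an extra illustration, not part of the original answer), the shape of what re.findall returns depends on how many groups the pattern has:

```python
import re

text = 'all cats are smarter than dogs, all dogs are dumber than cats'

# No capturing group: findall returns the full text of each match.
no_group = re.findall(r'all .*? are', text)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r'all (.*?) are', text)

# Two capturing groups (illustrative pattern): one tuple per match.
two_groups = re.findall(r'all (.*?) are (\w+)', text)

print(no_group)    # ['all cats are', 'all dogs are']
print(one_group)   # ['cats', 'dogs']
print(two_groups)  # [('cats', 'smarter'), ('dogs', 'dumber')]
```

If you want the whole match and the groups at the same time, re.finditer is usually the more convenient choice, since each match object gives you both.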
Python regex find all matches
You are slightly overcomplicating your regex: you use the . (which matches any character) without actually needing it, and you use a capturing group () without really using it.
With your pattern you are looking for a number in scientific notation which has to be BOTH preceded and followed by exactly one character.
{8.25e+07|8.26206e+07}
[--------]
When re.findall traverses your string from the beginning, it finds your pattern at {8.25e+07|, drops the { and the | because of your capturing group (...), and saves the rest as a match. It then continues but only has 8.26206e+07} left. That no longer satisfies your pattern, because there is no character left over for your first . to match, so no further match is found. Note that findall only looks for non-overlapping matches [1].
To illustrate, change your input string by duplicating your separator |:
>>> import re
>>> p = r".([0-9]+\.[0-9]+[eE][-+]?[0-9]+)."
>>> s = "{8.25e+07||8.26206e+07}"
>>> print(re.findall(p, s))
['8.25e+07', '8.26206e+07']
To satisfy your two .s you need two separators between any two numbers.
Two things I would change in your pattern: (1) remove the .s and (2) remove your capturing group ( ), since you have no need for it:
p = r"[0-9]+\.[0-9]+[eE][-+]?[0-9]+"
Capturing groups can be very useful if you need to refer to specific captured groups again later, but your task at hand has no need for them.
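To see both behaviours side by side, here is a short sketch that runs the original pattern and the simplified one against the original single-separator string (the variable names are mine):

```python
import re

s = "{8.25e+07|8.26206e+07}"

# Original pattern: one "any" character required on each side of the number.
original = r".([0-9]+\.[0-9]+[eE][-+]?[0-9]+)."
# Simplified pattern: no surrounding dots, no capturing group.
simplified = r"[0-9]+\.[0-9]+[eE][-+]?[0-9]+"

original_matches = re.findall(original, s)
simplified_matches = re.findall(simplified, s)

print(original_matches)    # ['8.25e+07']
print(simplified_matches)  # ['8.25e+07', '8.26206e+07']
```

The first call stops after one match because the | was already consumed by the first match, so nothing is left to play the role of the leading character for the second number.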
[1] https://docs.python.org/2/library/re.html?highlight=findall#re.findall
finding all matches in a string using regex
You can try (?=(aa)).
The trick is to use a positive lookahead, which doesn't consume any of the string. This way the engine starts the next match attempt at the next position in the string, not after the last matched text.
You will get 3 matches, and each will have aa in the first capturing group.
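A minimal sketch of the lookahead trick, assuming the input string is 'aaaa' (the original question's string is not shown here, so that input is my assumption; it is the shortest one that gives 3 overlapping matches):

```python
import re

s = 'aaaa'  # assumed input, not taken from the original question

# A plain pattern consumes what it matches, so matches cannot overlap.
consuming = re.findall(r'aa', s)

# A lookahead matches an empty string at each position where 'aa' follows;
# the capturing group inside it still records the text.
overlapping = re.findall(r'(?=(aa))', s)

# finditer additionally tells you where each overlapping match starts.
starts = [m.start() for m in re.finditer(r'(?=(aa))', s)]

print(consuming)    # ['aa', 'aa']
print(overlapping)  # ['aa', 'aa', 'aa']
print(starts)       # [0, 1, 2]
```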
Regular expression to return all match occurrences
The issue is with the regular expression used.
The (.*) blocks are accepting more of the string than you realize: .* is a greedy operation, and it will consume as much of the string as it can while still matching. This is why you only see one output.
I suggest matching something like Vacation Allowance:\s*\d+; or similar.
import re
text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'
m = re.findall(r'Vacation Allowance:\s*(\d*);', text, re.M)
print(m)
# Output: ['21', '22']
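To make the greedy behaviour concrete, here is a rough sketch. The greedy pattern below is a stand-in I made up, since the question's actual regex is not quoted in this answer; only the sample text comes from the code above.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# Hypothetical greedy pattern: .* runs to the LAST ';' in the string,
# so both records are swallowed into a single match.
greedy = re.findall(r'Vacation Allowance:(.*);', text)

# Lazy version: .*? stops at the FIRST ';' it can, giving one match per record.
lazy = re.findall(r'Vacation Allowance:(.*?);', text)

print(greedy)  # [' 21; 02/05/2020 Vacation Allowance: 22']
print(lazy)    # [' 21', ' 22']
```

Note that the lazy version still picks up the leading space, which is one reason the more specific \s*(\d*) form above is the better fit.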
Python regex get all matches all with findall
Here is one approach:
>>> regex = re.compile(r"(?<=\[)([0-9]){1}?(?=\])")
>>> string = 'start asf[2]+asdfsa[0]+fsad[1]'
>>> re.findall(regex, string)
['2', '0', '1']
DEMO
>>> import re
>>> def get_all_integers_between_square_brackets(*, regex, string):
...     return map(int, re.findall(regex, string))
...
>>> regex = re.compile(r"(?<=\[)([0-9]){1}?(?=\])")
>>> integers = get_all_integers_between_square_brackets(
...     regex=regex,
...     string='start asf[2]+asdfsa[0]+fsad[1]'
... )
>>> list(integers)
[2, 0, 1]
>>> integers = get_all_integers_between_square_brackets(
...     regex=regex,
...     string='start asf[hello]+asdfsa[world]+fsad[1][2][]')
>>> list(integers)
[1, 2]
python) find all matches using regex (changed to re.findall from re.search)
Change this:
matchexh = re.search(r'Exhibit (\d+).(\d+)',text1).group().strip()
to:
matchexh = re.findall(r'Exhibit (\d+).(\d+)',text1)
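One thing to be aware of after this change: because the pattern has two capturing groups, re.findall returns a list of tuples rather than the matched strings. A small sketch, using a made-up text1 (the real input is not shown in the question as quoted here):

```python
import re

# Hypothetical sample input, for illustration only.
text1 = 'See Exhibit 10.1 and Exhibit 99.2 for details.'

# re.search stops at the first match.
first = re.search(r'Exhibit (\d+).(\d+)', text1).group().strip()

# re.findall returns one tuple of groups per match.
pairs = re.findall(r'Exhibit (\d+).(\d+)', text1)

# If you want the full matched text instead, finditer with .group() works.
# The dot is escaped here so that it matches only a literal '.'.
full = [m.group() for m in re.finditer(r'Exhibit (\d+)\.(\d+)', text1)]

print(first)  # Exhibit 10.1
print(pairs)  # [('10', '1'), ('99', '2')]
print(full)   # ['Exhibit 10.1', 'Exhibit 99.2']
```

Also note that a list has no .group() or .strip(), which is why those calls are dropped in the findall version.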
Python - Using regex to find multiple matches and print them out
Do not use regular expressions to parse HTML.
But if you ever need to find all regexp matches in a string, use the findall function.
import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)
# Output: ['Form 1', 'Form 2']
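The re.DOTALL flag only matters when the text between the tags can contain newlines. A quick sketch with a made-up multi-line input shows the difference:

```python
import re

# Hypothetical input where the form body spans two lines.
multi_line = 'text<form>Form\n1</form>more text'

# Without DOTALL, '.' does not match '\n', so nothing is found.
without_flag = re.findall('<form>(.*?)</form>', multi_line)

# With DOTALL, '.' matches newlines too.
with_flag = re.findall('<form>(.*?)</form>', multi_line, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # ['Form\n1']
```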
How to find all matches with a regex where part of the match overlaps
The (\w+\.\s){1,2} pattern contains a repeated capturing group, and Python re does not store all the captures it finds; it only saves the last one into the group memory buffer. At any rate, you do not need the repeated capturing group, because you need to extract multiple occurrences of the pattern from a string, and re.finditer or re.findall will do that for you.
Also, the re.MULTILINE flag is not necessary here since there are no ^ or $ anchors in the pattern.
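To see the "only the last capture is kept" behaviour, here is a brief sketch that runs the repeated-group pattern against the same kind of input (the sample string is the one used in this answer's own example):

```python
import re

test_str = 'ali. veli. ahmet.'

m = re.search(r'(\w+\.\s){1,2}', test_str)

# The whole match covers both repetitions...
whole = m.group(0)
# ...but group 1 holds only the last repetition.
last_capture = m.group(1)

# findall reports group 1, so the first repetition is lost entirely.
found = re.findall(r'(\w+\.\s){1,2}', test_str)

print(repr(whole))         # 'ali. veli. '
print(repr(last_capture))  # 'veli. '
print(found)               # ['veli. ']
```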
You may get the expected results using
import re
test_str = 'ali. veli. ahmet.'
src = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)
print(src)
# => ['ali. veli', 'veli. ahmet']
The pattern means:
(?= - start of a positive lookahead
\b - a word boundary (crucial here, it is necessary to only start capturing at word boundaries)
(\w+\.\s+\w+) - Capturing group 1: 1+ word chars, a ., 1+ whitespaces and 1+ word chars
) - end of the lookahead.
Python Regex - How to Get Positions and Values of Matches
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    print(m.start(), m.group())
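If you also need the end position, match objects provide m.end() and m.span() alongside m.start(). A short sketch that collects everything into a list instead of printing:

```python
import re

p = re.compile("[a-z]")

# (start, end, matched text) for every match; end is exclusive.
positions = [(m.start(), m.end(), m.group()) for m in p.finditer('a1b2c3d4')]

print(positions)
# [(0, 1, 'a'), (2, 3, 'b'), (4, 5, 'c'), (6, 7, 'd')]
```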