Find out how many times a regex matches in a string in Python
The existing solutions based on findall are fine for non-overlapping matches (and no doubt optimal, except maybe for a HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring)) (to avoid ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn and ignoring the resulting string...:
import re

def countnonoverlappingrematches(pattern, thestring):
    # subn returns (new_string, number_of_substitutions); keep only the count
    return re.subn(pattern, '', thestring)[1]
The only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then re.subn(pattern, '', thestring, count=100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even larger numbers).
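A minimal sketch comparing the counting approaches above on a made-up input. The helper names count_findall, count_finditer and count_subn are hypothetical. All three count non-overlapping matches, so they agree; passing a cap to subn stops after that many substitutions.

```python
import re

def count_findall(pattern, s):
    # builds the full list of matches, then measures it
    return len(re.findall(pattern, s))

def count_finditer(pattern, s):
    # consumes match objects one at a time; no list is kept
    return sum(1 for _ in re.finditer(pattern, s))

def count_subn(pattern, s, cap=0):
    # subn returns (new_string, n); count=0 means "no limit"
    return re.subn(pattern, '', s, count=cap)[1]

text = 'ab' * 1000
print(count_findall('ab', text), count_finditer('ab', text), count_subn('ab', text))
print(count_subn('ab', text, cap=100))
```

Both print calls are illustrative: the first should show 1000 three times, the second 100, since the cap is reached long before the string runs out.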
Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition: e.g., with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?
Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):
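import re

def countoverlappingdistinct(pattern, thestring):
    total = 0
    start = 0
    there = re.compile(pattern)
    # stop once start passes the end; otherwise a pattern that can match the
    # empty string would keep matching at len(thestring) forever
    while start <= len(thestring):
        mo = there.search(thestring, start)  # compiled search accepts a start position
        if mo is None:
            break
        total += 1
        start = 1 + mo.start()  # next candidate: one past this match's start
    return total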
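A different technique that some people use for the same "distinct starting spots" definition is to wrap the pattern in a zero-width lookahead (?=...): findall then reports one empty match at every position where the pattern could start. This is a sketch with a hypothetical helper name, count_overlapping_lookahead; for 'a+' on 'aa' it also gives TWO.

```python
import re

def count_overlapping_lookahead(pattern, s):
    # (?=...) matches the empty string wherever `pattern` would match,
    # so the scan advances one character at a time and each start position is counted;
    # (?:...) keeps a top-level '|' in `pattern` inside the lookahead
    return len(re.findall('(?=(?:' + pattern + '))', s))

print(count_overlapping_lookahead('a+', 'aa'))      # 2
print(len(re.findall('a+', 'aa')))                  # 1 (non-overlapping)
print(count_overlapping_lookahead('aba', 'ababa'))  # 2
```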
Note that countoverlappingdistinct does have to compile the pattern into a pattern object: the function re.search does not accept a start argument (starting position for the search) the way the compiled pattern's search method does, so you'd have to be slicing thestring as you go. That is definitely more effort than just having the next search start at the next possible distinct starting point, which is what the function does.
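Passing a start position is also not quite the same as slicing. Per the re module documentation, '^' in a compiled pattern's search(string, pos) still only matches at the real start of the string, while slicing creates a new string whose start '^' can match. A small sketch with made-up inputs:

```python
import re

text = 'ba'
anchored = re.compile('^a')

# pos=1: the search starts at index 1, but '^' still means "start of the whole string"
print(anchored.search(text, 1))                 # None
# slicing builds a new string 'a', where '^' matches at index 0
print(re.search('^a', text[1:]) is not None)    # True
# without an anchor, pos just skips ahead; the index refers to the original string
print(re.compile('a').search(text, 1).start())  # 1
```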
Number of regex matches
If you know you will want all the matches, you could use the re.findall function. It returns a list of all the (non-overlapping) matches, so len(result) gives you the number of matches.
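A short sketch on a made-up string. With capturing groups findall returns the groups (as tuples when there are several), but the list still has one entry per match, so len is unaffected.

```python
import re

text = 'a1 b22 c333'
result = re.findall(r'\d+', text)
print(result, len(result))  # ['1', '22', '333'] 3

# two groups: one tuple per match, so the length is still the match count
pairs = re.findall(r'([a-z])(\d+)', text)
print(pairs, len(pairs))    # [('a', '1'), ('b', '22'), ('c', '333')] 3
```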
Counting occurrences of Regex Matches in Python
Below is a modified variant of my answer at https://stackoverflow.com/a/64220148/6632736. It is assumed that the log is in a file, which is read line by line.
#!/usr/bin/python
import re

def increment(ips: dict, line: str):
    # capture an IPv4-looking address that follows " - " in the line
    match = re.match(r'^.+?\s+-\s+(?P<ip>\d{1,3}(\.\d{1,3}){3})\s.*$', line)
    if match:
        ip = match.group('ip')
        if ip not in ips:
            ips[ip] = 0
        ips[ip] += 1

def parse_log_file(log: str) -> dict:
    ips = dict()
    with open(log, 'r') as file:
        for line in file:
            increment(ips, line)
    return ips

# log is the path to the log file (the name below is a hypothetical placeholder):
log = 'access.log'
for key, value in parse_log_file(log).items():
    print(key, ":", value)
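If you'd rather not manage the dictionary by hand, collections.Counter does the same tallying. This sketch swaps in Counter and runs over a hypothetical list of lines instead of a file, so it can be tried without a log. The sample lines are invented and only assume the "<something> - <ip> <rest>" shape the regex above expects.

```python
import re
from collections import Counter

# same expression as above, compiled once instead of on every line
IP_LINE = re.compile(r'^.+?\s+-\s+(?P<ip>\d{1,3}(\.\d{1,3}){3})\s.*$')

def count_ips(lines):
    counts = Counter()
    for line in lines:
        match = IP_LINE.match(line)
        if match:
            counts[match.group('ip')] += 1
    return counts

# hypothetical sample lines
sample = [
    '2020-10-05 12:00:01 - 10.0.0.1 GET /index.html',
    '2020-10-05 12:00:02 - 10.0.0.2 GET /about.html',
    '2020-10-05 12:00:03 - 10.0.0.1 POST /login',
    'malformed line without an address',
]
for ip, n in count_ips(sample).most_common():
    print(ip, ":", n)
```

most_common() lists the busiest addresses first, which is often what you want from a log summary; lines that don't match are simply skipped.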
How can I find all matches to a regular expression in Python?
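Use re.findall or re.finditer instead.
re.findall(pattern, string) returns a list of matching strings (if the pattern contains capturing groups, it returns the captured group contents instead, which is why the first example below yields 'cats' rather than 'all cats are').
re.finditer(pattern, string) returns an iterator over Match objects.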
Example:
import re
re.findall(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']
[x.group() for x in re.finditer(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']
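A sketch of what finditer gives you beyond findall: each Match object carries the whole match, the individual groups, and the position in the string.

```python
import re

text = 'all cats are smarter than dogs, all dogs are dumber than cats'
# group(0) is the whole match, group(1) the captured animal, span() the (start, end) indices
matches = [(m.group(0), m.group(1), m.span()) for m in re.finditer(r'all (.*?) are', text)]
print(matches)
# [('all cats are', 'cats', (0, 12)), ('all dogs are', 'dogs', (32, 44))]
```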
Count number of occurrences of each string using regex
You may pass a callable as the replacement argument to re.sub and collect the necessary counting details during a single replacement pass:
import re

counter = {}

def repl(m):
    # tally each matched token, then replace it with 'd'
    if m.group() in counter:
        counter[m.group()] += 1
    else:
        counter[m.group()] = 1
    return 'd'

text = "a;b o a;c a l l e d;a;c a b"
rx = re.compile(r'\b(a|b|c)\b')
result = rx.sub(repl, text)
print(counter, result, sep="\n")
Output:
{'a': 5, 'b': 2, 'c': 2}
d;d o d;d d l l e d;d;d d d
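A variant of the same single-pass idea using collections.Counter and a closure, so no module-level dictionary is needed. The helper name sub_and_count is hypothetical; the input and pattern are the ones from the answer above.

```python
import re
from collections import Counter

def sub_and_count(rx, text, replacement):
    counts = Counter()

    def repl(m):
        # tally the matched token, then hand back the replacement text
        counts[m.group()] += 1
        return replacement

    return rx.sub(repl, text), counts

rx = re.compile(r'\b(a|b|c)\b')
result, counts = sub_and_count(rx, "a;b o a;c a l l e d;a;c a b", 'd')
print(dict(counts))  # {'a': 5, 'b': 2, 'c': 2}
print(result)        # d;d o d;d d l l e d;d;d d d
```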
Check if string matches pattern
import re

pattern = re.compile(r"^([A-Z][0-9]+)+$")
string = "A1B2C3"  # example input
# match() returns a Match object on success and None otherwise
pattern.match(string)
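One subtlety worth knowing: '$' also matches just before a trailing newline, so the pattern above accepts "A1B2\n". Pattern.fullmatch (or ending the pattern with \Z) is stricter. A sketch with made-up inputs; the check helper is hypothetical.

```python
import re

pattern = re.compile(r"^([A-Z][0-9]+)+$")
strict = re.compile(r"([A-Z][0-9]+)+")

def check(s):
    # bool(...) turns "Match object or None" into True/False
    return bool(pattern.match(s)), bool(strict.fullmatch(s))

print(check("A1B23"))   # (True, True)
print(check("A1b2"))    # (False, False)
print(check("A1B2\n"))  # (True, False): '$' matched before the trailing newline
```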
How to find out the number of times group was matched in Python regular expressions?
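One way you could do this is to enclose each repeating capture group inside another group; then you can divide the length of the outer match by the length of the inner match to determine how many times each inner group matched. This only works when every repetition has the same length, as with the fixed strings 10 and 20 here, because the inner group only remembers its last repetition. For example: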
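import re

m = re.search(r'0((10)+)((20)+)', '0001010202020000')
num_grps = len(m.groups())
# groups come in (outer, inner) pairs: 1 & 2, then 3 & 4
for i in range(1, num_grps + 1, 2):
    outer = m.end(i) - m.start(i)          # length of all repetitions together
    inner = m.end(i + 1) - m.start(i + 1)  # length of the last single repetition
    print((m.group(i + 1), outer // inner))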
Output:
('10', 2)
('20', 3)
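When repetitions can differ in length, the division gives the wrong answer. A sketch of one workaround, on a made-up pattern and input: re-scan the outer group's text with the inner pattern and count the pieces. This assumes that re-scanning the inner pattern on its own splits the text the same way the original match did, which holds here but is not guaranteed for every pattern.

```python
import re

m = re.search(r'x((ab|c)+)y', 'xabcaby')
outer = m.group(1)  # 'abcab': all repetitions together
last = m.group(2)   # 'ab': only the last repetition is kept

# division breaks because the repetitions have lengths 2, 1, 2
print(len(outer) // len(last))  # 2 (wrong)

# re-scanning the outer text with the inner pattern counts each repetition
reps = re.findall(r'ab|c', outer)
print(reps, len(reps))          # ['ab', 'c', 'ab'] 3
```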
Python - Using regex to find multiple matches and print them out
Do not use regular expressions to parse HTML. But if you ever need to find all regexp matches in a string, use the findall function.
import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)
# Output: ['Form 1', 'Form 2']
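In the example above the string has no newlines, so re.DOTALL makes no difference there; it matters once a tag's content spans lines, because '.' does not match '\n' by default. A sketch with a made-up input:

```python
import re

html = '<form>Form\n1</form> text <form>Form 2</form>'

# without DOTALL, '.' stops at '\n', so the first form is skipped
print(re.findall('<form>(.*?)</form>', html))             # ['Form 2']
# with DOTALL, '.' also matches newlines
print(re.findall('<form>(.*?)</form>', html, re.DOTALL))  # ['Form\n1', 'Form 2']
```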