Find Out How Many Times a Regex Matches in a String in Python

The existing solutions based on findall are fine for non-overlapping matches (and probably optimal unless the number of matches is huge). An alternative such as sum(1 for m in re.finditer(thepattern, thestring)) avoids ever materializing the list when all you care about is the count. A somewhat idiosyncratic option is to use subn and ignore the resulting string:

import re

def countnonoverlappingrematches(pattern, thestring):
    # subn returns (new_string, number_of_substitutions); keep only the count
    return re.subn(pattern, '', thestring)[1]

The only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then re.subn(pattern, '', thestring, count=100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even more).
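The three counting approaches mentioned above can be compared side by side. This is a minimal sketch; the helper names count_matches and count_matches_capped and the sample text are made up for illustration.

```python
import re

def count_matches(pattern, text):
    """Count non-overlapping matches without building a list of them."""
    return sum(1 for _ in re.finditer(pattern, text))

def count_matches_capped(pattern, text, limit):
    """Count non-overlapping matches, stopping once limit is reached."""
    # subn returns (new_string, number_of_substitutions)
    return re.subn(pattern, '', text, count=limit)[1]

text = 'ab' * 1000  # sample input containing 1000 matches of 'ab'

print(len(re.findall('ab', text)))            # 1000
print(count_matches('ab', text))              # 1000
print(count_matches_capped('ab', text, 100))  # 100
```

The capped version still builds the substituted string, so it only pays off when stopping early saves more work than that costs.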

Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition: e.g., with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?

Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):

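import re

def countoverlappingdistinct(pattern, thestring):
    total = 0
    start = 0
    there = re.compile(pattern)
    # The bound on start guards against looping forever when the pattern
    # can match the empty string at the end of thestring.
    while start <= len(thestring):
        mo = there.search(thestring, start)
        if mo is None:
            break
        total += 1
        # resume one character past where this match started
        start = 1 + mo.start()
    return total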

Note that you do have to compile the pattern into a RE object in this case: the function re.search does not accept a start argument (the starting position for the search) the way the search method of a compiled pattern does, so you would have to slice thestring as you go. That is definitely more effort than just having the next search start at the next possible distinct starting point, which is what this function does.
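Another way to count matches that start at distinct positions is to wrap the pattern in a lookahead, so that every match is zero-width and finditer tries each position in turn. This is only a sketch: the function name count_overlapping_lookahead is made up here, and it assumes the pattern does not begin with global inline flags such as (?i), which must stay at the very start of an expression.

```python
import re

def count_overlapping_lookahead(pattern, text):
    """Count the positions in text at which pattern can start matching."""
    # (?=...) matches without consuming characters, so the scan advances
    # one position at a time; (?:...) keeps alternations inside the lookahead.
    lookahead = re.compile(f'(?=(?:{pattern}))')
    return sum(1 for _ in lookahead.finditer(text))

print(count_overlapping_lookahead('a+', 'aa'))      # 2
print(count_overlapping_lookahead('aba', 'ababa'))  # 2
```

For 'a+' against 'aa' this agrees with the "distinct starting points" definition used above: one match starting at each of the two characters.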

Number of regex matches

If you know you will want all the matches, you could use the re.findall function. It will return a list of all the matches. Then you can just do len(result) for the number of matches.
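A short sketch of this, using a made-up sample sentence. When the pattern contains capturing groups, findall returns the groups (or tuples of groups) rather than the whole match, but the length of the list is still the number of matches.

```python
import re

text = 'one fish, two fish, red fish, blue fish'  # sample input

whole = re.findall(r'\w+ fish', text)     # no groups: whole matches
single = re.findall(r'(\w+) fish', text)  # one group: the group's text
pairs = re.findall(r'(\w+) (fish)', text) # two groups: tuples of groups

print(whole)       # ['one fish', 'two fish', 'red fish', 'blue fish']
print(single)      # ['one', 'two', 'red', 'blue']
print(len(pairs))  # 4
```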

Counting occurrences of Regex Matches in Python

Below is a modified variant of my answer at https://stackoverflow.com/a/64220148/6632736.

It is assumed that the log is in a file, which is read line by line.

#!/usr/bin/env python3
import re

def increment(ips: dict, line: str):
    # Capture the IPv4 address that follows a " - " separator
    match = re.match(r'^.+?\s+-\s+(?P<ip>\d{1,3}(\.\d{1,3}){3})\s.*$', line)
    if match:
        ip = match.group('ip')
        if ip not in ips:
            ips[ip] = 0
        ips[ip] += 1

def parse_log_file(log: str) -> dict:
    ips = dict()
    with open(log, 'r') as file:
        for line in file:
            increment(ips, line)
    return ips

# log is the path to the log file (placeholder value, replace with your own):
log = 'access.log'
for key, value in parse_log_file(log).items():
    print(key, ":", value)
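The same counting can be sketched with collections.Counter, which removes the need to initialize each key. The sample lines below are invented to fit the regular expression above (some prefix, a " - " separator, then the address); a real log may look different. The names IP_LINE, count_ips and sample_lines are hypothetical.

```python
import re
from collections import Counter

# Same expression as above, compiled once
IP_LINE = re.compile(r'^.+?\s+-\s+(?P<ip>\d{1,3}(\.\d{1,3}){3})\s.*$')

def count_ips(lines):
    """Count how often each IP address appears in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = IP_LINE.match(line)
        if match:
            counts[match.group('ip')] += 1
    return counts

# Invented sample data standing in for the lines of a log file
sample_lines = [
    '2020-10-06 12:00:01 - 192.168.0.1 GET /index.html\n',
    '2020-10-06 12:00:02 - 10.0.0.7 GET /about.html\n',
    '2020-10-06 12:00:03 - 192.168.0.1 POST /login\n',
    'a line without an address\n',
]

for ip, hits in count_ips(sample_lines).most_common():
    print(ip, ":", hits)
```

Because an open file is itself an iterable of lines, count_ips can be passed the file object directly inside a with open(...) block.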

How can I find all matches to a regular expression in Python?

Use re.findall or re.finditer.

re.findall(pattern, string) returns a list of matching strings (or of the captured groups, if the pattern contains any).

re.finditer(pattern, string) returns an iterator over match objects.

Example:

import re

re.findall(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']

[x.group() for x in re.finditer(r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']

Count number of occurrences of each string using regex

You may pass a callable as the replacement argument to re.sub and collect the necessary counting details during a single replacement pass:

import re

counter = {}

def repl(m):
    # Tally the matched text, then return the replacement string
    if m.group() in counter:
        counter[m.group()] += 1
    else:
        counter[m.group()] = 1
    return 'd'

text = "a;b o a;c a l l e d;a;c a b"
rx = re.compile(r'\b(a|b|c)\b')
result = rx.sub(repl, text)
print(counter, result, sep="\n")

Output:

{'a': 5, 'b': 2, 'c': 2}
d;d o d;d d l l e d;d;d d d
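If only the counts are needed and not the replaced string, a variant without re.sub is possible. This is a sketch using collections.Counter over finditer, which is a different technique from the callback approach above; the text and pattern are the same.

```python
import re
from collections import Counter

text = "a;b o a;c a l l e d;a;c a b"
rx = re.compile(r'\b(a|b|c)\b')

# Tally the full text of every match; no replacement string is built
counter = Counter(m.group() for m in rx.finditer(text))
print(dict(counter))  # {'a': 5, 'b': 2, 'c': 2}
```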

Check if string matches pattern

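import re
pattern = re.compile("^([A-Z][0-9]+)+$")
string = "A1B22"  # sample input, replace with the string to check
# match returns a match object on success and None otherwise
pattern.match(string)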
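A small sketch of how this pattern behaves on a few made-up inputs. It also shows one subtlety: $ can match just before a trailing newline, so match accepts "A1\n", whereas fullmatch requires the entire string to be consumed.

```python
import re

pattern = re.compile(r"^([A-Z][0-9]+)+$")

# Invented sample strings
candidates = ["A1B22", "A1b2", "1A", "A1\n"]

results = {
    s: (pattern.match(s) is not None, pattern.fullmatch(s) is not None)
    for s in candidates
}

for s, (matched, fully_matched) in results.items():
    print(repr(s), matched, fully_matched)
# 'A1B22' True True
# 'A1b2' False False
# '1A' False False
# 'A1\n' True False
```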

How to find out the number of times group was matched in Python regular expressions?

One way you could do this is to enclose each repeating capture group inside another group; you can then divide the length of the outer match by the length of the inner match to determine how many times each inner group matched. This works as long as every repetition of the inner group has the same length. For example:

import re

m = re.search(r'0((10)+)((20)+)', '0001010202020000')
num_grps = len(m.groups())
# Groups come in (outer, inner) pairs: 1 and 2, then 3 and 4
for i in range(1, num_grps + 1, 2):
    outer = m.end(i) - m.start(i)
    inner = m.end(i + 1) - m.start(i + 1)
    print((m.group(i + 1), outer // inner))

Output:

('10', 2)
('20', 3)

Python - Using regex to find multiple matches and print them out

Do not use regular expressions to parse HTML.

But if you ever need to find all regexp matches in a string, use the findall function.

import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)

# Output: ['Form 1', 'Form 2']

