Python - Using Regex to Find Multiple Matches and Print Them Out

Do not use regular expressions to parse HTML.

But if you do need to find every regex match in a string, use the re.findall function.

import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)

# Output: ['Form 1', 'Form 2']

Python - Using Regex to find multiple matches and report in a certain order

for match in re.finditer('<form1?>(.*?)</form1?>', line, re.S):
    print(match.group(1))

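A modified version that handles the two tag names separately:

for match in re.finditer('(<form>(.*?)</form>)|(<form1>(.*?)</form1>)', line, re.S):
    # Group 4 is set only when the <form1> alternative matched.
    if match.group(4) is not None:
        print(match.group(4))
    else:
        print(match.group(2))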
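
As an alternative sketch (not part of the original answer), a backreference can require the closing tag to repeat whichever opening tag was found, so the content always lands in the same group. The sample string below is made up for illustration.

```python
import re

# Hypothetical sample input mixing both tag names
line = "a<form>Form 1</form> b<form1>Form 2</form1> c<form>Form 3</form>"

# Group 1 captures the tag name; \1 forces the closing tag to repeat it.
pattern = re.compile(r"<(form1?)>(.*?)</\1>", re.S)

contents = [m.group(2) for m in pattern.finditer(line)]
print(contents)  # ['Form 1', 'Form 2', 'Form 3']
```

Matches are still reported in the order they occur in the string, and no check for None is needed.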

Regex: Help to find multiple values in string (Python)

You can extract the values like this (using Avinash's regex):

import re

regex = re.compile(r"(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)")
text = "C001F1S15_08"
match = regex.match(text)
print(match.group(1)) # C001
print(match.group(2)) # F1
print(match.group(3)) # S15
print(match.groups()) # ('C001', 'F1', 'S15')
print(list(match.groups()[:3])) # ['C001', 'F1', 'S15']

Keep in mind that .group(0) refers to the entire match, in this case the whole input string.
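
To make the difference between .group(0) and the numbered groups concrete, here is a small sketch that reuses the same pattern with named groups. The names code, field and series are made up for illustration and are not part of the original regex.

```python
import re

# Same pattern as above, with hypothetical group names added for readability
regex = re.compile(
    r"(?P<code>C\d{3})(?P<field>[A-RT-Z\d]+)?(?P<series>S[\d\-_]+)?(?:_\d+)"
)
match = regex.match("C001F1S15_08")

whole = match.group(0)     # the entire matched text
named = match.groupdict()  # mapping of group name to captured text
print(whole)  # C001F1S15_08
print(named)  # {'code': 'C001', 'field': 'F1', 'series': 'S15'}
```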

How can I find all matches to a regular expression in Python?

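Use re.findall or re.finditer; both report every match rather than only the first one.

re.findall(pattern, string) returns a list of matching strings (or a list of tuples when the pattern has more than one capture group).

re.finditer(pattern, string) returns an iterator over match objects.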

Example:

re.findall( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']

[x.group() for x in re.finditer( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']
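
A short sketch of two details that the example above does not show: what findall returns when the pattern has more than one group, and the position information that finditer makes available. The two-group pattern is an assumption added for illustration.

```python
import re

text = "all cats are smarter than dogs, all dogs are dumber than cats"

# With two capture groups, findall returns one tuple per match.
pairs = re.findall(r"all (\w+) are (\w+)", text)
print(pairs)  # [('cats', 'smarter'), ('dogs', 'dumber')]

# finditer yields match objects, so start and end offsets are available too.
spans = [(m.group(1), m.start(1), m.end(1))
         for m in re.finditer(r"all (\w+) are", text)]
print(spans)  # [('cats', 4, 8), ('dogs', 36, 40)]
```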

Regex multiple matches in a search

Use re.finditer and loop over the result; finditer returns each match object one at a time, not just the first hit.

import re

# Compile once, outside the loop, so the compiled pattern is reused for every line.
reg = re.compile('href="(.*?)"|href=\'(.*?)\'')
# `file` (an open file) and `check` (a substring to look for) are assumed to be defined elsewhere.
for num, line in enumerate(file, 1):
    if check in line:
        print('href at line', num)
        for link in reg.finditer(line):
            # Only one alternative matches, so read whichever group took part.
            print('url:', link.group(link.lastindex))
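
A variant sketch, assuming the goal is simply to collect every URL together with its line number: capture the quote character and reuse it with a backreference, so the URL is always in group 2. The io.StringIO object and its contents are hypothetical stand-ins for the open file.

```python
import io
import re

# Hypothetical stand-in for the open file used above
file = io.StringIO(
    '<a href="https://example.com/a">A</a> <a href=\'/b\'>B</a>\n'
    "no links on this line\n"
    '<a href="/c">C</a>\n'
)

# Group 1 is the opening quote; \1 requires the same quote to close the value.
reg = re.compile(r"""href=(["'])(.*?)\1""")

found = []
for num, line in enumerate(file, 1):
    for link in reg.finditer(line):
        found.append((num, link.group(2)))
print(found)  # [(1, 'https://example.com/a'), (1, '/b'), (3, '/c')]
```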

Python regex multiple matches occurrences between two strings

You could capture any characters except X or Y in group 1 and then match X, and do the same with group 2 followed by Y. The "after the magic string" part can be captured by a third group inside a lookahead, so it is not consumed and can still start the next match.

Because a negated character class such as [^XY] also matches a newline, the pattern can match the FFF\nFFF part.

([^XY]+)X([^XY]+)Y(?=([^XY]*))
  • ([^XY]+)X Capture group 1, match 1+ times any char except X or Y, then match X
  • ([^XY]+)Y Capture group 2, match 1+ times any char except X or Y, then match Y
  • (?= Positive lookahead, assert what is directly to the right is
    • ([^XY]*) Capture group 3, match 0+ times any char except X or Y
  • ) Close lookahead

import re

regex = r"([^XY]+)X([^XY]+)Y(?=([^XY]*))"

s = ("AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\n"
"FFFYGGG")

matches = re.findall(regex, s)
print(matches)

Output

[('AAAAA', 'BBBBB', 'CCCCC'), ('CCCCC', 'DDDDD', 'EEEEEE'), ('EEEEEE', 'FFF\nFFF', 'GGG')]
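
To see why the third group sits inside a lookahead, this sketch compares the pattern with a version that consumes the trailing text. The second pattern is an assumption added only for the comparison.

```python
import re

s = "AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\nFFFYGGG"

# Lookahead: group 3 is captured but not consumed, so it can start the next match.
with_lookahead = re.findall(r"([^XY]+)X([^XY]+)Y(?=([^XY]*))", s)

# No lookahead: the trailing text is consumed and the middle triple is lost.
without_lookahead = re.findall(r"([^XY]+)X([^XY]+)Y([^XY]*)", s)

print(len(with_lookahead))  # 3
print(without_lookahead)
# [('AAAAA', 'BBBBB', 'CCCCC'), ('EEEEEE', 'FFF\nFFF', 'GGG')]
```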

Python regex with multiple matches in the same string

You could change your .* to be .*? so that they are non-greedy. That will make your original example work:

import re

test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(re.findall(r'<tag.*?>(.*?)</tag>', test))

Output:

['part1', 'part2']

Though it would probably be best to not try to parse this with just regex, but instead use a proper HTML parser library.
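
As a rough sketch of the parser route, the standard library's html.parser can collect the same text without a regex. The TagTextCollector class is a hypothetical helper written for this example, and it assumes the elements of interest are literally named tag, as in the test string.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside each <tag> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <tag> elements currently open
        self.parts = []  # collected text, one entry per <tag> element

    def handle_starttag(self, tag, attrs):
        if tag == "tag":
            self.depth += 1
            self.parts.append("")

    def handle_endtag(self, tag):
        if tag == "tag" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears while a <tag> element is open.
        if self.depth:
            self.parts[-1] += data


collector = TagTextCollector()
collector.feed('<tag>part1</tag><tag can have random stuff here>part2</tag>')
collector.close()
print(collector.parts)  # ['part1', 'part2']
```

For real documents a dedicated HTML library may be more convenient; this only shows that the attribute text no longer needs special handling.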

regexes: How to access multiple matches of a group?

Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer(...) to return all matches.
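
A minimal sketch of the difference, using a made-up comma-separated string since the original pattern is not shown here: a repeated group keeps only its last repetition, while findall without the * reports each one.

```python
import re

text = "10,20,30,"  # hypothetical input

# A group under * is overwritten on every repetition; only the last survives.
repeated = re.match(r"(\d+,)*", text)
print(repeated.group(1))  # 30,

# Without the *, findall returns every occurrence of the group.
numbers = re.findall(r"(\d+),", text)
print(numbers)  # ['10', '20', '30']
```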

Update:

It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.

Python finding multiple regex matches within a regex match

As Serge mentioned, this is not really a problem you want to tackle with a single regular expression, but with multiple regular expressions and some Python code:

import re

def replacer(match):  # re.sub can take a function as the repl argument, which gives you more flexibility
    choices = {'<': '{', '>': '}'}  # replace < with { and > with }
    return choices[match.group(0)]

result = []  # store the results here
# your_text is assumed to hold the input; split it into table parts and non-table parts
for text in re.split(r'(?s)(?=<table)(.*)(?<=table>)', your_text):
    if text.startswith('<table'):  # if this is a table part, do the <> replacement
        result.append(re.sub(r'[<>]', replacer, text))
    else:  # otherwise leave it the same
        result.append(text)
print(''.join(result))  # join the list of strings to get the final result

See the re.sub documentation for details on passing a function as the repl argument.

And an explanation of the regular expressions:

(?s)        # the . matches newlines 
(?=<table) # positive look-ahead matching '<table'
(.*) # matches everything between <table and table> (it is inclusive because of the look-ahead/behinds)
(?<=table>) # positive look-behind matching 'table>'

Also note that because (.*) is in a capture group, the text it matches is included in the list of strings returned by re.split.
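
The effect of a capture group on re.split is easy to check in isolation. This sketch uses a trivial comma-separated string chosen for illustration.

```python
import re

# No capture group: the separators are dropped.
without_group = re.split(r",", "a,b,c")
print(without_group)  # ['a', 'b', 'c']

# Capture group: the matched separators are kept in the result.
with_group = re.split(r"(,)", "a,b,c")
print(with_group)  # ['a', ',', 'b', ',', 'c']
```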

Find multiple matches using re in Python (beginner question)

As @han solo noted in a comment, the regex needs a different syntax.

Also, do not forget to initialize the key in the dictionary before using += on it.

import re
some_words_lst = ['caT.', 'Cat', 'Dog', 'paper', 'caty', 'London', 'loNdon','londonS']

words_to_find = ['cat', 'london']

r = re.compile('|'.join(words_to_find), re.IGNORECASE)

count_dictionary = {"i": 0}

for item in some_words_lst:
    if r.match(item):
        count_dictionary['i'] += 1

print(count_dictionary)

Update:
According to the comment, what is needed is a count per matched word. How about something quick and dirty like this?

import re
some_words_lst = ['caT.', 'Cat', 'Dog', 'paper', 'caty', 'London', 'loNdon','londonS']

words_to_find = ['cat', 'london']

r = re.compile('|'.join(words_to_find), re.IGNORECASE)

count_dictionary = {word: 0 for word in words_to_find}

for item in some_words_lst:
    my_match = r.match(item)  # match once and reuse the result
    if my_match:
        count_dictionary[my_match[0].lower()] += 1

print(count_dictionary)
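
A possible variant, sketched with collections.Counter so that no key has to be initialized up front. The use of re.escape is an added precaution for search words that contain regex metacharacters, and is not part of the original answer.

```python
import re
from collections import Counter

some_words_lst = ['caT.', 'Cat', 'Dog', 'paper', 'caty', 'London', 'loNdon', 'londonS']
words_to_find = ['cat', 'london']

# Escape each word so it is treated as literal text inside the alternation.
r = re.compile('|'.join(map(re.escape, words_to_find)), re.IGNORECASE)

counts = Counter()
for item in some_words_lst:
    m = r.match(item)  # match() only tests the start of each item
    if m:
        counts[m.group(0).lower()] += 1

print(dict(counts))  # {'cat': 3, 'london': 3}
```

Note that match() also counts items such as 'caty' and 'londonS'; r.fullmatch(item) would be the stricter choice if only exact words should count.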

