Python - Using regex to find multiple matches and print them out
Do not use regular expressions to parse HTML.
But if you ever need to find all regexp matches in a string, use the findall
function.
import re
line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)
# Output: ['Form 1', 'Form 2']
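To see what the re.DOTALL flag contributes in the call above, here is a minimal sketch. The multi-line sample string is made up for illustration and is not from the original question.

```python
import re

# Hypothetical input: the first form's content contains a line break.
line = '<form>Form\n1</form><form>Form 2</form>'

# Without re.DOTALL, "." does not match "\n", so the first form is skipped.
without_flag = re.findall('<form>(.*?)</form>', line)

# With re.DOTALL, "." matches any character, including "\n".
with_flag = re.findall('<form>(.*?)</form>', line, re.DOTALL)

print(without_flag)  # ['Form 2']
print(with_flag)     # ['Form\n1', 'Form 2']
```

If the content can never span lines, the flag is harmless but unnecessary.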
Python - Using Regex to find multiple matches and report in a certain order
for match in re.finditer('<form1?>(.*?)</form1?>', line, re.S):
    print(match.group(1))
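I modified the code so that each opening tag is paired with its own closing tag:
for match in re.finditer('(<form>(.*?)</form>)|(<form1>(.*?)</form1>)', line, re.S):
    # group(4) holds the <form1> content, group(2) the <form> content
    if match.group(4) is not None:
        print(match.group(4))
    else:
        print(match.group(2))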
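As a possible alternative to the two-branch pattern, a backreference can tie the closing tag to whatever the opening tag was. This is only a sketch; the sample string is an assumption, not the asker's data.

```python
import re

# Hypothetical input mixing both tag names.
line = 'a<form>Form 1</form> b<form1>Form 2</form1> c<form>Form 3</form>'

# (form1?) captures the tag name; \1 requires the closing tag to repeat it,
# so a <form> is never closed by </form1> or the other way round.
pattern = re.compile(r'<(form1?)>(.*?)</\1>', re.S)

# finditer yields matches left to right, so the order of appearance is kept.
contents = [m.group(2) for m in pattern.finditer(line)]
print(contents)  # ['Form 1', 'Form 2', 'Form 3']
```

The content is always in group 2 here, so no None check is needed.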
Regex: Help to find multiple values in string (Python)
You can extract values like this (using Avinash's regex)
import re
regex = re.compile(r"(C\d{3})([A-RT-Z\d]+)?(S[\d\-_]+)?(?:_\d+)")
text = "C001F1S15_08"
match = regex.match(text)
print(match.group(1)) # C001
print(match.group(2)) # F1
print(match.group(3)) # S15
print(match.groups()) # ('C001', 'F1', 'S15')
print(list(match.groups()[:3])) # ['C001', 'F1', 'S15']
See the re module documentation for more information. Keep in mind that .group(0) refers to the entire match, in this case the whole input string.
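If positional group numbers become hard to follow, named groups may help. The sketch below reuses the same regex; the group names code, field and section are invented for illustration, as is the second input string.

```python
import re

# Same pattern as above, with hypothetical names attached to the groups.
regex = re.compile(
    r"(?P<code>C\d{3})(?P<field>[A-RT-Z\d]+)?(?P<section>S[\d\-_]+)?(?:_\d+)"
)

match = regex.match("C001F1S15_08")
parts = match.groupdict() if match else {}
print(parts)  # {'code': 'C001', 'field': 'F1', 'section': 'S15'}

# Optional groups that did not take part in the match come back as None.
short = regex.match("C002_01")
short_parts = short.groupdict() if short else {}
print(short_parts)  # {'code': 'C002', 'field': None, 'section': None}
```

Checking the result of match() before using it avoids an AttributeError when the text does not fit the pattern.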
How can I find all matches to a regular expression in Python?
Use re.findall or re.finditer instead.
re.findall(pattern, string) returns a list of matching strings.
re.finditer(pattern, string) returns an iterator over match objects.
Example:
re.findall( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']
[x.group() for x in re.finditer( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']
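What findall returns depends on how many capture groups the pattern has, which the following sketch tries to make visible. The patterns are variations on the example above, written for illustration.

```python
import re

text = 'all cats are smarter than dogs, all dogs are dumber than cats'

# No capture group: findall returns the whole matches.
whole = re.findall(r'all \w+ are', text)

# Two capture groups: findall returns one tuple per match.
pairs = re.findall(r'all (\w+) are (\w+)', text)

# finditer gives match objects, so positions are available as well.
spans = [(m.group(1), m.start(1)) for m in re.finditer(r'all (\w+) are', text)]

print(whole)  # ['all cats are', 'all dogs are']
print(pairs)  # [('cats', 'smarter'), ('dogs', 'dumber')]
print(spans)
```

When only the matched text is needed, findall is usually enough; finditer is the better fit when offsets or several groups per match matter.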
Regex multiple matches in a search
Use re.finditer and loop over the result; finditer returns each match object one at a time, not just the first hit.
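import re

# Compile outside the loop; the whole point of compiling is to do the work once
# and reuse the compiled object over and over.
reg = re.compile('href="(.*?)"|href=\'(.*?)\'')
# `file` (an open file object) and `check` (the substring to look for) come from the question's code
for num, line in enumerate(file, 1):
    if check in line:
        print('href at line', num)
        for link in reg.finditer(line):
            # group(1) is set for double-quoted URLs, group(2) for single-quoted ones
            print('url:', link.group(1) or link.group(2))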
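One way to avoid juggling two groups for the two quote styles is a backreference to the opening quote. This is a sketch only; the sample lines and the name href_re are assumptions made for the example.

```python
import re

# (["']) captures the opening quote and \1 requires the same quote to close,
# so the URL is always in group 2 regardless of quote style.
href_re = re.compile(r'''href=(["'])(.*?)\1''')

# Hypothetical input lines standing in for the file being scanned.
lines = [
    '<a href="https://example.com/a">A</a> <a href=\'/b\'>B</a>',
    'no links here',
]

found = []
for num, line in enumerate(lines, 1):
    for link in href_re.finditer(line):
        found.append((num, link.group(2)))

print(found)  # [(1, 'https://example.com/a'), (1, '/b')]
```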
Python regex multiple matches occurrences between two strings
You could match any character except X or Y in group 1 and then match X, then do the same for Y with group 2. The "after the magic string" part can be captured in a lookahead with a third group.
The negated character class [^XY] also matches a newline, which is what lets it match the FFFFFF part that spans a line break.
([^XY]+)X([^XY]+)Y(?=([^XY]+))
([^XY]+)X    Capture group 1, match 1+ times any char except X or Y, then match X
([^XY]+)Y    Capture group 2, match 1+ times any char except X or Y, then match Y
(?=          Positive lookahead, assert what is directly to the right is
([^XY]+)     Capture group 3, match 1+ times any char except X or Y
)            Close lookahead
import re
regex = r"([^XY]+)X([^XY]+)Y(?=([^XY]*))"
s = ("AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\n"
"FFFYGGG")
matches = re.findall(regex, s)
print(matches)
Output
[('AAAAA', 'BBBBB', 'CCCCC'), ('CCCCC', 'DDDDD', 'EEEEEE'), ('EEEEEE', 'FFF\nFFF', 'GGG')]
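To illustrate why the third group sits inside a lookahead, the sketch below compares it with a variant that consumes the trailing text. The variable names are made up for the comparison.

```python
import re

s = ("AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\n"
     "FFFYGGG")

# Consuming variant: the trailing text is eaten by the match, so it cannot
# serve as group 1 of the next match and the middle result is lost.
consuming = re.findall(r"([^XY]+)X([^XY]+)Y([^XY]*)", s)

# Lookahead variant: the trailing text is captured but not consumed,
# so the next match can start with it.
lookahead = re.findall(r"([^XY]+)X([^XY]+)Y(?=([^XY]*))", s)

print(len(consuming))  # 2
print(len(lookahead))  # 3
```

findall only returns non-overlapping matches, so the lookahead is what makes the shared pieces (CCCCC, EEEEEE) show up in two results.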
Python regex with multiple matches in the same string
You could change your .* to be .*? so that they are non-greedy. That will make your original example work:
import re
test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
print(re.findall(r'<tag.*?>(.*?)</tag>', test))
Output:
['part1', 'part2']
Though it would probably be best to not try to parse this with just regex, but instead use a proper HTML parser library.
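For comparison, here is a rough sketch of the parser route using html.parser from the standard library. The class name TagTextCollector and its attributes are invented for this example, and a third-party parser may be a better fit for real-world HTML.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0   # how many matching tags are currently open
        self.texts = []  # one entry per opening tag seen

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text while inside the tag of interest.
        if self.depth:
            self.texts[-1] += data


test = '<tag>part1</tag><tag can have random stuff here>part2</tag>'
collector = TagTextCollector('tag')
collector.feed(test)
collector.close()
print(collector.texts)  # ['part1', 'part2']
```

The parser deals with attributes inside the opening tag on its own, which is the part the regex had to approximate with .*?.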
regexes: How to access multiple matches of a group?
Drop the * from your regex (so it matches exactly one instance of your pattern). Then use either re.findall(...) or re.finditer to return all matches.
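The reason for dropping the * is that a repeated capture group only remembers its last repetition. A minimal sketch, using a made-up pattern and input rather than the asker's:

```python
import re

text = "a1b2c3"  # hypothetical input: three letter-digit pairs

# With the group repeated, only the final repetition is kept in group(1).
repeated = re.match(r"([a-z]\d)*", text)
last_only = repeated.group(1)

# Without the *, findall reports every occurrence of the single pattern.
every = re.findall(r"[a-z]\d", text)

print(last_only)  # c3
print(every)      # ['a1', 'b2', 'c3']
```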
Update:
It sounds like you're essentially building a recursive descent parser. For relatively simple parsing tasks, it is quite common and entirely reasonable to do that by hand. If you're interested in a library solution (in case your parsing task may become more complicated later on, for example), have a look at pyparsing.
Python finding multiple regex matches within a regex match
As Serge mentioned, this is not really a problem you want to tackle with a single regular expression, but with multiple regular expressions and some Python magic:
import re

def replacer(match):  # re.sub can take a function as the repl argument, which gives you more flexibility
    choices = {'<': '{', '>': '}'}  # replace < with { and > with }
    return choices[match.group(0)]

result = []  # store the results here
# split your text into table parts and non-table parts (your_text is the input string)
for text in re.split(r'(?s)(?=<table)(.*)(?<=table>)', your_text):
    if text.startswith('<table'):  # if this is a table part, do the <> replacement
        result.append(re.sub(r'[<>]', replacer, text))
    else:  # otherwise leave it the same
        result.append(text)
print(''.join(result))  # join the list of strings to get the final result
Check out the documentation on using a function for the repl argument of re.sub.
And an explanation of the regular expressions:
(?s) # the . matches newlines
(?=<table) # positive look-ahead matching '<table'
(.*) # matches everything between <table and table> (it is inclusive because of the look-ahead/behinds)
(?<=table>) # positive look-behind matching 'table>'
Also note that because (.*) is in a capture group, it is included in the strings output by re.split.
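The effect of a capture group on re.split can be seen in isolation with a small sketch; the sample string and separators are made up for illustration.

```python
import re

text = "a, b; c"  # hypothetical input with two different separators

# No capture group: the separators are dropped from the result.
plain = re.split(r"[,;] ", text)

# Capture group around the separator: the captured text is kept in the list.
with_group = re.split(r"([,;]) ", text)

print(plain)       # ['a', 'b', 'c']
print(with_group)  # ['a', ',', 'b', ';', 'c']
```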
Find multiple matches using re in Python (beginner question)
As stated in the comment by @han solo, you need a different syntax for re.
Also, do not forget to initialize the key in the dictionary before you use +=.
import re
some_words_lst = ['caT.', 'Cat', 'Dog', 'paper', 'caty', 'London', 'loNdon','londonS']
words_to_find = ['cat', 'london']
r = re.compile('|'.join(words_to_find), re.IGNORECASE)
count_dictionary = {"i": 0}
for item in some_words_lst:
    if r.match(item):
        count_dictionary['i'] += 1
print(count_dictionary)
UPD:
According to the comment, we need a count for each matched item. How about something quick and dirty like this?
import re
some_words_lst = ['caT.', 'Cat', 'Dog', 'paper', 'caty', 'London', 'loNdon','londonS']
words_to_find = ['cat', 'london']
r = re.compile('|'.join(words_to_find), re.IGNORECASE)
count_dictionary = {word: 0 for word in words_to_find}
for item in some_words_lst:
    my_match = r.match(item)  # match once and reuse the result
    if my_match:
        count_dictionary[my_match[0].lower()] += 1
print(count_dictionary)
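The same per-word count could also be sketched with collections.Counter, which removes the need to initialize keys. Using re.escape on the search words is an extra precaution added here, not part of the original answer.

```python
import re
from collections import Counter

some_words_lst = ['caT.', 'Cat', 'Dog', 'paper', 'caty', 'London', 'loNdon', 'londonS']
words_to_find = ['cat', 'london']

# re.escape guards against search words that contain regex metacharacters.
r = re.compile('|'.join(map(re.escape, words_to_find)), re.IGNORECASE)

# r.match anchors at the start of each item; items without a match give None
# and are filtered out before counting.
counts = Counter(
    m.group(0).lower() for m in map(r.match, some_words_lst) if m
)

print(dict(counts))  # {'cat': 3, 'london': 3}
```

Words that never match are simply absent from the Counter, and looking them up returns 0.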