Python extract pattern matches
You need to capture part of the match with a group. Search for the pattern with search and, if a match is found, retrieve the captured string using group(index). Assuming the validity checks have been performed:
>>> import re
>>> s = "name my_user_name is valid"  # example input, assumed here
>>> p = re.compile(r"name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> # group(1) returns the first capture (the text inside the parentheses);
>>> # group(0) returns the entire matched text.
>>> result.group(1)
'my_user_name'
Extract part of a regex match
Use ( ) in the regexp and group(1) in Python to retrieve the captured string (re.search will return None if it doesn't find the result, so don't use group() directly):
title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)
if title_search:
    title = title_search.group(1)
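A self-contained sketch of the same check, wrapped in a helper. The function name extract_title and the sample HTML are assumptions made for illustration, and the pattern uses the lazy .*? rather than .* so the capture stops at the first closing tag:

```python
import re

def extract_title(html):
    """Return the text inside <title>...</title>, or None if there is no title."""
    # re.search returns None when nothing matches, so test before calling group().
    title_search = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE)
    if title_search:
        return title_search.group(1)
    return None

sample = '<html><head><TITLE>Regex notes</TITLE></head></html>'  # made-up input
print(extract_title(sample))             # Regex notes
print(extract_title('<p>no title</p>'))  # None
```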
Python - Extract pattern from string using RegEx
import re
pattern = re.compile(r'foo\(.*?\)')
test_str = 'foo(123456) together with foo(2468)'
for match in re.findall(pattern, test_str):
    print(match)
Two things:
.*? is the lazy quantifier. It behaves the same as the greedy quantifier (.*), except that it tries to match as few characters as possible, going from left to right across the string. Note that if you want to match at least one character between the parentheses, you'll want to use .+? instead.
Use \( and \) instead of ( and ) because parentheses are normally used inside regular expressions to indicate capture groups, so if you want to match parentheses literally, you have to put the escape character, a backslash, before them.
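The difference between the two quantifiers can be seen by running both on the same test string. This is a small sketch; the variable names greedy, lazy and inner are only for illustration:

```python
import re

test_str = 'foo(123456) together with foo(2468)'

# Greedy: .* runs to the last ')' in the string, so there is one long match.
greedy = re.findall(r'foo\(.*\)', test_str)
# Lazy: .*? stops at the first ')' after each 'foo(', giving two matches.
lazy = re.findall(r'foo\(.*?\)', test_str)
# With a capture group, findall returns only the captured part.
inner = re.findall(r'foo\((.*?)\)', test_str)

print(greedy)  # ['foo(123456) together with foo(2468)']
print(lazy)    # ['foo(123456)', 'foo(2468)']
print(inner)   # ['123456', '2468']
```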
Python - Regex findall extract all patterns that may be substring of one another
In the situation where one keyword is a substring of another, you will need to iterate over your keywords, because matching with a single regex alternation will always pick one or the other at a given point in the string (most modules, such as re, pick the first alternative that matches), but never both. You could iterate over the keywords to ensure you find all matches using code like this:
import re
string = "A B C D"
keys = ["A", "B", "A B"]
matches = []
for k in keys:
    matches += re.findall(re.escape(k), string)
print(matches)
Output
['A', 'B', 'A B']
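To see why a single alternation is not enough, here is a short sketch that runs the same keywords as one pattern, in two different orders. The variable names are only for illustration:

```python
import re

string = "A B C D"

# At each position the engine takes the first alternative that matches,
# then continues scanning after the end of that match.
short_first = re.findall(r"A|B|A B", string)
long_first = re.findall(r"A B|A|B", string)

print(short_first)  # ['A', 'B']
print(long_first)   # ['A B']
```

Neither ordering returns all three keywords, which is why the loop over the keys is needed.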
How do I return a string from a regex match in python?
You should use group(0) on the match object (re.Match.group), like this:
imgtag = re.match(r'<img.*?>', line).group(0)
Edit:
You might also be better off doing something like
imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))
to skip all the None results.
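One thing to keep in mind with this approach: re.match only matches at the beginning of the string, while re.search scans the whole line. A minimal sketch of the difference, using a made-up sample line:

```python
import re

line = '<p>Logo: <img src="logo.png"> here</p>'  # made-up sample line

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r'<img.*?>', line)
# re.search scans the whole string for the first occurrence.
anywhere = re.search(r'<img.*?>', line)

print(at_start)  # None
if anywhere:
    print(anywhere.group(0))  # <img src="logo.png">
```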
How to extract the substring between two markers?
Using regular expressions (see the re module documentation for further reference):
import re
text = 'gfgfdAAA1234ZZZuijjk'
m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)
# found: 1234
or:
import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = ''  # apply your error handling
# found: 1234
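If the markers are not fixed, the same idea can be wrapped in a small function. This is only a sketch: the helper name between and the second and third sample strings are assumptions, and re.escape is used so that markers containing regex metacharacters are matched literally:

```python
import re

def between(text, start, end):
    """Return the text captured between start and end, or None if not found."""
    # re.escape makes the markers literal even if they contain characters
    # such as '(' or '.' that are special in a regex.
    pattern = re.escape(start) + r'(.+?)' + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between('price: (42) units', '(', ')'))          # 42
print(between('no markers here', 'AAA', 'ZZZ'))        # None
```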
use regex to extract multiple strings following certain pattern
If you want to return all the matches individually using only a single findall, then you'll need to make use of positive lookbehind, e.g. (?<=foo). Python's re module unfortunately only supports fixed-width lookbehind. However, if you're willing to use the outstanding third-party regex module, then it can be done.
Regex:
(?<=Invalid items: \([^)]*)[^ ;)]+
Demonstration: https://regex101.com/r/p90Z81/1
If there can be empty items, a small modification to the regex allows capture of these zero-width matches, as follows:
(?<=Invalid items: \([^)]*)(?:[^ ;)]+|(?<=\(| ))
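If installing the regex module is not an option, a two-step approach with the standard re module gives a similar result. Note that this is a different technique from the single-findall lookbehind above, and it only handles the first "Invalid items" group and non-empty items. The sample text is made up, since the real input format is not shown here:

```python
import re

# Made-up sample; the real input format is an assumption.
text = 'Valid items: (a1; a2) Invalid items: (b1; b2; b3)'

items = []
# Step 1: capture everything inside the parentheses after the label.
m = re.search(r'Invalid items: \(([^)]*)\)', text)
if m:
    # Step 2: pull out each run of characters that is not a space or semicolon.
    items = re.findall(r'[^ ;]+', m.group(1))

print(items)  # ['b1', 'b2', 'b3']
```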
python regex: extract list elements, each of which matches multiple patterns
There is no need for two regexes; a single one is enough:
import re
somelist = [
    'AAAA 1234 SD OXD',
    'AAAB 2342 DF BDD',
    'ERTE 3454 RE DFD',
    'GWED 1234 SD TCD',
    'AAAA 2353 SD MKX',
    'VERD 1234 IO ERT',
    'AAAA 2353 SD MKX',
    'AAAA 2353 SD MKX',
]
# \s+ matches the gaps between columns whatever their width
print(list(filter(lambda x: re.search(r"^\S+\s+1234\s+SD\b", x), somelist)))
# ['AAAA 1234 SD OXD', 'GWED 1234 SD TCD']
Regex extract word starting with a set string and ending the line or ending with ;
Match SAT and everything after it that is not a space, semicolon or newline:
\bSAT[^ ;\n]*
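A quick way to try the pattern in Python. The sample text is made up to cover a token ending at a semicolon, at a space, and at the end of a line, plus one word where SAT is not at a word boundary:

```python
import re

# Made-up sample text for illustration.
text = "x=SAT123;y=2\nrun SAT_A now\nlast SATXYZ\nVSAT9 is skipped"

# \b requires SAT to start at a word boundary, so 'VSAT9' does not match.
found = re.findall(r'\bSAT[^ ;\n]*', text)
print(found)  # ['SAT123', 'SAT_A', 'SATXYZ']
```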