Python Extract Pattern Matches

Python extract pattern matches

You need a capture group in the regex. Search for the pattern and, if it is found, retrieve the captured string with group(index). Assuming the necessary validity checks are performed:

>>> import re
>>> s = "name my_user_name is valid"   # example input, assumed for illustration
>>> p = re.compile("name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> result.group(1)     # group(1) returns the 1st capture (the text within the parentheses).
                        # group(0) returns the entire matched text.
'my_user_name'
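
As a script rather than an interpreter session, the same idea might look like the sketch below. The input string is an assumption made up for illustration; the point is the difference between group(0), group(1) and groups(), plus the None check before touching the match.

```python
import re

# Example input, assumed here so the snippet runs on its own.
s = "the name my_user_name is valid"

p = re.compile(r"name (.*) is valid")
result = p.search(s)

if result is not None:
    whole = result.group(0)      # entire matched text
    first = result.group(1)      # text captured by the first ( )
    captured = result.groups()   # tuple holding every capture
    print(whole)     # name my_user_name is valid
    print(first)     # my_user_name
    print(captured)  # ('my_user_name',)
```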

Extract part of a regex match

Use ( ) in the regex and group(1) in Python to retrieve the captured string. re.search returns None when it finds no match, so don't call group() on the result directly:

title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)
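
One thing to watch with a pattern like this is that .* is greedy and, by default, does not cross newlines. The sketch below uses a made-up html string (an assumption, not taken from the question) to show how a lazy .*? combined with re.DOTALL behaves when the page contains a second </title> further down.

```python
import re

# Hypothetical page source: the title spans two lines and a second
# <title> element appears later inside an inline SVG.
html = "<html><TITLE>My\nPage</TITLE><svg><title>icon</title></svg></html>"

flags = re.IGNORECASE | re.DOTALL  # DOTALL lets . match the newline
greedy = re.search(r"<title>(.*)</title>", html, flags)
lazy = re.search(r"<title>(.*?)</title>", html, flags)

if greedy and lazy:
    print(repr(greedy.group(1)))  # runs on to the last </title>
    print(repr(lazy.group(1)))    # stops at the first closing tag
```

Here the greedy version captures everything up to the final </title>, while the lazy one returns only the first title.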

Python - Extract pattern from string using RegEx

import re
pattern = re.compile(r'foo\(.*?\)')
test_str = 'foo(123456) together with foo(2468)'

for match in re.findall(pattern, test_str):
    print(match)

Two things:

.*? is the lazy (non-greedy) quantifier. It matches the same things as the greedy quantifier (.*), except that it consumes as few characters as possible, going left to right across the string. If you want to require at least one character between the parentheses, use .+? instead.
Use \( and \) instead of ( and ): unescaped parentheses denote capture groups in a regular expression, so to match literal parentheses you have to escape them with a backslash.
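
To make both points concrete, here is a small comparison on the same test string. The variable names greedy, lazy and inner are only for illustration; the last pattern combines escaped parentheses with a real capture group so that findall returns just the contents.

```python
import re

test_str = 'foo(123456) together with foo(2468)'

greedy = re.findall(r'foo\(.*\)', test_str)     # .* runs to the last ")"
lazy = re.findall(r'foo\(.*?\)', test_str)      # .*? stops at the first ")"
inner = re.findall(r'foo\((.*?)\)', test_str)   # capture only what is inside

print(greedy)  # ['foo(123456) together with foo(2468)']
print(lazy)    # ['foo(123456)', 'foo(2468)']
print(inner)   # ['123456', '2468']
```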

Python - Regex findall extract all patterns that may be substring of one another

When one keyword is a substring of another, a single alternation cannot return both: at any given position in the string the regex engine reports only one alternative (most engines, including re, take the first alternative that matches), never both. To find every match, iterate over the keywords and search for each one separately, with code like this:

import re
 
string = "A B C D"
keys = ["A", "B", "A B"]
 
matches = []
for k in keys:
    matches += re.findall(re.escape(k), string)
 
print(matches)

Output

['A', 'B', 'A B']
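
For comparison, this sketch shows what a single alternation returns on the same input, and how finditer can report where each keyword was found. The names first_wins, longest_first and spans are illustrative only.

```python
import re

string = "A B C D"
keys = ["A", "B", "A B"]

# One alternation: only one alternative is reported at each position.
first_wins = re.findall(r"A|B|A B", string)
longest_first = re.findall(r"A B|A|B", string)
print(first_wins)     # ['A', 'B']
print(longest_first)  # ['A B']

# One search per keyword: overlapping matches are all found,
# here with their start and end offsets.
spans = [(k, m.start(), m.end())
         for k in keys
         for m in re.finditer(re.escape(k), string)]
print(spans)  # [('A', 0, 1), ('B', 2, 3), ('A B', 0, 3)]
```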

How do I return a string from a regex match in python?

Call group(0) on the match object that re.match returns, like this:

imtag = re.match(r'<img.*?>', line).group(0)

Edit:

You also might be better off doing something like

imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))

to skip the lines where re.match returns None.
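
Keep in mind that re.match only matches at the beginning of the string, so a line with anything before the tag gives None. If the tag can appear anywhere in the line, re.search is probably what you want. A minimal sketch, with a made-up line as input:

```python
import re

# Hypothetical input line: the tag is not at the start of the string.
line = 'Some text <img src="logo.png" alt="logo"> more text'

at_start = re.match(r'<img.*?>', line)    # anchored at position 0
anywhere = re.search(r'<img.*?>', line)   # scans the whole string

print(at_start)  # None
if anywhere:
    print(anywhere.group(0))  # <img src="logo.png" alt="logo">
```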

How to extract the substring between two markers?

Using regular expressions (see the re module documentation for further reference):

import re

text = 'gfgfdAAA1234ZZZuijjk'

m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)

# found: 1234

or:

import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = '' # apply your error handling

# found: 1234
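
If the markers are not fixed, or may contain characters that are special in a regex (brackets, dots and so on), it can help to wrap this in a small function and escape the markers with re.escape. The helper below, including its name between and its default parameter, is a hypothetical sketch rather than anything from the standard library.

```python
import re

def between(text, start, end, default=''):
    """Return the shortest non-empty text between start and end, or default."""
    # re.escape makes the markers match literally, whatever they contain.
    pattern = re.escape(start) + r'(.+?)' + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else default

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))   # 1234
print(between('total: [42] items', '[', ']'))           # 42
print(repr(between('no markers here', 'AAA', 'ZZZ')))   # ''
```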

Use regex to extract multiple strings following a certain pattern

If you want to return all the matches individually using only a single findall, you'll need a positive lookbehind, e.g. (?<=foo). Unfortunately, Python's built-in re module only supports fixed-width lookbehind, and the pattern needed here is variable-width. If you're willing to use the third-party regex module, which does support variable-width lookbehind, it can be done.

Regex:

(?<=Invalid items: \([^)]*)[^ ;)]+

Demonstration: https://regex101.com/r/p90Z81/1

If there can be empty items, a small modification to the regex allows capture of these zero-width matches, as follows:

(?<=Invalid items: \([^)]*)(?:[^ ;)]+|(?<=\(| ))
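
If installing the regex module is not an option, a two-step approach with the built-in re module gets a similar result, at the cost of giving up the single findall. The sketch below assumes an input of the form shown in line, which is a guess at the format since the original text is not reproduced here: first capture what is inside the parentheses, then split out the items.

```python
import re

# Hypothetical input; the exact text from the question is an assumption.
line = "Invalid items: (item1; item2; item3)"

# Step 1: capture everything between the parentheses.
m = re.search(r"Invalid items: \(([^)]*)\)", line)

# Step 2: pull out each run of characters that is not a space or semicolon.
items = re.findall(r"[^ ;]+", m.group(1)) if m else []

print(items)  # ['item1', 'item2', 'item3']
```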

python regex: extract list elements, each of which matches multiple patterns

There is no need for two regexes here; a single one does the job:

import re

somelist = [ 
     'AAAA  1234   SD OXD',
     'AAAB  2342   DF BDD',
     'ERTE  3454   RE DFD',
     'GWED  1234   SD TCD',
     'AAAA  2353   SD MKX',
     'VERD  1234   IO ERT',
     'AAAA 2353   SD MKX',
     'AAAA  2353  SD MKX']

print(list(filter(lambda x : re.search(r".{6}1234\s{3}SD",x) ,somelist)))
# ['AAAA  1234   SD OXD', 'GWED  1234   SD TCD']
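
Note that re.search looks for the pattern anywhere in the string, so .{6} does not by itself pin 1234 to a fixed column. If the column position matters, matching from the start of each item is one option. A sketch with a compiled pattern and a list comprehension, where the last list entry is a made-up item added to show the difference:

```python
import re

somelist = [
    'AAAA  1234   SD OXD',
    'AAAB  2342   DF BDD',
    'GWED  1234   SD TCD',
    'VERD  1234   IO ERT',
    'XXAAAA  1234   SD OXD',  # hypothetical item with two extra leading characters
]

# Any 6 characters, then "1234", exactly 3 whitespace characters, then "SD".
pattern = re.compile(r".{6}1234\s{3}SD")

found_anywhere = [item for item in somelist if pattern.search(item)]
found_at_start = [item for item in somelist if pattern.match(item)]

print(found_anywhere)  # includes the 'XXAAAA ...' item
print(found_at_start)  # ['AAAA  1234   SD OXD', 'GWED  1234   SD TCD']
```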

Regex extract word starting with a set string and ending the line or ending with ;

Match SAT and everything not a space, semicolon or newline:

\bSAT[^ ;\n]*
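
In Python this could be used with re.findall as in the sketch below. The sample text is invented for illustration; note that \b keeps the pattern from matching SAT in the middle of a longer word.

```python
import re

# Hypothetical sample text, not taken from the question.
text = "code SAT123; other SATURN\nnext SAT-9 end MSAT1"

words = re.findall(r"\bSAT[^ ;\n]*", text)
print(words)  # ['SAT123', 'SATURN', 'SAT-9']
```

Each match stops at the first space, semicolon or newline, and MSAT1 is skipped because there is no word boundary before its SAT.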
