How do I return a string from a regex match in python?
You should call group(0) on the match object returned by re.match. Like
imgtag = re.match(r'<img.*?>', line).group(0)
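Edit:
You also might be better off doing something like
imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))
to eliminate all the Nones (re.match returns None when there is no match, and calling group() on None raises an AttributeError).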
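One thing worth checking in the snippet above: re.match only tries the pattern at the start of the string, so a tag that appears later in the line is not found. A minimal sketch using re.search instead, with a hypothetical helper name first_img_tag and made-up sample lines:

```python
import re

IMG_TAG = re.compile(r'<img.*?>')  # same pattern as in the answer above

def first_img_tag(line):
    """Return the first <img ...> tag in line, or None if there is none."""
    # search() scans the whole line; match() only tries at position 0.
    found = IMG_TAG.search(line)
    return found.group(0) if found else None

print(first_img_tag('<img src="a.png"> at the start'))  # <img src="a.png">
print(first_img_tag('text before <img src="b.png">'))   # <img src="b.png">
print(first_img_tag('no tag here'))                     # None
print(IMG_TAG.match('text before <img src="b.png">'))   # None
```

Returning None for the no-match case keeps the decision about a default value with the caller.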
Python extract pattern matches
You need to capture from the regex: search for the pattern and, if it is found, retrieve the string using group(index). Assuming valid checks are performed:
>>> p = re.compile("name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> result.group(1) # group(1) will return the 1st capture (stuff within the brackets).
# group(0) will return the entire matched text.
'my_user_name'
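The session above does not show how s was defined. A self-contained sketch of the same idea, where the value of s is an assumption chosen to reproduce the output shown:

```python
import re

# Assumed sample input; the original snippet does not show what s was.
s = "your name my_user_name is valid"

p = re.compile(r"name (.*) is valid")
result = p.search(s)

if result is not None:        # search() returns None when nothing matches
    print(result.group(0))    # name my_user_name is valid
    print(result.group(1))    # my_user_name
```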
Return string with first match for a regex, handling case where there is no match
You could embed the '' default in your regex by adding |$:
>>> re.findall(r'\d+|$', 'aa33bbb44')[0]
'33'
>>> re.findall(r'\d+|$', 'aazzzbbb')[0]
''
>>> re.findall(r'\d+|$', '')[0]
''
This also works with re.search, as pointed out by others:
>>> re.search(r'\d+|$', 'aa33bbb44').group()
'33'
>>> re.search(r'\d+|$', 'aazzzbbb').group()
''
>>> re.search(r'\d+|$', '').group()
''
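For comparison, here is a small sketch that puts the |$ trick next to an explicit None check. The function names first_number and first_number_explicit are made up for illustration:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or '' if there is none."""
    # The |$ alternative matches the empty string at the end of the text,
    # so search() always finds something and .group() is safe to call.
    return re.search(r'\d+|$', text).group()

def first_number_explicit(text):
    """Same result, with the no-match case handled explicitly."""
    found = re.search(r'\d+', text)
    return found.group() if found else ''

for sample in ('aa33bbb44', 'aazzzbbb', ''):
    print(repr(first_number(sample)), repr(first_number_explicit(sample)))
```

The explicit version is longer, but it makes it easy to return a default other than the empty string.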
Regular Expression that return matches specific strings in bracket and return its next and preceding string in brackets
You can use
\([^()]*?matchingString[^)]*\)
See the regex demo. Due to the [^()]*?, the match will never overflow across other (...) substrings.
Regex details:
\( - a ( char
[^()]*? - zero or more chars other than ( and ), as few as possible
matchingString - a hardcoded string
[^)]* - zero or more chars other than )
\) - a ) char.
See the Python demo:
import re
text = 'This is an sample string which have some information in brackets (info; matchingString, someotherString).'
regex= r"\([^()]*?matchingString[^)]*\)"
print( re.findall(regex, text) )
# => ['(info; matchingString, someotherString)']
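To see that the match stays inside one pair of parentheses, here is a short sketch on a made-up sentence with several bracketed groups, only one of which contains matchingString:

```python
import re

regex = r"\([^()]*?matchingString[^)]*\)"

# Hypothetical sample text with three bracketed groups.
text = "first (a, b) then (x; matchingString, y) and finally (z)"

print(re.findall(regex, text))
# ['(x; matchingString, y)']
```

The first group is skipped because [^()]*? cannot step over its closing parenthesis to reach matchingString.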
How to match a regex pattern but return None if additional characters are in string?
Place a ^ for the beginning of a line and a $ for the end, like
^[0-9]+-[0-9]+$
EDIT: As @Wiktor noticed, the full match of a complete line does not work on a complete string if it contains line breaks. So you would have to test for line breaks regardless of the programming language, or use re.fullmatch.
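A minimal sketch of the difference in Python, assuming the digits-hyphen-digits pattern above and a hypothetical helper name is_exact:

```python
import re

pattern = r'[0-9]+-[0-9]+'

def is_exact(text):
    """True only if the whole string is digits, a hyphen, then digits."""
    return re.fullmatch(pattern, text) is not None

print(is_exact('12-34'))      # True
print(is_exact('12-34abc'))   # False
print(is_exact('12-34\n'))    # False

# For comparison: $ also matches just before a trailing newline,
# so the anchored pattern accepts this string.
print(re.match(r'^[0-9]+-[0-9]+$', '12-34\n') is not None)   # True
```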
Extract an alphanumeric string between two special characters
You seem to need only the part of the strings between = and >. In this case, it is much easier to put a capturing group around the alphanumeric pattern and use it with re.findall, which never returns None: it returns an empty list when there is no match, or a list of the captured texts otherwise. Also, I doubt you need empty matches, so use + instead of *:
pattern=re.compile(r"=([A-Z0-9]+)>")
                      ^         ^
and then
"\n".join(pattern.findall(line))
How can I match anything up until this sequence of characters in a regular expression?
You didn't specify which flavor of regex you're using, but this will
work in any of the most popular ones that can be considered "complete".
/.+?(?=abc)/
How it works
The .+? part is the un-greedy version of .+ (one or more of
anything). When we use .+, the engine will basically match everything.
Then, if there is something else in the regex, it will go back in steps
trying to match the following part. This is the greedy behavior,
meaning as much as possible to satisfy.
When using .+?, instead of matching all at once and going back for
other conditions (if any), the engine will match the next characters
step by step until the subsequent part of the regex is matched (again, if any).
This is the un-greedy behavior, meaning match the fewest possible to
satisfy.
/.+X/  ~ "abcXabcXabcX"        /.+/  ~ "abcXabcXabcX"
          ^^^^^^^^^^^^                  ^^^^^^^^^^^^
/.+?X/ ~ "abcXabcXabcX"        /.+?/ ~ "abcXabcXabcX"
          ^^^^                          ^
Following that we have (?={contents}), a zero-width
assertion, a lookaround. This grouped construction matches its
contents, but does not count as characters matched (zero width). It
only returns whether it is a match or not (assertion).
Thus, in other terms, the regex /.+?(?=abc)/ means:
Match any characters, as few as possible, until an "abc" is found,
without counting the "abc".
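In Python, the same behavior can be sketched as follows. The sample string is made up, and the greedy variant is shown only for contrast:

```python
import re

text = "xyzabc123abc"

# Lazy: stop at the first position that is followed by "abc".
lazy = re.search(r".+?(?=abc)", text)
print(lazy.group())     # xyz
print(lazy.end())       # 3 (the "abc" itself is not consumed)

# Greedy: back off from the end until "abc" follows.
greedy = re.search(r".+(?=abc)", text)
print(greedy.group())   # xyzabc123
```

Note that . does not match a newline unless the re.DOTALL flag is passed.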
How to return full substring from partial substring match in python as a list?
You can use any one of the following:
pattern=re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
pattern=re.compile(r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*")
pattern=re.compile(r"[^\W\d_]*?(?:tion|ex|ph|ost|ast|ist)[^\W\d_]*")
The regex (see the regex demo) matches
\w*? - zero or more but as few as possible word chars
(?:tion|ex|ph|ost|ast|ist) - one of the strings
\w* - zero or more but as many as possible word chars
The [a-zA-Z] part will match only ASCII letters, and [^\W\d_] will match any Unicode letters.
Mind the use of the non-capturing group with re.findall, as otherwise re.findall would return the captured substrings rather than the whole matches.
If you need to only match letter words, and you need to match them as whole words, add word boundaries: r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b".
See the Python demo:
import re

def latin_ish_words(text):
    # Non-capturing group, so findall returns the whole words
    pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
    return pattern.findall(text)

print(latin_ish_words("This functions as expected"))
# => ['functions', 'expected']
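The notes on the non-capturing group and on word boundaries can be checked with a short sketch. The second sample string, "section2 expected", is made up to show a word that the boundary version rejects:

```python
import re

text = "This functions as expected"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"\w*?(tion|ex|ph|ost|ast|ist)\w*", text)
print(captured)   # ['tion', 'ex']

# Non-capturing group: findall returns the whole matches.
whole = re.findall(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*", text)
print(whole)      # ['functions', 'expected']

# Word boundaries plus ASCII letters: "section2" is rejected because
# there is no word boundary between "section" and the digit "2".
bounded = re.findall(
    r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b",
    "section2 expected",
)
print(bounded)    # ['expected']
```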
Checking whole string with a regex
\d+ matches any positive number of digits within your string, so it matches the first 78 and succeeds.
Use ^\d+$.
Or, even better: "78.46.92.168:8000".isdigit()
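A quick sketch comparing the options on the string from the answer. The variable name address and the all-digit string "78468000" are only for illustration:

```python
import re

address = "78.46.92.168:8000"

# Unanchored: succeeds on the leading "78" alone.
print(re.match(r"\d+", address).group())   # 78

# Anchored at both ends, or fullmatch: the whole string must be digits.
print(re.match(r"^\d+$", address))         # None
print(re.fullmatch(r"\d+", address))       # None

# str.isdigit needs no regex at all.
print(address.isdigit())                   # False
print("78468000".isdigit())                # True
```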