Match text between two strings with regular expression
Use re.search:
>>> import re
>>> s = 'Part 1. Part 2. Part 3 then more text'
>>> re.search(r'Part 1\.(.*?)Part 3', s).group(1)
' Part 2. '
>>> re.search(r'Part 1(.*?)Part 3', s).group(1)
'. Part 2. '
Or use re.findall if there is more than one occurrence.
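A minimal sketch of the re.findall() version. The input string below is a hypothetical extension of s with a second Part 1 ... Part 3 pair, so there is more than one occurrence to collect:

```python
import re

# Hypothetical input with two "Part 1 ... Part 3" spans
s = 'Part 1. Part 2. Part 3 then Part 1. Part 4. Part 3'

# findall returns the captured group for every non-overlapping match
between = re.findall(r'Part 1\.(.*?)Part 3', s)
print(between)  # [' Part 2. ', ' Part 4. ']
```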
Python/Regex: Get all strings between any two characters
Let me know if this is what you are looking for:
import re
def smallest_between_two(a, b, text):
    return min(re.findall(re.escape(a)+"(.*?)"+re.escape(b),text), key=len)
print(smallest_between_two(' ', '(', 'def test()'))
print(smallest_between_two('[', ']', '[this one][not this one]'))
print(smallest_between_two('paste ', '/', '@paste "game_01/01"'))
Output:
test
this one
"game_01
To explain what this does:
re.findall(): Return all non-overlapping matches of pattern in string, as a list of strings.
re.escape(): Escape special characters in pattern. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
(.*?): a capturing group around .*?, where
. matches any character except a newline, and
*? is a lazy quantifier: it matches between zero and unlimited times, as few times as possible, expanding as needed.
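To see the difference the lazy *? makes, here is a small comparison on the bracket example from above (\[ and \] are escaped so they match literally):

```python
import re

text = '[this one][not this one]'

# Greedy: .* runs as far as it can, up to the last ']'
greedy = re.findall(r'\[(.*)\]', text)
# Lazy: .*? stops at the first ']' after each '['
lazy = re.findall(r'\[(.*?)\]', text)

print(greedy)  # ['this one][not this one']
print(lazy)    # ['this one', 'not this one']
```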
So our regular expression matches any characters (not including newlines) between two arbitrary escaped strings, and the function then returns the shortest string from the list that re.findall() returns.
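One caveat: if re.findall() finds nothing, min() raises ValueError on the empty list. A hedged variant, assuming you would rather get None back, passes min()'s default argument:

```python
import re

def smallest_between_two(a, b, text):
    """Return the shortest substring between a and b, or None if there is none."""
    # re.escape makes a and b match literally, even if they contain metacharacters
    pattern = re.escape(a) + "(.*?)" + re.escape(b)
    # default=None avoids the ValueError that min() raises on an empty list
    return min(re.findall(pattern, text), key=len, default=None)

print(smallest_between_two('[', ']', '[this one][not this one]'))  # this one
print(smallest_between_two('{', '}', 'no braces here'))            # None
```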
Extract substring between two different characters using a python regular expression
Use the expression:
(?<=>)[^<:]+(?=:?<)
(?<=>) Positive lookbehind for >.
[^<:]+ Match one or more characters other than < or :.
(?=:?<) Positive lookahead for an optional colon : followed by <.
In Python:
import re
first_string = '<h4 id="Foobar:">Foobar:</h4>'
second_string = '<h1 id="Monty">Python<a href="https://..."></a></h1>'
print(re.findall(r'(?<=>)[^<:]+(?=:?<)',first_string)[0])
print(re.findall(r'(?<=>)[^<:]+(?=:?<)',second_string)[0])
Prints:
Foobar
Python
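Indexing with [0] raises IndexError when re.findall() returns an empty list. If only the first hit is needed, re.search() returns None instead, which is easy to test. A sketch, where first_tag_text is a hypothetical helper name:

```python
import re

PATTERN = r'(?<=>)[^<:]+(?=:?<)'

def first_tag_text(html):
    """Return the first text run between '>' and '<' (minus a trailing colon), or None."""
    match = re.search(PATTERN, html)
    return match.group(0) if match else None

print(first_tag_text('<h4 id="Foobar:">Foobar:</h4>'))  # Foobar
print(first_tag_text('<br/>'))                           # None
```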
Alternatively, you could use the expression:
(?<=>)[a-zA-Z]+(?=\W*<)
(?<=>) Positive lookbehind for >.
[a-zA-Z]+ One or more lower- and upper-case letters.
(?=\W*<) Positive lookahead for any non-word characters followed by <.
print(re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)',first_string)[0])
print(re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)',second_string)[0])
Prints:
Foobar
Python
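The two expressions are not equivalent. On a hypothetical tag whose text contains a space, the negated character class still matches, while the letters-only version finds nothing because, after the space, its look-ahead sees a letter rather than <:

```python
import re

# Hypothetical tag whose text contains a space
html = '<p>Foo bar:</p>'

negated_class = re.findall(r'(?<=>)[^<:]+(?=:?<)', html)
letters_only = re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)', html)

print(negated_class)  # ['Foo bar']
print(letters_only)   # []
```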
Python 3 How to get string between two points using regex?
Use ABC and XYZ as anchors with look-behind and look-ahead assertions:
(?<=ABC).*?(?=XYZ)
The (?<=ABC) look-behind assertion only matches at a location in the text that is preceded by ABC. Similarly, (?=XYZ) matches at a location that is followed by XYZ. Together they form two anchors that limit the lazy .*? expression, which matches as few characters as possible (any character except a newline).
You can find all such anchored pieces of text with re.findall():
for matchedtext in re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext):
    print(matchedtext)
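A runnable version of that loop, assuming a hypothetical inputtext with two ABC ... XYZ spans. The anchors themselves are not part of the results:

```python
import re

# Hypothetical input containing two ABC ... XYZ spans
inputtext = 'ABC first XYZ junk ABC second XYZ'

# The lookarounds check for ABC/XYZ but do not consume them
matches = re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext)
for matchedtext in matches:
    print(repr(matchedtext))
```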
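If ABC and XYZ are variables, call re.escape() on them (to prevent any of their content from being interpreted as regular expression syntax) and interpolate the results. Use re.search() here rather than re.match(): re.match() only tries the start of the string, where the look-behind can never succeed.
re.search(r'(?<={}).*?(?={})'.format(re.escape(abc), re.escape(xyz)), inputtext)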
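Wrapped into a small function (between is a hypothetical name), this sketch escapes both anchors and returns None when there is no match. An escaped literal has a fixed width, so it also satisfies Python's requirement that look-behind patterns be fixed-width:

```python
import re

def between(abc, xyz, inputtext):
    """Return the first text between the literal strings abc and xyz, or None."""
    # re.escape keeps characters like '(' or '[' from acting as regex syntax
    pattern = r'(?<={}).*?(?={})'.format(re.escape(abc), re.escape(xyz))
    # re.search scans forward for a position preceded by abc
    match = re.search(pattern, inputtext)
    return match.group(0) if match else None

print(repr(between('(start)', '[end]', 'x (start) payload [end] y')))  # ' payload '
print(between('(start)', '[end]', 'no anchors here'))                  # None
```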
Python Regex to find String between two strings
If you want . to match newlines, you have to use the re.S option.
Also, it would seem a better idea to check whether the regex matched before proceeding with further calls. Your call to lower() gave me an error because the regex didn't match (re.search() returned None), so calling result.group(0).lower() only when result evaluates as true is safer.
import re
toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "
# Regex to search between the two labels (re.S lets . match the newline in between)
result = re.search("(?<=Link Created)(.+?)(?=Research Done)", text, re.S)
if result:
    # Lowercase so [X] matches '[x]', and remove spaces in case they move the []/[x] around
    result = result.group(0).lower().replace(" ", "")
    if any(x in result for x in toFind):
        print("Exists")
    else:
        print("Doesn't Exist")
else:
    print("re did not match")
PS: all the re options are documented in the re module documentation. Search for re.DOTALL for the details on re.S (they're synonyms). If you want to combine options, use bitwise OR. E.g., re.S|re.I will make . match newlines and do case-insensitive matching.
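A quick illustration of combining the two flags on a hypothetical string:

```python
import re

text = 'START\nsome Text\nend'

# Without flags: 'start' does not match 'START', so there is no match at all
plain = re.search(r'start(.*?)end', text)
# re.S lets '.' cross the newlines; re.I makes 'start'/'end' case-insensitive
both = re.search(r'start(.*?)end', text, re.S | re.I)

print(plain)                # None
print(repr(both.group(1)))  # '\nsome Text\n'
```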
Python: Find a string between two strings, repeatedly
Use re.findall():
result = re.findall(r'var="(.*?)"', test)
print(result) # ['this', 'that']
If a value between the quotes can span multiple lines, use the re.DOTALL flag so that . also matches newlines:
re.findall(r'var="(.*?)"', test, re.DOTALL)
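For example, with a hypothetical test string whose second value spans a line break:

```python
import re

# Hypothetical input: the second value contains a newline
test = 'var="this" other var="that\nvalue"'

without_flag = re.findall(r'var="(.*?)"', test)
with_flag = re.findall(r'var="(.*?)"', test, re.DOTALL)

print(without_flag)  # ['this']
print(with_flag)     # ['this', 'that\nvalue']
```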
Regex Python extract digits between two strings in a single expression
You can try it like this:
import re
text = "Advance [Extra Value of $1,730,555] in packages 2,3, and 5."
# Capture everything between '$' and ']', then strip the thousands separators
match = re.findall(r'\$(.*)]',text)[0].replace(',','')
print(match)
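The (.*) above accepts anything between $ and the last ]. A slightly stricter sketch, assuming the amount only ever contains digits and commas, captures just those characters and converts the result to an int:

```python
import re

text = "Advance [Extra Value of $1,730,555] in packages 2,3, and 5."

# Capture only digits and commas right after '$', up to the closing ']'
match = re.search(r'\$([\d,]+)\]', text)
value = int(match.group(1).replace(',', '')) if match else None
print(value)  # 1730555
```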
Regular expression to return all characters between two special characters
^.*\['(.*)'\].*$
will match a line and capture what you want in a group.
You have to escape [ and ] with a backslash (\), because unescaped they start and end a character class.
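Used from Python, the expression might look like this; the input line is hypothetical:

```python
import re

# Hypothetical input line
line = "config['database'] = 42"

# Group 1 captures the text between [' and ']
match = re.match(r"^.*\['(.*)'\].*$", line)
key = match.group(1) if match else None
print(key)  # database
```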