Using Regex to Get the Value Between Two Characters (Python 3)

Match text between two strings with regular expression

Use re.search

>>> import re
>>> s = 'Part 1. Part 2. Part 3 then more text'
>>> re.search(r'Part 1\.(.*?)Part 3', s).group(1)
' Part 2. '
>>> re.search(r'Part 1(.*?)Part 3', s).group(1)
'. Part 2. '

Or use re.findall if there is more than one occurrence.
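As a minimal sketch of the re.findall variant (the sample string here is made up for illustration, not taken from the question), a lazy group returns one entry per delimited section:

```python
import re

# Hypothetical input with two "Part 1. ... Part 3" sections.
s = 'Part 1. alpha Part 3 and Part 1. beta Part 3'

# The lazy group (.*?) stops at the nearest closing delimiter,
# so each section is returned separately.
parts = re.findall(r'Part 1\.(.*?)Part 3', s)
print(parts)  # [' alpha ', ' beta ']
```

With a greedy (.*) the same call would return a single string running from the first opener to the last closer.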

Python/Regex: Get all strings between any two characters

Let me know if this is what you are looking for:

import re

def smallest_between_two(a, b, text):
    return min(re.findall(re.escape(a)+"(.*?)"+re.escape(b),text), key=len)

print(smallest_between_two(' ', '(', 'def test()'))
print(smallest_between_two('[', ']', '[this one][not this one]'))
print(smallest_between_two('paste ', '/', '@paste "game_01/01"'))

Output:

test
this one
"game_01

To add an explanation to what this does:

re.findall():

Return all non-overlapping matches of pattern in string, as a list of strings

re.escape():

Escape special characters in pattern (older Python versions escaped every character except ASCII letters and numbers). This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

(.*?)

. matches any character (except for line terminators)

*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)

So our regular expression matches any character (not including line terminators) between two arbitrary escaped strings, and then returns the shortest length string from the list that re.findall() returns.
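One thing to watch: min() raises ValueError when re.findall() finds nothing. A possible variant, assuming you would rather get None back in that case (the function name below is hypothetical), passes a default to min():

```python
import re

def smallest_between_two_or_none(a, b, text):
    """Return the shortest substring between a and b, or None if there is none."""
    candidates = re.findall(re.escape(a) + "(.*?)" + re.escape(b), text)
    # min() raises ValueError on an empty list unless a default is given.
    return min(candidates, key=len, default=None)

print(smallest_between_two_or_none('[', ']', '[this one][not this one]'))  # this one
print(smallest_between_two_or_none('[', ']', 'no brackets here'))          # None
```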

Extract substring between two different characters using a python regular expression

Use expression:

(?<=>)[^<:]+(?=:?<)
  • (?<=>) Positive lookbehind for >.
  • [^<:]+ Match anything other than < or :.
  • (?=:?<) Positive lookahead for optional colon :, and <.


In Python:

import re
first_string = '<h4 id="Foobar:">Foobar:</h4>'
second_string = '<h1 id="Monty">Python<a href="https://..."></a></h1>'

print(re.findall(r'(?<=>)[^<:]+(?=:?<)',first_string)[0])
print(re.findall(r'(?<=>)[^<:]+(?=:?<)',second_string)[0])

Prints:

Foobar
Python

Alternatively you could use expression:

(?<=>)[a-zA-Z]+(?=\W*<)
  • (?<=>) Positive lookbehind for >.
  • [a-zA-Z]+ Lower and upper case letters.
  • (?=\W*<) Positive lookahead for any non word characters followed by <.


print(re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)',first_string)[0])
print(re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)',second_string)[0])

Prints:

Foobar
Python
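The two expressions agree on the inputs above but are not interchangeable. A small sketch, using a made-up heading that contains a space and a digit, shows where they differ:

```python
import re

# Hypothetical input, not from the original question.
sample = '<h2>Python 3:</h2>'

# Excludes only '<' and ':', so spaces and digits are kept.
by_exclusion = re.findall(r'(?<=>)[^<:]+(?=:?<)', sample)
# Accepts only letters, and only non-word characters may follow before '<'.
letters_only = re.findall(r'(?<=>)[a-zA-Z]+(?=\W*<)', sample)

print(by_exclusion)  # ['Python 3']
print(letters_only)  # []
```

The letters-only version finds nothing here because the digit 3 is a word character sitting between the letters and the <.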

Python 3 How to get string between two points using regex?

Use ABC and XYZ as anchors with look-behind and look-ahead assertions:

(?<=ABC).*?(?=XYZ)

The (?<=...) look-behind assertion only matches at a location in the text that is preceded by ABC. Similarly, (?=XYZ) matches at a location that is followed by XYZ. Together they form two anchors that limit the .*? expression, which matches anything, as few characters as possible.

You can find all such anchored pieces of text with re.findall():

for matchedtext in re.findall(r'(?<=ABC).*?(?=XYZ)', inputtext):
    print(matchedtext)

If ABC and XYZ are variable, you want to use re.escape() (to prevent any of their content from being interpreted as regular expression syntax) on them and interpolate:

re.search(r'(?<={}).*?(?={})'.format(re.escape(abc), re.escape(xyz)), inputtext)
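A minimal sketch of the variable-anchor case, wrapped in a helper (the name find_between and the sample text are illustrative only). Parentheses are used as anchors because they are regex metacharacters and would break the pattern if left unescaped:

```python
import re

def find_between(start, end, text):
    """Return every lazily matched piece of text between start and end."""
    # re.escape() keeps characters such as '(' from being read as regex syntax.
    pattern = r'(?<={}).*?(?={})'.format(re.escape(start), re.escape(end))
    return re.findall(pattern, text)

print(find_between('(', ')', 'f(x) + g(y)'))  # ['x', 'y']
```

Note that re.match() would not work here: it only tries the pattern at the start of the string, where a look-behind for a non-empty anchor can never succeed, so re.search() or re.findall() is needed.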

Python Regex to find String between two strings

If you want . to match newlines, you have to use the re.S option.

Also, it would seem a better idea to check if the regex matched before proceeding with further calls. Your call to lower() gave me an error because the regex didn't match, so calling result.group(0).lower() only when result evaluates as true is safer.

import re

toFind = ['[]', '[x]']
text = "| Completed?|\n|------|:---------:|\n|Link Created | [] |\n|Research Done | [X] "

# Regex to search between parameters and make result lowercase if there are any uppercase Chars
result = (re.search("(?<=Link Created)(.+?)(?=Research Done)", text, re.S))

if result:
    # Gets rid of whitespace in case they move the []/[x] around
    result = result.group(0).lower().replace(" ", "")

    if any(x in result for x in toFind):
        print("Exists")
    else:
        print("Doesn't Exist")
else:
    print("re did not match")

PS: all the re options are documented in the re module documentation. Search for re.DOTALL for the details on re.S (they're synonyms). If you want to combine options, use bitwise OR. E.g., re.S|re.I will have . match newline and do case-insensitive matching.
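To illustrate combining options, here is a small sketch with made-up text where the closing marker differs in case and the content spans several lines:

```python
import re

# Hypothetical input: 'end' is lowercase and the body contains newlines.
text = "START\nfirst line\nsecond line\nend"
pattern = r'(?<=START)(.+?)(?=END)'

# Without flags, '.' stops at the newline and 'END' does not match 'end'.
no_flags = re.search(pattern, text)
print(no_flags)  # None

# re.S lets '.' cross newlines; re.I makes 'END' match 'end'.
both = re.search(pattern, text, re.S | re.I)
print(repr(both.group(0)))  # '\nfirst line\nsecond line\n'
```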

Python: Find a string between two strings, repeatedly

Use re.findall():

result = re.findall(r'var="(.*?)"', test)
print(result) # ['this', 'that']

If the test string contains multiple lines, use the re.DOTALL flag.

re.findall(r'var="(.*?)"', test, re.DOTALL)
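The test string is not shown in the answer, so the following sketch assumes one for illustration. It shows what re.DOTALL changes when a quoted value spans a line break:

```python
import re

# Assumed input; the third value contains a newline.
test = 'var="this" and var="that"\nvar="multi\nline"'

single_line = re.findall(r'var="(.*?)"', test)
print(single_line)  # ['this', 'that']

# With re.DOTALL, '.' also matches the newline inside the third value.
multi_line = re.findall(r'var="(.*?)"', test, re.DOTALL)
print(multi_line)  # ['this', 'that', 'multi\nline']
```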

Regex Python extract digits between two strings in a single expression

You can try like this

import re
text = "Advance [Extra Value of $1,730,555] in packages 2,3, and 5."
match = re.findall(r'\$(.*)]',text)[0].replace(',','')
print(match)
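If the goal is a number rather than a string, a slightly stricter sketch (one possible approach, not the answerer's method) matches only digits and commas after the dollar sign, checks that a match exists, and converts the result:

```python
import re

text = "Advance [Extra Value of $1,730,555] in packages 2,3, and 5."

# [\d,]+ accepts only digits and thousands separators after the '$'.
m = re.search(r'\$([\d,]+)', text)
amount = int(m.group(1).replace(',', '')) if m else None
print(amount)  # 1730555
```

Using re.search() with a check avoids the IndexError that indexing an empty re.findall() result would raise when no dollar amount is present.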

Regular expression to return all characters between two special characters

^.*\['(.*)'\].*$ will match a line and capture what you want in a group.

You have to escape the [ and ] with a backslash (\).

The documentation at the rubular.com proof link will explain how the expression is formed.
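In Python, the same expression can be tried as in this short sketch (the sample line is invented for illustration):

```python
import re

# Hypothetical input line containing a ['...'] subscript.
line = "config['user_name'] = 42"

# The brackets are escaped; group 1 holds the text between the quotes.
m = re.match(r"^.*\['(.*)'\].*$", line)
captured = m.group(1) if m else None
print(captured)  # user_name
```

Because (.*) is greedy, a line with several ['...'] pairs would need (.*?) or a narrower character class to capture just one of them.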


