Python Regex to find a string in double quotes within a string
Here's all you need to do:
import re

def doit(text):
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')
Result: 'String 1,String 2,String3'
As pointed out by Li-aung Yip:
To elaborate, .+? is the "non-greedy" version of .+ . It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+ , will give String 1" or "String 2" or "String3 ; the non-greedy version, .+? , gives String 1 , String 2 , String3 .
In addition, if you want to accept empty strings, change .+ to .* . Star * means zero or more while plus + means at least one.
Extract a string between double quotes
Provided there are no nested quotes:
re.findall(r'"([^"]*)"', inputString)
Demo:
>>> import re
>>> inputString = 'According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'
>>> re.findall(r'"([^"]*)"', inputString)
['profound aspects of personality']
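To see how the patterns discussed so far differ, here is a small sketch that runs each one on the same input. The sample text is made up for illustration and ends with an empty pair of quotes, so the difference between + and * shows up as well.

```python
import re

# Hypothetical input: two quoted strings followed by an empty quoted string
text = 'Regex should return "String 1" or "String 2" or "" '

# Greedy: runs from the first quote to the last quote it can reach
greedy = re.findall(r'"(.+)"', text)

# Lazy with +: stops at the nearest closing quote, but needs at least one character
lazy_plus = re.findall(r'"(.+?)"', text)

# Lazy with *: same, but also accepts the empty string
lazy_star = re.findall(r'"(.*?)"', text)

# Negated character class: can never cross a quote, so no laziness is needed
negated = re.findall(r'"([^"]*)"', text)

print(greedy)     # ['String 1" or "String 2" or "']
print(lazy_plus)  # ['String 1', 'String 2']
print(lazy_star)  # ['String 1', 'String 2', '']
print(negated)    # ['String 1', 'String 2', '']
```

The negated class is usually the safer choice when the quoted text cannot contain quotes, since the intent is explicit in the pattern.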
RegEx: Grabbing values between quotation marks
I've been using the following with great success:
(["'])(?:(?=(\\?))\2.)*?\1
It supports nested quotes as well.
For those who want a deeper explanation of how this works, here's an explanation from user ephemient:
(["']) match a quote;
(?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character;
*? match many times (non-greedily, so as not to eat the closing quote);
\1 match the same quote that was used for opening.
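As a rough illustration in Python, the pattern can be tried with re.finditer. The sample string below is an assumption made up for this sketch; it mixes an apostrophe inside double quotes, double quotes inside single quotes, and a backslash-escaped quote.

```python
import re

# The pattern from the answer above, in a raw triple-quoted string so both quote kinds fit
quoted = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

# Hypothetical sample text
sample = r'''a "it's fine" b 'say "no"' c "esc \" aped"'''

# finditer with group(0) yields the whole match, quotes included;
# findall would return tuples of the two capture groups instead
found = [m.group(0) for m in quoted.finditer(sample)]
for item in found:
    print(item)
# "it's fine"
# 'say "no"'
# "esc \" aped"
```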
Extract text between quotation using regex python
If you want to extract a substring from a string, you can use re.search.
Demo:
import re

str_list = ['"abc"', '"ABC. XYZ"', '"1 - 2 - 3"']
for s in str_list:  # named s to avoid shadowing the built-in str
    # re.search returns the first match anywhere in the string, or None
    match = re.search(r'"(.+?)"', s)
    if match:
        print(match.group(1))
Output:
abc
ABC. XYZ
1 - 2 - 3
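One thing worth keeping in mind is that re.search stops at the first match. A short sketch with a made-up input that holds two quoted strings shows the difference from re.findall:

```python
import re

# Hypothetical input containing two quoted substrings
text = 'first "abc" then "ABC. XYZ"'

# re.search: only the first quoted substring
first = re.search(r'"(.+?)"', text)

# re.findall: every quoted substring, as a list of group 1 values
every = re.findall(r'"(.+?)"', text)

print(first.group(1))  # abc
print(every)           # ['abc', 'ABC. XYZ']
```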
How to match double quote in python regex?
Those double-quotes in your regular expression are delimiting the string rather than part of the regular expression. If you want them to be part of the actual expression, you'll need to add more, and escape them with a backslash ( r"\"\[.+\]\"" ). Alternatively, enclose the string in single quotes instead ( r'"\[.+\]"' ).
re.match() only produces a match if the expression is found at the beginning of the string. Since, in your example, there is a double quote character at the beginning of the string, and the regular expression doesn't include a double quote, it does not produce a match. Try re.search() or re.findall() instead.
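Both points can be checked with a small sketch. The input string here is an assumption, since the original question's data is not shown; it simply starts with a double quote followed by bracketed text.

```python
import re

# Hypothetical input that begins with a double quote
text = '"[abc]" rest'

# Two equivalent ways to put literal double quotes into the pattern
escaped = r"\"\[.+\]\""
single_quoted = r'"\[.+\]"'

# Both match at the start of the string, so re.match works
m1 = re.match(escaped, text)
m2 = re.match(single_quoted, text)

# Without the quotes in the pattern, re.match fails because the string
# starts with '"', while re.search scans forward and finds the brackets
no_quote_match = re.match(r'\[.+\]', text)
no_quote_search = re.search(r'\[.+\]', text)

print(m1.group(0))               # "[abc]"
print(m2.group(0))               # "[abc]"
print(no_quote_match)            # None
print(no_quote_search.group(0))  # [abc]
```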
Use python 3 regex to match a string in double quotes
In your regex
([\"'])[^\1]*\1
a character class is meant for matching only one character, so your use of [^\1] is incorrect. Think about what would have happened if there were more than one character in the first capturing group.
You can use a negative lookahead like this:
(["'])((?!\1).)*\1
or simply use alternation:
(["'])(?:[^"'\\]+|\\.)*\1
or
(?<!\\)(["'])(?:[^"'\\]+|\\.)*\1
if you want to make sure "b\"ccc" does not match in the string bb\"b\"ccc"
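A minimal sketch of the difference, using a made-up input. In Python's re module, \1 inside a character class is read as an octal escape (the character with code 1) rather than a backreference, so [^\1] accepts almost any character and the match runs through to the last quote.

```python
import re

# Hypothetical input with two separate quoted strings
text = '"a" and "b"'

# [^\1] does not mean "anything but the opening quote", so the greedy
# class swallows the inner quotes and only the final quote closes the match
broken = re.search(r'''(["'])[^\1]*\1''', text).group(0)

# The negative lookahead checks each character against the opening quote
fixed = [m.group(0) for m in re.finditer(r'''(["'])((?!\1).)*\1''', text)]

print(broken)  # "a" and "b"
print(fixed)   # ['"a"', '"b"']
```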
find all substring wrapped in double quotes satisfying several constraints in python regular expression
Use a [^"]* negated character class after the first " to stay within the double-quoted substring and get to the last http (note: this will only work if there are no escape sequences in the string), then add it at the end, too, to get to the trailing " .
import re
pat = r'"[^"]*(http.*?\.(?:jpg|bmp))[^"]*"'
reg = re.compile(pat)
aa = '"http:afd/aa.bmp" :tt: "kkkk" ++, "http--test--http:kk/bb.jpg"'
print(reg.findall(aa))
# => ['http:afd/aa.bmp', 'http:kk/bb.jpg']
Pattern details:
" - a literal double quote
[^"]* - 0+ chars other than a double quote, as many as possible, since * is a greedy quantifier
(http.*?\.(?:jpg|bmp)) - Group 1 (extracted with re.findall) that matches:
    http - a literal substring http
    .*? - any 0+ chars, as few as possible (as *? is a lazy quantifier)
    \. - a literal dot
    (?:jpg|bmp) - a non-capturing group (so that the text it matches is not output separately by re.findall) matching either the jpg or bmp substring
[^"]* - 0+ chars other than a double quote, as many as possible
" - a literal double quote
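To illustrate what the leading [^"]* contributes, here is a sketch that runs the pattern with and without it on the same sample string. The second pattern is a variation written only for this comparison, not something from the answer.

```python
import re

aa = '"http:afd/aa.bmp" :tt: "kkkk" ++, "http--test--http:kk/bb.jpg"'

# With the greedy [^"]* prefix, matching backtracks to the last http inside the quotes
with_prefix = re.findall(r'"[^"]*(http.*?\.(?:jpg|bmp))[^"]*"', aa)

# Comparison variant (an assumption for illustration): no prefix, so http must
# follow the opening quote directly and the first http is captured instead
without_prefix = re.findall(r'"(http.*?\.(?:jpg|bmp))[^"]*"', aa)

print(with_prefix)     # ['http:afd/aa.bmp', 'http:kk/bb.jpg']
print(without_prefix)  # ['http:afd/aa.bmp', 'http--test--http:kk/bb.jpg']
```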
Double quotes on a regex pattern string causing failure of regular expression search
Although it is possible to fix the "rf'abc'" strings using eval(), this option has serious security issues and should not be used.
A better solution is to fix these strings at their source. The function pattern_gen() is returning these wrapped strings, and can be modified to return the strings directly:
def pattern_gen(x, y, z):
    return rf'^(?:{y})*(?:{z})(?:{y})*$'
rx.rxpattern = pd.DataFrame(rx.apply(lambda x: pattern_gen(x['kw'], x['mand_kw'], x['kw']), axis=1))
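The effect can be sketched without pandas. The keyword values and the test string below are hypothetical, and the wrapped string is only an assumption about what the original function produced, based on the "rf'abc'" form mentioned above.

```python
import re

def pattern_gen(x, y, z):
    # x is unused here, mirroring the signature shown above
    return rf'^(?:{y})*(?:{z})(?:{y})*$'

# Hypothetical keywords: optional 'a' around a mandatory 'b'
good = pattern_gen('b', 'a', 'b')

# Assumed shape of the broken value: the pattern wrapped in a literal rf'...'
wrapped = "rf'" + good + "'"

ok = re.search(good, 'aabaa')
bad = re.search(wrapped, 'aabaa')

print(good)          # ^(?:a)*(?:b)(?:a)*$
print(ok is not None)  # True
print(bad)           # None, because the literal rf' prefix is treated as part of the pattern
```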
Regex match all words except those between quotes
You can match strings between double quotes and then match and capture words optionally followed with dot separated words:
list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I)))
Details:
"[^"]*" - a " char, zero or more chars other than " and then a " char
| - or
([a-z_]\w*(?:\.[a-z_]\w*)*) - Group 1: a letter or underscore followed with zero or more word chars and then zero or more sequences of a . and then a letter or underscore followed with zero or more word chars.
import re
text = 'results[0].items[0].packages[0].settings["compiler.version"] '
print(list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I))))
# => ['results', 'items', 'packages', 'settings']
The re.ASCII option is used to make \w match [a-zA-Z0-9_] without accounting for Unicode chars.
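A quick sketch of what re.ASCII changes, using a made-up string with accented letters (written with escapes so the exact code points are unambiguous):

```python
import re

# Hypothetical text: "naive cafe" with U+00EF and U+00E9
text = 'na\u00efve caf\u00e9'

# Default for str patterns: \w also matches Unicode letters
unicode_words = re.findall(r'\w+', text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split the words
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```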