Python Regex to Find a String in Double Quotes Within a String

Python Regex to find a string in double quotes within a string

Here's all you need to do:

import re

def doit(text):
    # Capture the text between each pair of double quotes (non-greedy).
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')

result:

'String 1,String 2,String3'

As pointed out by Li-aung Yip:

To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most. The greedy version, .+, gives a single match, String 1" or "String 2" or "String3; the non-greedy version, .+?, gives String 1, String 2, and String3.

In addition, if you want to accept empty strings, change .+ to .* (so .+? becomes .*?). The star * means zero or more, while the plus + means one or more.
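As a quick check of the greedy, non-greedy, and empty-string cases, here is a minimal sketch. The second sample string (with_empty) is an assumption added for illustration; it is not from the original question.

```python
import re

text = 'Regex should return "String 1" or "String 2" or "String3" '

# Greedy: .+ runs to the last double quote, so a single long match comes back.
greedy = re.findall(r'"(.+)"', text)
print(greedy)   # ['String 1" or "String 2" or "String3']

# Non-greedy: .+? stops at the first closing quote it can.
lazy = re.findall(r'"(.+?)"', text)
print(lazy)     # ['String 1', 'String 2', 'String3']

# Hypothetical input containing an empty quoted string.
with_empty = 'an empty "" pair'
plus_result = re.findall(r'"(.+?)"', with_empty)
star_result = re.findall(r'"(.*?)"', with_empty)
print(plus_result)   # []
print(star_result)   # ['']
```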

Extract a string between double quotes

Provided there are no nested quotes:

re.findall(r'"([^"]*)"', inputString)

Demo:

>>> import re
>>> inputString = 'According to some, dreams express "profound aspects of personality" (Foulkes 184), though others disagree.'
>>> re.findall(r'"([^"]*)"', inputString)
['profound aspects of personality']
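One practical difference from the "(.+?)" approach shown earlier may be worth a small sketch: a negated character class also crosses line breaks, whereas . does not match a newline unless re.DOTALL is set. The multi-line sample string is made up for illustration.

```python
import re

multiline = 'He wrote "first line\nsecond line" and left.'

# [^"] matches any character except a double quote, newlines included.
negated = re.findall(r'"([^"]*)"', multiline)
print(negated)       # ['first line\nsecond line']

# By default . does not match a newline, so nothing is found here.
dot_default = re.findall(r'"(.+?)"', multiline)
print(dot_default)   # []

# re.DOTALL lets . match newlines as well.
dot_all = re.findall(r'"(.+?)"', multiline, re.DOTALL)
print(dot_all)       # ['first line\nsecond line']
```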

RegEx: Grabbing values between quotation marks

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

It also handles backslash-escaped quotes and quotes of the other kind inside the string.

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

(["']) match a quote; (?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
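To try this pattern from Python, a minimal sketch follows. Because the pattern contains capturing groups, re.findall would return the group contents rather than the whole match, so the sketch uses re.finditer and group(0). The sample text and the name quoted are assumptions for illustration.

```python
import re

quoted = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

# Hypothetical input: one string with escaped quotes, one with the other quote type inside.
text = r'''x = "one \"two\" three" and y = 'it "works"' '''

matches = [m.group(0) for m in quoted.finditer(text)]
for m in matches:
    print(m)
# "one \"two\" three"
# 'it "works"'

# Drop the surrounding quote characters if only the contents are wanted.
contents = [m[1:-1] for m in matches]
```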

Extract text between quotation using regex python

If you want to extract some substring out of a string, you can go for re.search.

Demo:

import re

str_list = ['"abc"', '"ABC. XYZ"', '"1 - 2 - 3"']

for s in str_list:
    # Avoid naming the loop variable str, which would shadow the built-in.
    search_str = re.search('"(.+?)"', s)
    if search_str:
        print(search_str.group(1))

Output:

abc
ABC. XYZ
1 - 2 - 3

How to match double quote in python regex?

  1. Those double-quotes in your regular expression are delimiting the string rather than part of the regular expression. If you want them to be part of the actual expression, you'll need to add more, and escape them with a backslash (r"\"\[.+\]\""). Alternatively, enclose the string in single quotes instead (r'"\[.+\]"').
  2. re.match() only produces a match if the expression is found at the beginning of the string. Since, in your example, there is a double quote character at the beginning of the string, and the regular expression doesn't include a double quote, it does not produce a match. Try re.search() or re.findall() instead.
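Both points can be checked with a short sketch. The sample string below is an assumption standing in for the asker's input, which is not shown here.

```python
import re

text = '"[some text]"'   # hypothetical input that starts with a double quote

# Without the quotes in the pattern, re.match fails: the string starts with ".
no_quotes_match = re.match(r'\[.+\]', text)
print(no_quotes_match)                       # None
# re.search scans the whole string, so it finds the bracketed part.
found = re.search(r'\[.+\]', text).group(0)
print(found)                                 # [some text]

# Two equivalent ways to put the quotes into the pattern itself.
escaped = r"\"\[.+\]\""
single = r'"\[.+\]"'
print(re.match(escaped, text).group(0))      # "[some text]"
print(re.match(single, text).group(0))       # "[some text]"
```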

Use python 3 regex to match a string in double quotes

In your regex

([\"'])[^\1]*\1

A character class matches exactly one character, so your use of [^\1] is incorrect. Think about what would happen if the first capturing group contained more than one character.

You can use negative lookahead like this

(["'])((?!\1).)*\1

or simply with alternation

(["'])(?:[^"'\\]+|\\.)*\1

or

(?<!\\)(["'])(?:[^"'\\]+|\\.)*\1

if you want to make sure that "b\"ccc" does not match in the string bb\"b\"ccc"
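A small comparison of the three suggestions, as a sketch. The sample strings are made up for illustration, and reading whole matches via search and group(0) is a choice made for this sketch, not part of the original answer.

```python
import re

lookahead = re.compile(r'''(["'])((?!\1).)*\1''')
alternation = re.compile(r'''(["'])(?:[^"'\\]+|\\.)*\1''')
guarded = re.compile(r'''(?<!\\)(["'])(?:[^"'\\]+|\\.)*\1''')

escaped = r'say "a\"b" now'
# The lookahead version stops at the first quote character, escaped or not.
print(lookahead.search(escaped).group(0))     # "a\"
# The alternation version steps over backslash-escaped characters.
print(alternation.search(escaped).group(0))   # "a\"b"

tricky = r'bb\"b\"ccc"'
# Without the lookbehind, the match starts at an escaped quote.
print(alternation.search(tricky).group(0))    # "b\"ccc"
# With it, an opening quote preceded by a backslash is rejected.
print(guarded.search(tricky))                 # None
```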

find all substrings wrapped in double quotes satisfying several constraints in python regular expression

Use a [^"]* negated character class after the first " to stay within the double-quoted substring and to get to the last http (note: this only works if there are no escape sequences in the string). Then add it at the end, too, to get to the trailing ".

import re
pat = r'"[^"]*(http.*?\.(?:jpg|bmp))[^"]*"'
reg = re.compile(pat)
aa = '"http:afd/aa.bmp" :tt: "kkkk" ++, "http--test--http:kk/bb.jpg"'
print(reg.findall(aa))
# => ['http:afd/aa.bmp', 'http:kk/bb.jpg']

Pattern details:

  • " - a literal double quote
  • [^"]* - 0+ chars other than a double quote, as many as possible, since * is a greedy quantifier
  • (http.*?\.(?:jpg|bmp)) - Group 1 (extracted with re.findall) that matches:

    • http - a literal substring http
    • .*? - any 0+ chars, as few as possible (as *? is a lazy quantifier)
    • \. - a literal dot
    • (?:jpg|bmp) - a non-capturing group (so that the text it matches could not be output with re.findall) matching either jpg or bmp substring
  • [^"]* - 0+ chars other than a double quote, as many as possible
  • " - a literal double quote

Double quotes on a regex pattern string causing failure of regular expression search

Although it is possible to fix the "rf'abc'" strings using eval(), this option has some serious security issues and should not be used.

A better solution is to fix these strings at their source. The function pattern_gen() is returning these wrapped strings, and can be modified to return the strings directly:

def pattern_gen(x, y, z):
    return rf'^(?:{y})*(?:{z})(?:{y})*$'

rx.rxpattern = pd.DataFrame(rx.apply(lambda x: pattern_gen(x['kw'], x['mand_kw'], x['kw']), axis=1))
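To see why the wrapped strings fail as patterns, here is a minimal sketch without pandas. The keyword values 'a' and 'b' and the test string are hypothetical, and the wrapped variable only imitates a string whose text looks like an rf-literal.

```python
import re

def pattern_gen(x, y, z):
    # Same template as above; x is unused by it.
    return rf'^(?:{y})*(?:{z})(?:{y})*$'

direct = pattern_gen('kw', 'a', 'b')
print(direct)                            # ^(?:a)*(?:b)(?:a)*$

# A string whose text includes the rf'...' wrapper is a different pattern:
# the regex engine treats r, f and the quotes as literal characters.
wrapped = "rf'" + direct + "'"

print(bool(re.search(direct, 'aabaa')))  # True
print(re.search(wrapped, 'aabaa'))       # None
```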

Regex match all words except those between quotes

You can match strings between double quotes and then match and capture words optionally followed with dot separated words:

list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I)))

Details:

  • "[^"]*" - a " char, zero or more chars other than " and then a " char
  • | - or
  • ([a-z_]\w*(?:\.[a-z_]\w*)*) - Group 1: a letter or underscore followed with zero or more word chars and then zero or more sequences of a . and then a letter or underscore followed with zero or more word chars.

See the Python demo:

import re
text = 'results[0].items[0].packages[0].settings["compiler.version"] '
print(list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I))))
# => ['results', 'items', 'packages', 'settings']

The re.ASCII option is used to make \w match [a-zA-Z0-9_] without accounting for Unicode chars.
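The effect of re.ASCII on \w is easy to check in isolation. The sample phrase is an assumption for illustration.

```python
import re

phrase = 'caf\u00e9 au lait'   # 'café au lait'

# Default (Unicode) matching: \w includes accented letters.
unicode_words = re.findall(r'\w+', phrase)
print(unicode_words)   # ['café', 'au', 'lait']

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the match stops before the é.
ascii_words = re.findall(r'\w+', phrase, re.ASCII)
print(ascii_words)     # ['caf', 'au', 'lait']
```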


