Matching Text Between a Pair of Single Quotes

regex match text in either single or double quote

This one seems to work:

(?:'|").*(?:'|")

or

((?:'|").*(?:'|"))

if you need a group.

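It works because * is a greedy quantifier: .* consumes as much as possible, so the match runs from the first quote on the line to the last one, and you don't have to know which kind of quote is at the end. Two side effects follow. The opening and closing quotes need not be of the same kind, and several quoted strings on one line come back as a single match.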
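To see the greedy behaviour in practice, the pattern can be run on a line with more than one quoted string. This is a small sketch using Python's re module; the sample strings are made up for illustration.

```python
import re

# The pattern from the answer above: any quote, anything, any quote.
pattern = re.compile(r"""(?:'|").*(?:'|")""")

# One quoted string on the line: the match is that string.
single = pattern.search('name = "value"').group(0)

# Two quoted strings on one line: the greedy .* runs to the LAST quote,
# so both strings and the text between them come back as one match.
line = "a = 'one', b = \"two\""
double = pattern.search(line).group(0)

print(single)
print(double)
```

If each quoted string is needed separately, a lazy quantifier or a negated character class is the usual alternative.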

RegEx: Grabbing values between quotation marks

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

It also handles backslash-escaped quotes, and quotes of the other kind, inside the quoted string.

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

(["']) matches a quote; (?:(?=(\\?))\2.) gobbles a backslash if one is present and, whether or not that happens, matches one character; *? repeats this as few times as possible (non-greedily, so as not to eat the closing quote); \1 matches the same quote that was used for opening.
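As a rough check of that explanation, here is the same pattern in Python's re module. The sample text is invented; it contains an escaped double quote inside a double-quoted string and double quotes inside a single-quoted one.

```python
import re

# Group 1 holds the opening quote; group 2 holds the optional backslash.
pattern = re.compile(r"""(["'])(?:(?=(\\?))\2.)*?\1""")

# Raw string, so \" below is a literal backslash followed by a quote.
text = r"""a "b \" c" d 'e "f" g'"""

# finditer is used because findall would return the group tuples
# rather than the whole quoted string.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```

Each match includes its surrounding quotes; strip the first and last character if only the contents are wanted.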

match strings between outer single quotes

Your regex needs a slight modification: capturing a group multiple times doesn't really work, because a repeated capturing group keeps only its last iteration. What you really want is a single group containing zero or more copies of your \\.|[^\'] expression. You can do this with a non-capturing group, which is written by adding ?: just inside the opening parenthesis of the group. The full regex is then:

\'((?:\\.|[^\'])*)\'

You can try it out on regex101.
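For a quick local check, the same idea can be sketched with Python's re module. The input string is a made-up example, and the quote escapes from the original pattern are dropped because Python does not need them.

```python
import re

# Outer capturing group around a repeated non-capturing group:
# either a backslash plus any character, or any non-quote character.
pattern = re.compile(r"'((?:\\.|[^'])*)'")

# Raw string, so \' below is a literal backslash followed by a quote.
text = r"print 'it\'s here' and 'two'"

# With exactly one capturing group, findall returns that group's contents.
contents = pattern.findall(text)
print(contents)
```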

R regex get the text between single quotes

This should get what you want. The only assumption is that all of the strings you want between single quotes contain a colon (otherwise, how should we distinguish '01: ANTIG_CLIENTE <= 4' from ' when ANTIG_CLIENTE <= 8 then ', both of which are between single quotes?):

> regmatches(la,gregexpr("'[^']*:[^']*'",la))
[[1]]
[1] "'01: ANTIG_CLIENTE <= 4'" "'02: ANTIG_CLIENTE <= 8'" "'99: Error'"

Basically, we're trying to return all expressions (hence gregexpr instead of regexpr) of the form single quote, something besides single quote, colon, something besides single quote, single quote.

If you want to eliminate the single quotes in what is returned, you're going to need look-ahead and look-behind, which requires telling R to interpret your regex as perl:

> regmatches(la,gregexpr("(?<=')[^']*:[^']*(?=')",la,perl=T))
[[1]]
[1] "01: ANTIG_CLIENTE <= 4" "02: ANTIG_CLIENTE <= 8" "99: Error"
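For comparison, both patterns behave the same way in Python's re module, which supports look-ahead and fixed-width look-behind without any extra flag. The input string la below is a hypothetical reconstruction for illustration only; the original data is not shown in the answer.

```python
import re

# Hypothetical input, reconstructed only for illustration.
la = ("case when ANTIG_CLIENTE <= 4 then '01: ANTIG_CLIENTE <= 4' "
      "when ANTIG_CLIENTE <= 8 then '02: ANTIG_CLIENTE <= 8' "
      "else '99: Error' end")

# Quote, non-quotes, colon, non-quotes, quote.
with_quotes = re.findall(r"'[^']*:[^']*'", la)

# Look-behind and look-ahead require the quotes without returning them.
without_quotes = re.findall(r"(?<=')[^']*:[^']*(?=')", la)

print(with_quotes)
print(without_quotes)
```

The text between two quoted strings is skipped in both cases because it contains no colon, which is the assumption the answer relies on.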

Regex to match single quotes being quoted by double-quotes

Well, here is a regex that works on all your samples, but it's a bit longer and not very readable. I hope I got all the escapes correct for the Java pattern.

(?:(?:^|\\G(?<!^)[^'\"]*\")[^\"]*+(?:"[^\"']*"[^\"]*)*+"|\\G(?<!^))[^'\"]*+(')

This makes use of the \G anchor, which matches at the end of the previous match, and of possessive quantifiers to avoid unnecessary backtracking.

Let's start at the end: [^'\"]*+(') matches any run of characters that are neither single nor double quotes, followed by a single quote, which is captured in a group.

\\G(?<!^) matches at the end of the previous match. The (?<!^) ensures we are not at the start of the string, since that is where \G matches on the first attempt, before anything has been matched. This branch therefore just checks whether there is another single quote inside the double-quoted string we were in at the end of the previous match.

(?:^|\\G(?<!^)[^'\"]*\")[^\"]*+(?:"[^\"']*"[^\"]*)*+" is used to jump over all sequences that are either outside double quotes or don't contain a single quote. ^|\\G(?<!^)[^'\"]*\" matches either the start of the string (first match) or everything up to the closing double quote of our previous match, if there is no other single quote inside it. [^\"]*+ then matches anything that's not a double quote. (?:"[^\"']*"[^\"]*)*+" then matches any double-quoted strings that don't contain single quotes, along with the sequences outside double quotes, until we reach the double quote that opens the string in which we look for the single quote.

But I guess a demo shows it way better than I can explain, so here you are: https://regex101.com/r/tW5xH4/1
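Python's re module has no \G anchor, so the pattern above cannot be ported directly. As a rough point of comparison, here is a different, two-pass technique (not the regex from this answer): first find each double-quoted span, then look for single quotes inside it. The function name and sample string are made up for illustration, and the sketch assumes double quotes are balanced and never escaped.

```python
import re

def single_quotes_in_double_quotes(text):
    """Return the index of every ' that sits inside a "..." span."""
    positions = []
    # Pass 1: find each double-quoted span (no escape handling).
    for span in re.finditer(r'"[^"]*"', text):
        # Pass 2: find the single quotes within that span.
        for quote in re.finditer("'", span.group(0)):
            positions.append(span.start() + quote.start())
    return positions

# Single quotes appear both outside and inside double quotes here.
sample = 'it\'s "a \'b\' c" and \'d\' "e"'
positions = single_quotes_in_double_quotes(sample)
print(positions)
```

Only the two single quotes around b are reported; the apostrophe in it's and the quotes around d lie outside any double-quoted span.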

C# Regex: matching anything between single quotes (except single quotes)

Try:

= '([^']*)'

Meaning you want anything/everything from = ' that isn't a single quote up to a single quote.

Python example:

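import re

text = "attribute = 'some value'"
# Capture everything after "= '" up to the next single quote.
match = re.search(r"= '([^']*)'", text)
if match:  # re.search returns None when nothing matches
    print(match.group(1))  # prints: some value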

The [^'] construct is called a negated character class. You can read more about it here: https://www.regular-expressions.info/charclass.html


