Regex to Match All Instances Not Inside Quotes

Actually, you can match all instances of a regex not inside quotes for any string in which every opening quote is closed again. Say, as in your example above, you want to match \+.

The key observation here is that a word is outside quotes if an even number of quotes follows it. This can be modeled as a look-ahead assertion:

\+(?=([^"]*"[^"]*")*[^"]*$)
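As a quick sanity check, here is a minimal Python sketch of that look-ahead. The sample string and the variable names are made up for illustration, and the inner group is written as non-capturing, which does not change what gets matched.

```python
import re

# A "+" counts only if an even number of double quotes follows it.
OUTSIDE_QUOTES = re.compile(r'\+(?=(?:[^"]*"[^"]*")*[^"]*$)')

sample = 'a + b "c + d" e + f'

# Positions of the "+" signs that sit outside the quoted part.
positions = [m.start() for m in OUTSIDE_QUOTES.finditer(sample)]
print(positions)  # [2, 16]

# The same pattern can drive a replacement.
replaced = OUTSIDE_QUOTES.sub('PLUS', sample)
print(replaced)  # a PLUS b "c + d" e PLUS f
```

Because [^"] also matches newlines, the $ here effectively means the end of the whole string, so the quotes are counted across lines.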

Now, you'd like to not count escaped quotes. This gets a little more complicated. Instead of [^"]*, which advances to the next quote, you need to consider backslashes as well and use [^"\\]*. Once you arrive at either a backslash or a quote, you skip the next character if it was a backslash, or else advance to the next unescaped quote. That looks like (\\.|"([^"\\]*\\.)*[^"\\]*"). Combined, you arrive at

\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)

I admit it is a little cryptic. =)
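To see what the extra machinery buys you, here is a small Python sketch that runs both patterns over a string containing an escaped quote. The sample string and the names SIMPLE and ESCAPE_AWARE are only illustrative; the groups are written as non-capturing, which does not affect the result.

```python
import re

# Plain version: counts every double quote, escaped or not.
SIMPLE = re.compile(r'\+(?=(?:[^"]*"[^"]*")*[^"]*$)')

# Escape-aware version: \" inside a string does not end the string.
ESCAPE_AWARE = re.compile(
    r'\+(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)'
)

# The text is:  x + "a \" + b" + y
sample = r'x + "a \" + b" + y'

simple_hits = [m.start() for m in SIMPLE.finditer(sample)]
escape_hits = [m.start() for m in ESCAPE_AWARE.finditer(sample)]

print(simple_hits)  # [15]     the escaped quote throws off the count
print(escape_hits)  # [2, 15]  both "+" signs outside the string are found
```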

Regular expression: match word not between quotes

The regex solution below will work in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.

A usual regex trick for matching in context is to match and capture what you need to keep, and to match without capturing what you need to replace.

Here is a sample Python demo:

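import re

# Group 1 captures a single-quoted string literal (backslash escapes allowed);
# the second alternative matches the target word anywhere else.
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""
toReplace = "foe"
# Put string literals back unchanged; replace the word everywhere else.
# re.escape guards against regex metacharacters in the word.
res = re.sub(rx.format(re.escape(toReplace)), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)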

The regex will look like

('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b

The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; when it matches, the literal is simply put back into the result. \bfoe\b matches the whole word foe in any other context, and that match is replaced with another word.

NOTE: To also match double quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")".
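Here is a minimal sketch of that variant wrapped in a helper. The function name replace_outside_strings, the SINGLE and DOUBLE constants, and the sample input are hypothetical, and re.escape is an addition so that the word can safely contain regex metacharacters.

```python
import re

# Sub-patterns for single- and double-quoted literals with backslash escapes.
SINGLE = r"'[^'\\]*(?:\\.[^'\\]*)*'"
DOUBLE = r'"[^"\\]*(?:\\.[^"\\]*)*"'

def replace_outside_strings(source, word, replacement):
    """Replace whole-word `word` unless it sits inside a quoted literal."""
    pattern = re.compile(
        "(" + SINGLE + "|" + DOUBLE + r")|\b" + re.escape(word) + r"\b"
    )
    # Group 1 is set only when a string literal was matched: keep it as is.
    return pattern.sub(
        lambda m: m.group(1) if m.group(1) else replacement,
        source,
    )

code = 'foe = "foe"; bar = \'my foe\' + foe'
result = replace_outside_strings(code, "foe", "NEWWORD")
print(result)  # NEWWORD = "foe"; bar = 'my foe' + NEWWORD
```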

Regex find comma not inside quotes

Stand back and be amazed!


Here is the regex you seek:

(?!\B"[^"]*),(?![^"]*"\B)


Here is a demonstration:

regex101 demo


  • It does not match the second line because the " you inserted does not have a closing quotation mark.
  • It will not match values like so: ,r"a string",10 because the letter on the edge of the " will create a word boundary, rather than a non-word boundary.
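For anyone who wants to try it outside regex101, here is a small Python sketch; the sample line is made up. One thing that is easy to check: the leading (?!\B"[^"]*) is tested at the position of the comma itself, where the next character is a comma and not a quote, so it never rejects anything. The trailing look-ahead does the actual work, and the sketch compares the two forms on the sample.

```python
import re

# The pattern exactly as given above.
COMMA = re.compile(r'(?!\B"[^"]*),(?![^"]*"\B)')

# The same pattern without the leading look-ahead, for comparison.
COMMA_TRIMMED = re.compile(r',(?![^"]*"\B)')

line = '1,"a, b",2'

hits = [m.start() for m in COMMA.finditer(line)]
trimmed_hits = [m.start() for m in COMMA_TRIMMED.finditer(line)]
print(hits)          # [1, 8]  the comma inside the quotes is skipped
print(trimmed_hits)  # [1, 8]

# Splitting on the matched commas keeps the quoted field in one piece.
fields = COMMA.split(line)
print(fields)  # ['1', '"a, b"', '2']
```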

Alternative version

(".*?,.*?"|.*?(?:,|$))

This matches both the content and the commas, and it copes with values that are full of punctuation marks.
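A rough Python sketch of what this alternative returns, using a made-up sample line. It is only meant to show the shape of the output, not a complete CSV parser.

```python
import re

FIELD_OR_SEPARATOR = re.compile(r'(".*?,.*?"|.*?(?:,|$))')

line = '1,"a, b",2'

# findall can also report an empty match at the very end of the line,
# so empty strings are filtered out here.
tokens = [t for t in FIELD_OR_SEPARATOR.findall(line) if t]
print(tokens)  # ['1,', '"a, b"', ',', '2']
```

On this sample the unquoted field comes back with its trailing comma attached, while the quoted field and the comma after it come back as separate tokens, so some post-processing is still needed.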

regex101 demo

RegEx: Grabbing values between quotation marks

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

It supports nested quotes as well.

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

(["']) match a quote; ((?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
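Here is a minimal Python sketch of the same pattern. The helper name quoted_strings and the sample inputs are hypothetical. It uses finditer with group(0), because findall would return the capture groups instead of the whole match.

```python
import re

QUOTED = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

def quoted_strings(text):
    """Return every quoted substring, quotes included."""
    return [m.group(0) for m in QUOTED.finditer(text)]

# The text is:  x = "one \" two" + 'three'
sample = r'x = "one \" two"' + " + 'three'"
found = quoted_strings(sample)
print(found)  # ['"one \\" two"', "'three'"]

# A quote of the other kind inside the string does not end the match.
print(quoted_strings('He said "it\'s fine".'))  # ['"it\'s fine"']
```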

Regex: split string by character except if inside quotes or double quotes

Lookahead and lookbehind don't consume characters, so you can use several of them together. You can use

\=+(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)(?=(?:(?:[^`]*`){2})*[^`]*$)
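A short Python sketch of splitting with this pattern; the sample string and the name SPLIT_ON_EQUALS are made up for illustration.

```python
import re

# One look-ahead per quote character: ", ' and `
SPLIT_ON_EQUALS = re.compile(
    r'''\=+(?=(?:(?:[^"]*"){2})*[^"]*$)'''
    r'''(?=(?:(?:[^']*'){2})*[^']*$)'''
    r'''(?=(?:(?:[^`]*`){2})*[^`]*$)'''
)

sample = '''a=b "c=d" e='f=g'=h'''
parts = SPLIT_ON_EQUALS.split(sample)
print(parts)  # ['a', 'b "c=d" e', "'f=g'", 'h']
```

Keep in mind that each look-ahead counts one kind of quote on its own, so an apostrophe inside a double-quoted value (as in a="it's") makes the single-quote count odd and prevents the split at that =.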

Regex Demo

Regex match every string inside double quotes and include escaped quotation marks

Another option is a more efficient regex without the | operator:

const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))
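To make the structure of that pattern easier to follow, here is the same regex spelled out piece by piece in Python using re.VERBOSE. This is a sketch for reading purposes: the name STRING_LITERAL is made up, and the input is the same text as in the snippet above.

```python
import re

STRING_LITERAL = re.compile(r'''
    "                 # opening quote
    [^"\\]*           # run of ordinary characters
    (?:
        \\[\s\S]      # a backslash plus whatever follows it (even a newline)
        [^"\\]*       # another run of ordinary characters
    )*
    "                 # closing quote
''', re.VERBOSE)

text = r'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'

matches = STRING_LITERAL.findall(text)
for match in matches:
    print(match)
```

Each escaped character is consumed together with its backslash, so the pattern never has to choose between two alternatives at the same position.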

Regex - match all (quotes) except in a ...

Before anyone decides to implement this in production, look at this post. HTML and regex don't mix well, so please do not use this answer unless it's a quick hack that you're trying to do.

To replace all instances of " except for those inside the <a> tag, you can use the following. Of course, this assumes that the character > is invalid within the tag (<a param='>' href=""> breaks this for example).

Also, this depends on your regex engine. This works in PCRE, for example (among others), but you didn't specify a language, so I'm assuming anything goes.

See regex in use here

<a[^>]*>(*SKIP)(*FAIL)|"

It works as follows:

  • Match either of the following options
    • <a[^>]*>(*SKIP)(*FAIL) match the following
      • <a match this literally
      • [^>]* match any character except > any number of times
      • > match this character literally
      • (*SKIP)(*FAIL) magic - see this post for more info. Basically allows you to consume the characters, but then exclude them from the match.
    • " match this literally

We're effectively matching all " but skipping all the <a ... > tags in our matching pattern.
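Python's built-in re module does not support (*SKIP)(*FAIL), so the sketch below swaps in a different technique: match the <a ...> tag without capturing it, capture the quote in a group, and only replace when the group is set. The function name, the sample HTML and the &quot; replacement are assumptions for illustration, and the same caveat about > inside attributes applies.

```python
import re

# First alternative swallows a whole <a ...> tag; second captures a quote.
TAG_OR_QUOTE = re.compile(r'<a[^>]*>|(")')

def replace_quotes_outside_a_tags(html, replacement):
    """Replace every double quote except those inside an <a ...> tag."""
    return TAG_OR_QUOTE.sub(
        # Group 1 is set only for a quote outside a tag; otherwise the
        # matched tag is returned unchanged.
        lambda m: replacement if m.group(1) else m.group(0),
        html,
    )

html = 'say "hi" <a href="x.html">link</a>'
result = replace_quotes_outside_a_tags(html, '&quot;')
print(result)  # say &quot;hi&quot; <a href="x.html">link</a>
```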


