Regex to match all instances not inside quotes
Actually, you can match all instances of a regex not inside quotes for any string in which each opening quote is closed again. Say, as in your example above, you want to match \+.
The key observation here is that a word is outside quotes if an even number of quotes follows it. This can be modeled as a look-ahead assertion:
\+(?=([^"]*"[^"]*")*[^"]*$)
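A minimal Python sketch of this look-ahead, assuming the input has balanced double quotes and no escaped quotes. The sample string and the variable names are made up for illustration, and the inner group is written as non-capturing, which does not change what matches.

```python
import re

# The look-ahead succeeds only when an even number of double quotes
# remains between the candidate "+" and the end of the string.
PLUS_OUTSIDE_QUOTES = re.compile(r'\+(?=(?:[^"]*"[^"]*")*[^"]*$)')

sample = 'a + "b + c" + d'  # hypothetical input

# Offsets of every "+" that sits outside the quoted part.
positions = [m.start() for m in PLUS_OUTSIDE_QUOTES.finditer(sample)]
print(positions)  # [2, 12]

# Replace only the unquoted "+" signs.
replaced = PLUS_OUTSIDE_QUOTES.sub("PLUS", sample)
print(replaced)  # a PLUS "b + c" PLUS d
```

Because the look-ahead rescans the rest of the string for every candidate, this is probably best kept to short inputs.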
Now, you'd like to not count escaped quotes. This gets a little more complicated. Instead of [^"]*, which advances to the next quote, you need to consider backslashes as well and use [^"\\]*. After you arrive at either a backslash or a quote, you need to skip the next character if you hit a backslash, or else advance to the next unescaped quote. That looks like (\\.|"([^"\\]*\\.)*[^"\\]*"). Combined, you arrive at
\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
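To see the escaped-quote variant in action, here is a small Python sketch. It assumes the same balanced-quote precondition; the sample string is invented, and the groups are made non-capturing purely for tidiness.

```python
import re

# A backslash escapes the character after it, so \" is not treated
# as a string delimiter by the look-ahead.
PLUS_OUTSIDE_QUOTES = re.compile(
    r'\+(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)'
)

sample = r'a + "b + \" c" + d'  # hypothetical input with an escaped quote

positions = [m.start() for m in PLUS_OUTSIDE_QUOTES.finditer(sample)]
print(positions)  # [2, 15]
```

The "+" at offset 7 is skipped because it lies inside the quoted part, even though an escaped quote follows it.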
I admit it is a little cryptic. =)
Regular expression: match word not between quotes
The regex solution below will work in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.
A usual regex trick for matching in context is to match and capture what you need to keep, and to just match what you need to replace.
Here is a sample Python demo:
import re
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""
toReplace = "foe"
res = re.sub(rx.format(toReplace), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)
The regex will look like
('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b
The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; if it matches, the literal is just put back into the result. \bfoe\b matches the whole word foe in any other context, and that match is subsequently replaced with another word.
NOTE: To also match double-quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")" as Group 1.
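As a rough sketch of how the capture-and-put-back trick could be wrapped up for both quote styles: the function name replace_outside_strings and the sample input are hypothetical, and the word is passed through re.escape as an extra precaution that the demo above does not take.

```python
import re

# A single- or double-quoted literal, with backslash escapes allowed.
STRING_LITERAL = r"'[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\""


def replace_outside_strings(text, word, replacement):
    """Replace whole-word occurrences of `word` that are not inside a literal."""
    pattern = re.compile(r"({})|\b{}\b".format(STRING_LITERAL, re.escape(word)))
    # A literal (group 1) is put back unchanged; a bare word is replaced.
    return pattern.sub(
        lambda m: m.group(1) if m.group(1) is not None else replacement, text
    )


result = replace_outside_strings("foe = \"foe\"; x = 'foe' + foe", "foe", "bar")
print(result)  # bar = "foe"; x = 'foe' + bar
```

The check is written as `is not None` rather than a truthiness test; either works here, since a matched literal always contains at least its two quote characters.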
Regex find comma not inside quotes
Stand back and be amazed!
Here is the regex you seek:
(?!\B"[^"]*),(?![^"]*"\B)
Two things to note about it:
- It does not match the second line because the " you inserted does not have a closing quotation mark.
- It will not match values like ,r"a string",10 because the letter on the edge of the " creates a word boundary rather than a non-word boundary.
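A quick way to probe the pattern is to split with it in Python. This is only a sketch with invented inputs; the second case is an observation from tracing the pattern through Python's re module, not something stated in the answer. The pattern decides by looking at what follows the next quote, so a quoted value that starts with punctuation can mislead it.

```python
import re

COMMA_OUTSIDE_QUOTES = re.compile(r'(?!\B"[^"]*),(?![^"]*"\B)')

# Quoted value starts with a letter: the comma inside the quotes is left alone.
letters = COMMA_OUTSIDE_QUOTES.split('a,"b,c",d')
print(letters)  # ['a', '"b,c"', 'd']

# Quoted value starts with punctuation: the comma before it is missed.
punctuation = COMMA_OUTSIDE_QUOTES.split('a,"-b",c')
print(punctuation)  # ['a,"-b"', 'c']
```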
Alternative version
(".*?,.*?"|.*?(?:,|$))
This will match the content and the commas, and it is compatible with values that are full of punctuation marks.
RegEx: Grabbing values between quotation marks
I've been using the following with great success:
(["'])(?:(?=(\\?))\2.)*?\1
It supports nested quotes as well.
For those who want a deeper explanation of how this works, here's an explanation from user ephemient:
- (["']) match a quote;
- (?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character;
- *? match many times (non-greedily, so as not to eat the closing quote);
- \1 match the same quote that was used for opening.
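The pattern can be tried out in Python roughly as follows. The sample text is invented. Because the pattern contains capture groups, re.findall would return tuples of groups, so this sketch iterates with finditer and takes group(0), the whole match, instead.

```python
import re

QUOTED = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

text = r'''say "he said \"hi\"" and 'ok'.'''  # hypothetical input

# group(0) is the full quoted value, including its surrounding quotes.
quoted = [m.group(0) for m in QUOTED.finditer(text)]
print(quoted)
```

The first match keeps the escaped quotes inside it, and the second match shows that either quote character can open a value as long as the same one closes it.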
Regex: split string by character except if inside quotes or double quotes
Lookahead and lookbehind assertions don't consume characters, so you can use several of them together. You can use
\=+(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)(?=(?:(?:[^`]*`){2})*[^`]*$)
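A short Python sketch of splitting with this pattern, using an invented input. Each look-ahead counts one quote type independently, so this assumes that no quote character of one type appears inside a value quoted with another type (an apostrophe inside a double-quoted value would change the single-quote count).

```python
import re

# One look-ahead per quote type: ", ' and `. Each requires an even
# number of that quote character between the "=" and the end.
SPLIT_ON_EQUALS = re.compile(
    r'''\=+(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)(?=(?:(?:[^`]*`){2})*[^`]*$)'''
)

text = "a='x=y'=b=\"p=q\""  # hypothetical input
parts = SPLIT_ON_EQUALS.split(text)
print(parts)  # ['a', "'x=y'", 'b', '"p=q"']
```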
Regex match every string inside double quotes and include escaped quotation marks
Another option is a more optimal regex without the | operator:
const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))
Regex - match all (quotes) except in a ...
Before anyone decides to implement this in production, look at this post. HTML and regex don't mix well, so please do not use this answer unless it's a quick hack that you're trying to do.
To replace all instances of " except for those inside the <a> tag, you can use the following. Of course, this assumes that the character > is invalid within the tag (<a param='>' href=""> breaks this, for example).
Also, this depends on your regex engine. This works in PCRE, for example (among others), but you didn't specify a language, so I'm assuming anything goes.
<a[^>]*>(*SKIP)(*FAIL)|"
It works as follows:
- Match either of the following options:
  - <a[^>]*>(*SKIP)(*FAIL) match the following:
    - <a match this literally
    - [^>]* match any character except > any number of times
    - > match this character literally
    - (*SKIP)(*FAIL) magic - see this post for more info. Basically allows you to consume the characters, but then exclude them from the match.
  - " match this literally
We're effectively matching all " but skipping all the <a ... > tags in our matching pattern.
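Python's built-in re module does not support (*SKIP)(*FAIL), so the sketch below swaps in a different technique: capture the tag in a group and put it back unchanged, replacing only bare quotes. The function name and the &quot; replacement are assumptions for illustration, and the same caveat about > inside the tag applies.

```python
import re

# Alternative 1 captures a whole <a ...> tag; alternative 2 matches a bare quote.
TAG_OR_QUOTE = re.compile(r'(<a[^>]*>)|"')


def replace_quotes_outside_a_tags(html, replacement):
    # Put a captured tag back unchanged; replace any other quote.
    return TAG_OR_QUOTE.sub(
        lambda m: m.group(1) if m.group(1) is not None else replacement, html
    )


result = replace_quotes_outside_a_tags('say "hi" <a href="x">link</a>', "&quot;")
print(result)  # say &quot;hi&quot; <a href="x">link</a>
```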