Regex Match Unescaped Quotes

Regex match unescaped quotes

You can use this:

(?<!\\)(?:\\{2})*\K"

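(?<!\\) checks that there is no backslash right before the match start (negative lookbehind)

(?:\\{2})* matches zero or more pairs of backslashes, i.e. an even number, each pair being one escaped backslash

\K discards everything matched so far from the reported match, so only the quote itself is returned (the backslashes are left out). \K is supported by PCRE and Perl, but not by JavaScript or Python's re module.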
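Python's built-in re module has no \K, so a port has to locate the quote some other way. The sketch below captures the quote in a group and reports that group's start index. The fixed-width lookbehind is accepted by re. The names UNESCAPED_QUOTE and unescaped_quote_positions are illustrative, not from the answer.

```python
import re

# Same idea as (?<!\\)(?:\\{2})*\K" but without \K, which Python's re lacks:
# the quote is captured in group 1 and its index is read from that group.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)(?:\\{2})*(")')


def unescaped_quote_positions(text):
    """Return the indices of double quotes preceded by an even number (incl. 0) of backslashes."""
    return [m.start(1) for m in UNESCAPED_QUOTE.finditer(text)]


print(unescaped_quote_positions(r'a"b\"c\\"d'))  # [1, 8]
```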

Regex non-escaped quotation marks

It seems you want to replace those unescaped quotes, and for that you need neither \K nor lookbehinds. Replace the lookbehind with an equivalent alternation group, capture the text that must be restored in a capturing group, and put it back with a backreference in the replacement.

s.replace(/((?:^|[^\\])(?:\\{2})*)"/g, "$1'")

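Note that two adjacent unescaped quotes ("") are not both replaced in one pass. The first match already consumes the character that the second quote would need to match [^\\].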

Details

  • ((?:^|[^\\])(?:\\{2})*) - Group 1 (its value is referenced with the $1 placeholder in the replacement pattern):

    • (?:^|[^\\]) - either start of the string or any char other than \
    • (?:\\{2})* - 0+ occurrences of double backslash
  • " - a double quote.

JS demo:

var rx = /((?:^|[^\\])(?:\\{2})*)"/g;
var s = "hello\"there\\\"boo\\\\\\\\\"elephant";
console.log("String:", s);
console.log("Result:", s.replace(rx, "$1'"));
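The same replacement can be sketched in Python with re.sub, where the backreference is written \1 instead of $1. For comparison, a lookbehind variant is also shown. Python's re accepts it here because the lookbehind has a fixed width of one character. The function and constant names are illustrative.

```python
import re

# Capture-and-restore, as in the JavaScript answer: group 1 holds the start of
# the string or a non-backslash char, plus any pairs of backslashes.
CAPTURE_RX = re.compile(r'((?:^|[^\\])(?:\\{2})*)"')

# Variant using a fixed-width negative lookbehind instead of the alternation.
LOOKBEHIND_RX = re.compile(r'(?<!\\)((?:\\{2})*)"')


def replace_capture(s):
    return CAPTURE_RX.sub(r"\1'", s)


def replace_lookbehind(s):
    return LOOKBEHIND_RX.sub(r"\1'", s)


s = r'hello"there\"boo\\\\"elephant'
print(replace_capture(s))     # hello'there\"boo\\\\'elephant
print(replace_lookbehind(s))  # hello'there\"boo\\\\'elephant
```

The two differ on adjacent quotes. For the input a""b, the capture version returns a'"b, while the lookbehind version returns a''b, because the lookbehind inspects the preceding character without consuming it.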

regex to match anything except an unescaped quote

Anything that is escaped has to be matched together with an escape that is not itself escaped.

(?<!\\)(?:\\\\)*\\some Character here

Furthermore, since escapes can themselves be escaped, you have to match everything that is escaped inside the quotes.

To that end, it is basically this form:

(?<!\\)(?:\\\\)*"[^\\"]*(?:\\[\S\s][^\\"]*)*"

See https://regex101.com/r/LRgBlQ/1

Note that the leading part (?<!\\)(?:\\\\)* can be omitted if you are already handling (incorporating) the text before the quote with another sub-expression.



(?<! \\ )                # Not preceded by an escape
(?: \\\\ )*              # Optional even number of escapes
"                        # Open quote
[^\\"]*                  # Neither an escape nor a double quote
(?:
     \\ [\S\s]           # Escape anything,
     [^\\"]*             # then more chars that are neither escape nor quote
)*                       # etc.
"                        # Close quote

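The commented form above maps directly onto Python's re.VERBOSE flag, where whitespace is ignored and # starts a comment. Below is a minimal sketch, assuming the goal is to list every complete quoted string. The helper name quoted_strings is illustrative. Each returned match also includes any escaped backslash pairs directly before the opening quote.

```python
import re

QUOTED = re.compile(r'''
    (?<!\\)          # not preceded by a backslash
    (?:\\\\)*        # optional pairs of escaped backslashes
    "                # opening quote
    [^\\"]*          # chars that are neither backslash nor quote
    (?:
        \\[\s\S]     # an escape: backslash plus any char
        [^\\"]*      # followed by more ordinary chars
    )*
    "                # closing quote
''', re.VERBOSE)


def quoted_strings(text):
    return [m.group(0) for m in QUOTED.finditer(text)]


print(quoted_strings(r'say "hi \"there\"" and "x\\" done'))
```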
Match unescaped quotes in quoted csv

EDIT: Updated with the regex from @sundance to skip quotes at the beginning of a line and before a newline.

You could try removing only the quotes that are not preceded by a comma or the start of the line and not followed by a comma or the end of the line:

import re

# line holds one CSV record
newline = re.sub(r'(?<!^)(?<!,)"(?!,|$)', '', line)
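If the whole file is processed as a single string rather than line by line, re.MULTILINE is needed so that ^ and $ match at every line boundary. Without it, the quotes on either side of each line break would be removed as well. Below is a sketch under that assumption; the name strip_inner_quotes is hypothetical.

```python
import re

# Remove a quote unless it is at the start of a line, right after a comma,
# right before a comma, or at the end of a line.
INNER_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)', re.MULTILINE)


def strip_inner_quotes(text):
    return INNER_QUOTE.sub('', text)


print(strip_inner_quotes('"a","b "c" d"\n"e","f"'))
```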

Regex to remove unescaped quotes from a CSV

The following solution only meets your current requirements and is not a universal solution to fix quotes in CSV:

(^"|"$|";+"|";\d+;")|"

Replace with $1 (or \1, depending on where you use this regex).


Details

  • (^"|"$|";+"|";\d+;") - Group 1:

    • ^"| - " at the start of the string, or
    • "$| - " at the end of the string, or
    • ";+"| - ", 1+ ; chars, and then ", or
    • ";\d+;" - ";, 1+ digits, then ;"
  • | - or
  • " - a " char.
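In Python the replacement is written \1. Since Python 3.5, re.sub substitutes an empty string for a group that did not participate in the match, so a bare " is simply dropped. Here is a sketch using the same pattern; the function name is illustrative.

```python
import re

# Group 1 captures the quotes that belong to the CSV structure; any other "
# is matched by the bare alternative, leaving group 1 empty.
STRAY_QUOTE = re.compile(r'(^"|"$|";+"|";\d+;")|"')


def remove_stray_quotes(record):
    # Python 3.5+: an unmatched group in \1 becomes an empty string
    return STRAY_QUOTE.sub(r'\1', record)


print(remove_stray_quotes('"a "b" c";"d"'))  # "a b c";"d"
```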

Javascript Regex: count unescaped quotes in string

You need a small parser for this task, because JavaScript regular expressions have no \G operator that could anchor each subsequent match to the end of the previous successful match.

var s = "\"some text\" with 5 unescaped double quotes... \\\"extras\" \\some \\\"string \\\" right\" here \"";
var res = 0;
var in_entity = false;
for (var i = 0; i < s.length; i++) {
  if ((s[i] === '\\' && !in_entity) || in_entity) {
    // reverse the flag
    in_entity = !in_entity;
  } else if (s[i] === '"' && !in_entity) {
    // an unescaped "
    res += 1;
  }
}
console.log(s, ": ", res);
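The same state machine can be ported to Python, shown here as a sketch with an illustrative function name. It walks the string once. A backslash starts an escape, the next character ends it whatever that character is, and only quotes seen outside an escape are counted.

```python
def count_unescaped_quotes(text):
    count = 0
    escaping = False  # True right after an unescaped backslash
    for ch in text:
        if escaping:
            # this character is escaped, whatever it is
            escaping = False
        elif ch == '\\':
            escaping = True
        elif ch == '"':
            count += 1
    return count


s = r'"some text" with 5 unescaped double quotes... \"extras" \some \"string \" right" here "'
print(count_unescaped_quotes(s))  # 5
```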

Regular expression to find unescaped double quotes in CSV file

Try this:

(?m)""(?![ \t]*(,|$))

Explanation:

(?m)       // enable multi-line mode: ^ matches at the start of each line and $ at the end of each line (i)
""         // match two successive double quotes
(?!        // start negative lookahead
  [ \t]*   // zero or more spaces or tabs
  (        // open group 1
    ,      // match a comma
    |      // OR
    $      // the end of the line or string
  )        // close group 1
)          // end negative lookahead

So, in plain English: "match two successive double quotes, only if they DON'T have a comma or end-of-the-line ahead of them with optionally spaces and tabs in between".

(i) in addition to their usual role as start-of-string and end-of-string metacharacters.
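Python's re accepts the same pattern, including the inline (?m) flag at its start. The following minimal sketch reports where each matched pair begins; the helper name is illustrative. A doubled quote that ends a field, such as the empty field "" at the end of a line, is not reported.

```python
import re

# Two successive double quotes NOT followed by optional spaces/tabs and then
# a comma or the end of a line; (?m) makes $ match at every line end.
DOUBLED_QUOTE = re.compile(r'(?m)""(?![ \t]*(,|$))')


def doubled_quote_positions(text):
    return [m.start() for m in DOUBLED_QUOTE.finditer(text)]


sample = 'a,"b ""c"" d",e\n"x",""'
print(doubled_quote_positions(sample))  # [5, 8]
```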



Related Topics



Leave a reply



Submit