Finding Quoted Strings with Escaped Quotes in C# Using a Regular Expression

What you've got there is an example of Friedl's "unrolled loop" technique, but you seem to have some confusion about how to express it as a string literal. Here's how it should look to the regex compiler:

"[^"\\]*(?:\\.[^"\\]*)*"

The initial "[^"\\]* matches a quotation mark followed by zero or more of any characters other than quotation marks or backslashes. That part alone, along with the final ", will match a simple quoted string with no embedded escape sequences, like "this" or "".

When the regex does encounter a backslash, \\. consumes the backslash and whatever follows it, and [^"\\]* (again) consumes everything up to the next backslash or quotation mark. That part gets repeated as many times as necessary until an unescaped quotation mark turns up (or until it reaches the end of the string, in which case the match attempt fails).

Note that this will match "foo\"-" in \"foo\"-"bar". That may seem to expose a flaw in the regex, but it doesn't; it's the input that's invalid. The goal was to match quoted strings, optionally containing backslash-escaped quotes, embedded in other text; why would there be escaped quotes outside of quoted strings? If you really need to support that, you have a much more complex problem, requiring a very different approach.
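To see the pieces at work, here is a minimal C# sketch, assuming .NET's System.Text.RegularExpressions and writing the pattern as a C# verbatim string. The helper name QuotedStrings is made up for illustration. It runs the pattern over a line with a plain, an empty, and an escaped-quote string, then over the invalid input from the note:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Unrolled loop: normal* (special normal*)*
//   normal  = [^"\\]  any character except a quote or a backslash
//   special = \\.     a backslash plus whatever character follows it
var unrolled = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

// Hypothetical helper: every quoted string in the input, quotes included.
string[] QuotedStrings(string input) =>
    unrolled.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

// A plain string, an empty string and one with escaped quotes.
string text = "say \"this\", \"\" and \"a \\\"b\\\" c\"";
foreach (string s in QuotedStrings(text))
    Console.WriteLine(s);
// "this"
// ""
// "a \"b\" c"

// The invalid input from the note: the regex takes its first quote as an opening quote.
Console.WriteLine(QuotedStrings("\\\"foo\\\"-\"bar\"")[0]);
// "foo\"-"
```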

As I said, the above is how the regex should look to the regex compiler. But you're writing it in the form of a string literal, and string literals treat certain characters specially, namely backslashes and quotation marks. Fortunately, C#'s verbatim strings save you the hassle of having to double-escape backslashes; you just have to escape each quotation mark with another quotation mark:

Regex r = new Regex(@"""[^""\\]*(?:\\.[^""\\]*)*""");

So the rule is: double quotation marks for the C# compiler and double backslashes for the regex compiler. Nice and easy. This particular regex may look a little awkward, with the three quotation marks at either end, but consider the alternative:

Regex r = new Regex("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

In Java, you always have to write them that way. :-(
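A quick way to convince yourself that the two C# literals above are the same is to compare them. This sketch uses only the .NET base library; it prints the pattern the regex compiler actually receives and checks that both spellings agree:

```csharp
using System;
using System.Text.RegularExpressions;

// The same pattern written as a verbatim string and as a regular string.
string verbatim = @"""[^""\\]*(?:\\.[^""\\]*)*""";
string regular = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";

// Both produce the text the regex compiler sees: "[^"\\]*(?:\\.[^"\\]*)*"
Console.WriteLine(verbatim);
Console.WriteLine(verbatim == regular);   // True

// Either string can be handed to Regex; they build identical patterns.
var r = new Regex(verbatim);
Console.WriteLine(r.Match("x = \"a \\\"q\\\"\";").Value);   // "a \"q\""
```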

Regex that handles quoted strings and double quote for inches

Try this:

First, use this expression to find all strings of the form "9", "9.9", or "9-9" and replace them (in JavaScript) with "9', "9.9', or "9-9', i.e. turn the closing double quote into a single quote:

\"[0-9.-]*\"

Next, replace all matches of

([^a-z,0-9,',"])([\s]*)\"

with just a single ". This will remove all unwanted spaces.

Then take this newly formatted string and apply

\"[^\s]([^\"]*)[^\s]\"

This takes care of all the scenarios. Just make sure you copy the original string into a new variable and work on the copy; otherwise you will end up modifying the original value.

Here is the sample string I used to test the above expressions. I did not have time to write the JavaScript function itself, so please post the function if you get it to work using the above expressions.

8" "bosch grinder" , bosch "8" grinder" , and "bosch grinder " 8" "99" "9.9" "9-7"

A website I use to test out my regular expressions is http://www.regexr.com/
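The function itself is left unwritten above; as a starting point, here is a C# sketch of the first step only, applied to parts of the sample string. The capture group and the "${1}'" replacement are assumptions about what replacing "9" with "9' means (keep the number, turn its closing double quote into a single quote), and MarkNumbers is a hypothetical helper name:

```csharp
using System;
using System.Text.RegularExpressions;

// Step 1 only. Assumption: the group and the "${1}'" replacement keep the number
// and turn its closing double quote into a single quote ("9" -> "9').
// MarkNumbers is a hypothetical helper name.
string MarkNumbers(string input) =>
    Regex.Replace(input, @"""([0-9.-]*)""", "\"${1}'");

Console.WriteLine(MarkNumbers("bosch \"8\" grinder \"9.9\" \"9-7\""));
// bosch "8' grinder "9.9' "9-7'

Console.WriteLine(MarkNumbers("\"bosch grinder\" 8\" \"99\""));
// "bosch grinder" 8" "99'
```

Because * also allows zero characters, an empty pair "" is rewritten to "' as well; use + instead of * if that is not wanted.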

How to match string in quotes using Regex

If you read the text line by line, then the regex

"[^"]*"

will find all quoted strings, unless they contain escaped quotes, as in "a 2\" by 4\" board".

To match those correctly, you need

"(?:\\.|[^"\\])*"

If you don't want the quotes to become part of the match, use lookaround assertions:

(?<=")[^"]*(?=")
(?<=")(?:\\.|[^"\\])*(?=")

In C#, these regexes can be created like this:

Regex regex1 = new Regex(@"(?<="")[^""]*(?="")");
Regex regex2 = new Regex(@"(?<="")(?:\\.|[^""\\])*(?="")");
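One thing to watch with the lookaround versions: the quotes are not consumed, so the closing quote of one string also satisfies the lookbehind for the text that follows it. The C# sketch below shows this on two short strings and, as an alternative (a swapped-in technique, not part of the answer above), consumes the quotes and reads the content from capture group 1 instead:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lookaround version from above: the quotes are not consumed.
var lookaround = new Regex(@"(?<="")[^""]*(?="")");

// Alternative: consume the quotes and capture the content in group 1.
var captured = new Regex(@"""((?:\\.|[^""\\])*)""");

string sample = "\"a\" and \"b\"";

// The closing quote of "a" also satisfies the lookbehind, so " and " is matched too.
Console.WriteLine(string.Join("|", lookaround.Matches(sample).Cast<Match>().Select(m => m.Value)));
// a| and |b

Console.WriteLine(string.Join("|", captured.Matches(sample).Cast<Match>().Select(m => m.Groups[1].Value)));
// a|b

// Escaped quotes stay inside the captured content.
Console.WriteLine(captured.Match("\"a 2\\\" by 4\\\" board\"").Groups[1].Value);
// a 2\" by 4\" board
```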

RegEx: Grabbing values between quotation marks

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

It also copes with quotes inside the string, either of the other kind (as in "don't") or backslash-escaped (as in 'a \' b').

For those who want a deeper explanation of how this works, here is one from user ephemient:

(["']) match a quote; (?:(?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
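A small C# sketch of the pattern in use, with the explanation above repeated as comments. The sample input is made up; it has a single quote inside a double-quoted string and an escaped single quote inside a single-quoted one:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (["'])        group 1: the opening quote, either " or '
// (?=(\\?))\2   group 2: an optional backslash, captured in a lookahead, then consumed
// .             the character after it, so \" or \' is skipped as a unit
// *?            as few characters as possible...
// \1            ...up to the same kind of quote that opened the string
var quoted = new Regex(@"([""'])(?:(?=(\\?))\2.)*?\1");

string input = "say \"don't\" or 'a \\' b'";
string[] found = quoted.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

foreach (string s in found)
    Console.WriteLine(s);
// "don't"
// 'a \' b'
```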

Regex: Require that quotes are escaped in a string

In C#, this appears to work as you want:

string pattern = "^([^\"\\\\]*(\\\\.)?)*$";

Stripping out the escaping leaves you with:

^([^"\\]*(\\.)?)*$

which roughly translates into: start-of-string, (multi-chars-excluding-quote-or-backslash, optional-backslash-anychar)-repeated, end-of-string

The start-of-string and end-of-string anchors are what force the match to cover the complete text.
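The same rule can also be written in the unrolled form from the first answer on this page. This C# sketch (IsValid is a hypothetical helper) accepts exactly the same strings, but avoids the nested, fully optional quantifiers of ([^"\\]*(\\.)?)*, which can make the engine backtrack heavily on a long input that fails to match:

```csharp
using System;
using System.Text.RegularExpressions;

// Every quote must be preceded by a backslash, and every backslash must escape
// something: normal* (escape normal*)* anchored at both ends.
var allEscaped = new Regex(@"^[^""\\]*(?:\\.[^""\\]*)*$");

bool IsValid(string s) => allEscaped.IsMatch(s);

Console.WriteLine(IsValid("a \\\"quoted\\\" word"));  // True:  a \"quoted\" word
Console.WriteLine(IsValid("a \"quoted\" word"));      // False: bare quotes
Console.WriteLine(IsValid("ends with \\"));           // False: dangling backslash
Console.WriteLine(IsValid(""));                       // True
```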

Regex in C# - remove quotes and escaped quotes from a value after another value

string serialized = JsonSerializer.Serialize(chartDefinition);
// Unquote "function()..." values and unescape \" only inside those values
serialized = Regex.Replace(serialized, @"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""",
    m => "function()" + m.Groups[1].Value.Replace("\\\"", "\""));
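A self-contained C# sketch of the same replacement, run on a hard-coded JSON string instead of real serializer output (the JSON and the Unquote helper are made up for illustration). It assumes quotes inside JSON strings are written as \"; System.Text.Json's default encoder may write them as \u0022 instead, in which case the unescaping step needs adjusting. The unescaping happens inside the match evaluator, so other string values keep their escapes:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical serializer output: a function stored as a JSON string value.
string json = "{\"title\":\"Say \\\"hi\\\"\",\"formatter\":\"function() { return \\\"x\\\"; }\"}";

// Strip the quotes around "function()..." values and unescape \" only inside
// them, so other string values (such as "title") are left alone.
string Unquote(string s) => Regex.Replace(
    s,
    @"""function\(\)([^""\\]*(?:\\.[^""\\]*)*)""",
    m => "function()" + m.Groups[1].Value.Replace("\\\"", "\""));

Console.WriteLine(Unquote(json));
// {"title":"Say \"hi\"","formatter":function() { return "x"; }}
```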

Using Regex to match quoted string with embedded, non-escaped quotes

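Since the comma is your delimiter, you can try changing your pattern like this; it should work:

string pattern = @"'(.*?)'(?:,|$)";

This works by looking for a single quote that is followed by a comma or by the end of the input, so a single quote inside a value (one not followed by a comma) is treated as part of the value.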
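A C# sketch of the pattern on a made-up comma-separated line, reading each value from capture group 1:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A value runs from an opening ' up to the first ' that is followed by a comma
// or by the end of the input, so quotes inside a value are allowed.
var field = new Regex(@"'(.*?)'(?:,|$)");

string line = "'O'Reilly','abc','it's'";
string[] values = field.Matches(line).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(string.Join(" | ", values));   // O'Reilly | abc | it's
```

A value that itself contains a quote followed by a comma, such as 'a',b', is still cut short at that point.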

Find quoted strings and replace content between double quotes

Let's try Regex.Replace to replace every quoted value within the string (I've assumed that a quotation mark inside a value is escaped by doubling it: "abc""def" -> abc"def):

string source = "\"24.09.2019\",\"545\",\"878\",\"5\"";

int index = 0;

string result = Regex.Replace(source, "\"([^\"]|\"\")*\"", m => $"\"{{{++index}}}\"");

Demo:

using System;
using System.Linq;
using System.Text.RegularExpressions;

Func<string, string> convert = (source => {
    int index = 0;

    return Regex.Replace(source, "\"([^\"]|\"\")*\"", m => $"\"{{{++index}}}\"");
});

string[] tests = new string[] {
    "abc",
    "\"abc\", \"def\"\"fg\"",
    "\"\"",
    "\"24.09.2019\",\"545\",\"878\",\"5\"",
    "name is \"my name\"; value is \"78\"\"\"\"\"",
    "empty: \"\" and not empty: \"\"\"\""
};

string demo = string.Join(Environment.NewLine, tests
    .Select(test => $"{test,-50} -> {convert(test)}"));

Console.Write(demo);

Outcome:

abc                                                -> abc
"abc", "def""fg"                                   -> "{1}", "{2}"
""                                                 -> "{1}"
"24.09.2019","545","878","5"                       -> "{1}","{2}","{3}","{4}"
name is "my name"; value is "78"""""               -> name is "{1}"; value is "{2}"
empty: "" and not empty: """"                      -> empty: "{1}" and not empty: "{2}"

Edit: You can easily make the replacement more elaborate, e.g. if you want to replace integer values only:

Func<string, string> convert = (source => {
    int index = 0;

    // we have a match "m" and a counter "index";
    // our task is to return the string that will replace the match
    return Regex.Replace(
        source,
        "\"([^\"]|\"\")*\"",
        m => int.TryParse(m.Value.Trim('"'), out int _drop)
            ? $"\"{{{++index}}}\""  // if the match is a valid integer, replace it
            : m.Value);             // if not, keep it intact
});

In the general case:

Func<string, string> convert = (source => {
    int index = 0;

    // we have a match "m" and a counter "index";
    // our task is to return the string that will replace the match
    return Regex.Replace(
        source,
        "\"([^\"]|\"\")*\"",
        m => {
            // now we have a match "m" with its value "m.Value"
            // and the counter "index",
            // and we have to return the string that will replace the match

            // if you want the unquoted value, i.e. abc"def instead of "abc""def":
            // string unquoted = m.Value.Substring(1, m.Value.Length - 2).Replace("\"\"", "\"");

            return m.Value; // TODO: put the relevant code here (m.Value keeps the match unchanged)
        });
});
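As one possible way to fill in the TODO (an illustrative choice, not the only one), this C# sketch replaces each quoted value with its unquoted text: it drops the outer quotes and then collapses every doubled quote to a single one. The name unquoteAll is made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Replace each quoted value with its unquoted text, undoing the "" -> " doubling.
Func<string, string> unquoteAll = source => Regex.Replace(
    source,
    "\"([^\"]|\"\")*\"",
    m =>
    {
        // Drop the outer quotes, then collapse each doubled quote to one.
        string inner = m.Value.Substring(1, m.Value.Length - 2);
        return inner.Replace("\"\"", "\"");
    });

Console.WriteLine(unquoteAll("\"abc\", \"def\"\"fg\""));               // abc, def"fg
Console.WriteLine(unquoteAll("empty: \"\" and not empty: \"\"\"\""));  // empty:  and not empty: "
```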

