Find Multiple Quoted Words in a String with Regex

Find multiple quoted words in a string with regex

Your problem is the use of lookarounds: they do not consume text, they only check whether their patterns match and return true or false. If you run your regex, you will see that ", are " is matched too, because the last " of the previous match was not consumed. The regex index remained right after w, so that " could still act as the opening quote of the next match. You need a consuming pattern here: "([^"]*)".
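To see the difference, here is a small JavaScript sketch. The question's exact regex is not shown, so the lookaround pattern below, (?<=")[^"]*(?="), is an assumed stand-in for it; lookbehind needs a reasonably recent engine such as current Node.js.

```javascript
const greeting = 'Hi "how", are "you"';

// Assumed stand-in for the question's lookaround regex (not shown in the source).
const lookaroundRe = /(?<=")[^"]*(?=")/g;
// Consuming pattern: both quotes are part of the match, group 1 is the inner text.
const consumingRe = /"([^"]*)"/g;

// The closing quote after "how" is never consumed, so it also serves as the
// lookbehind for the text up to the next opening quote.
const lookaroundMatches = greeting.match(lookaroundRe);

// Each match eats its own quotes, so the next search starts after the closing one.
const groupMatches = [...greeting.matchAll(consumingRe)].map((m) => m[1]);

console.log(lookaroundMatches.join("|")); // how|, are |you
console.log(groupMatches.join("|"));      // how|you
```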

However, your code only returns full matches, so the quotes are included. Since the regex matches exactly one quote at the start and one at the end, you can simply trim them with .map {$0.trimmingCharacters(in: ["\""])}:

matches(for: "\"[^\"]*\"", in: str).map {$0.trimmingCharacters(in: ["\""])}

Alternatively, get the Group 1 value by calling $0.range(at: 1) instead of $0.range:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            // range(at: 1) is capture group 1 (the text between the quotes); the force
            // unwrap is safe only because group 1 always participates in this pattern
            String(text[Range($0.range(at: 1), in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

let str = "Hi \"how\", are \"you\""
print(matches(for: "\"([^\"]*)\"", in: str))
// => ["how", "you"]

RegEx: Grabbing values between quotation marks

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

It also handles escaped quotes (such as \") and quotes of the other type inside the string, for example the apostrophe in "it's".

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

(["']) match a quote; ((?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
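A quick way to check this behavior is to run the pattern in JavaScript, where the same syntax works in a regex literal. The sample string is made up for illustration; String.raw keeps the backslashes, so the text really contains \" sequences.

```javascript
// The pattern from above as a JavaScript regex literal.
const quotedRe = /(["'])(?:(?=(\\?))\2.)*?\1/g;

// Made-up sample: an apostrophe inside "...", escaped \" quotes, and a '...' string.
const mixedQuotes = String.raw`He said "it's \"fine\"" and 'ok'`;

const found = mixedQuotes.match(quotedRe);
for (const m of found) console.log(m);
// "it's \"fine\""
// 'ok'
```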

Regex match every string inside double quotes and include escaped quotation marks

Another option is a more efficient regex that avoids the | alternation operator:

const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))
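For comparison, the pattern above is an "unrolled loop" version of the more familiar alternation form "(?:[^"\\]|\\[\s\S])*". This sketch runs both on the same sample string; they should find the same two matches, but the unrolled form consumes whole runs of ordinary characters with [^"\\]* instead of choosing between two alternatives at every character.

```javascript
const sample = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`;

// Unrolled loop: a run of ordinary characters, then any number of
// (escape pair + another run of ordinary characters).
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;

// Alternation form: each step is either one ordinary character or one escape pair.
const alternation = /"(?:[^"\\]|\\[\s\S])*"/g;

const unrolledMatches = sample.match(unrolled);
const alternationMatches = sample.match(alternation);

console.log(unrolledMatches.length, alternationMatches.length); // 2 2
console.log(unrolledMatches.every((m, i) => m === alternationMatches[i])); // true
```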

regex multiple quotes selection

As long as you don't need to deal with escaped quotes, and as long as all your quotes are correctly balanced, you can make use of a negative lookahead assertion:

(['"])((?:(?!\1).)*)\1

or, in Java:

Pattern p1 = Pattern.compile("(['\"])((?:(?!\\1).)*)\\1");

Explanation:

(['"])        # Match any quote character, capture it in group 1
(             # Match and capture in group 2:
 (?:          # Start of non-capturing group that matches...
  (?!\1)      # (as long as it's not the same quote character as in group 1)
  .           # ...any character
 )*           # any number of times.
)             # End of capturing group 2
\1            # Match the same quote as before

Test it live on regex101.com.
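The same pattern can be tried in JavaScript, which supports a backreference inside a negative lookahead. The sample input is hypothetical; group 2 holds the text between the quotes, and the other quote type is allowed inside.

```javascript
// Group 1 captures the opening quote; group 2 is everything up to the same quote.
const pairRe = /(['"])((?:(?!\1).)*)\1/g;

// Hypothetical input: balanced, unescaped quotes, with the other quote type inside.
const pairInput = `He said "it's" and 'a "quote"'`;

const contents = [...pairInput.matchAll(pairRe)].map((m) => m[2]);
for (const c of contents) console.log(c);
// it's
// a "quote"
```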

Find a word between double quotes and replace it with suffix using regular expressions in notepad++

You can use

Find What: <TableName Value="[^"]*\K

Replace with: _test

Here, <TableName Value="[^"]*\K matches <TableName Value=", then zero or more chars other than " (with [^"]*), and then \K omits the text matched so far from the match. The match is therefore an empty string just before the trailing ", and replacing it inserts _test at that position.
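JavaScript regexes have no \K, so outside Notepad++ a similar effect can be sketched by putting the matched text back with $& in the replacement. This swaps in a different technique than \K, and the sample XML lines are made up for illustration:

```javascript
// Hypothetical sample lines; only the Value attribute should change.
const xml = '<TableName Value="Orders" />\n<TableName Value="Customers" />';

// [^"]* stops right before the closing quote, so the match ends there;
// $& re-inserts the whole match and _test is appended after it.
const updated = xml.replace(/<TableName Value="[^"]*/g, "$&_test");

console.log(updated);
// <TableName Value="Orders_test" />
// <TableName Value="Customers_test" />
```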

Regex to find any double quoted string within a string

There is no need to escape the ". Just look for the text between quotes with a non-greedy .*?, like this:

var string = 'some text "string 1" and "another string" etc';
var pattern = /".*?"/g;
var current;

// exec() returns one match per call (current[0] is the quoted text) and null when done
while (current = pattern.exec(string))
  console.log(current);
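The non-greedy quantifier is what keeps each match to a single quoted string. This sketch, using the same sample text, contrasts it with a greedy .*, which runs to the last quote on the line:

```javascript
const greedyText = 'some text "string 1" and "another string" etc';

// Greedy: .* runs to the end of the line, then backtracks only to the LAST quote.
const greedy = greedyText.match(/".*"/g);

// Non-greedy: .*? stops at the FIRST closing quote it can find.
const lazy = greedyText.match(/".*?"/g);

console.log(greedy.length, greedy[0]);       // 1 "string 1" and "another string"
console.log(lazy.length, lazy[0], lazy[1]);  // 2 "string 1" "another string"
```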

Regex to match multiple words separated by commas and quotation marks

Just strip the unwanted characters to clean your string. This is much easier than capturing the whole structure and detecting the correct parts to keep within it.

Regular Expression

~[",\[\]]~

Replacement Pattern

~~

Example:

IN  -> tags = ["#scRNA-seq", "#single_cell", "#NGS", "#single_cell:method"]
OUT -> tags = #scRNA-seq #single_cell #NGS #single_cell:method
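The same cleanup can be sketched in JavaScript with String.prototype.replace and the g flag. Because [ and ] are in the character class, this pattern removes the square brackets as well as the quotes and commas:

```javascript
const tagLine = 'tags = ["#scRNA-seq", "#single_cell", "#NGS", "#single_cell:method"]';

// Delete every double quote, comma, [ and ]; the spaces between items stay.
const cleaned = tagLine.replace(/[",\[\]]/g, "");

console.log(cleaned); // tags = #scRNA-seq #single_cell #NGS #single_cell:method
```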

Regex to find more than two quotes between commas

To find correctly balanced quotes, search for ,"[^"]*",. To find unexpected quotes, search for ,"[^",]*("[^",]*)+",.

Note the commas inside the square brackets of the invalid-quote pattern. They may be wrong, but if so, you would need stricter rules about where commas may appear.

Here is how the valid and invalid regular expressions work. Both start with ," and finish with ",, which deals with the characters surrounding the [HERE] text shown in the question. The rest of each regular expression handles the contents of [HERE]. The valid case is zero or more characters that are not a quote, which is a simple match for [^"]*. The invalid case has one or more quotes, which can have other non-quote characters on either side. Invalid examples of [HERE] include xx"xx, xxx"x"xxxxx"xx" and "xx""xx". All these invalid cases can be described as

  • zero or more characters that are not a quote, followed by
  • one or more sequences of characters that

    • start with a quote and then have
    • zero or more characters that are not a quote

In a regular expression, a character that is not a quote is [^"]. Zero or more of them is [^"]*. A sequence is enclosed in parentheses, and one or more repetitions of a sequence is written (...)+, in this case ("[^"]*)+.

The question does not specify how commas within the [HERE] should be treated. This answer assumes that they are not allowed, and makes that explicit by adding a comma to the "not a quote" terms, giving [^",].

Assembling the pieces of the invalid match, we get:

,"          // Opening characters
[^",]       // Character that is neither quote nor comma
*           // zero or more of them
(           // Enclose the sequence
  "         // a real quote
  [^",]*    // Zero or more characters that are neither quote nor comma
)           // End of the sequence
+           // one or more of the sequence
",          // Closing characters

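Both patterns can be tried quickly in JavaScript. The sample lines below are hypothetical, with a and b as neighbouring fields and the quoted field standing in for [HERE]:

```javascript
const valid = /,"[^"]*",/;
const invalid = /,"[^",]*("[^",]*)+",/;

// Hypothetical sample lines; only the quoting inside the [HERE] field differs.
const lines = [
  'a,"ok",b',     // balanced quotes
  'a,"xx"xx",b',  // stray quote inside the field
  'a,"xx""xx",b', // doubled quote inside the field
];

for (const l of lines) {
  console.log(l, valid.test(l), invalid.test(l));
}
// a,"ok",b true false
// a,"xx"xx",b false true
// a,"xx""xx",b false true
```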