Split a String That Has White Spaces, Unless They Are Enclosed Within "Quotes"

Split a string that has white spaces, unless they are enclosed within quotes?

string input = "one \"two two\" three \"four four\" five six";
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

C++ Split a string by blank spaces unless it is enclosed in quotes and store in a vector

Here is a working example:

#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::string str = "12345 Hello World \"This is a group\"";
    std::vector<std::string> v;
    std::size_t i = 0;
    while (i < str.size()) {
        // Skip the spaces between tokens.
        if (str[i] == ' ') {
            ++i;
            continue;
        }
        const std::size_t begin = i;
        if (str[i] == '"') {
            // Quoted token: scan to the closing quote and keep both quotes.
            const std::size_t end = str.find('"', i + 1);
            i = (end == std::string::npos) ? str.size() : end + 1;
        } else {
            // Plain token: scan to the next space or the end of the string.
            while (i < str.size() && str[i] != ' ')
                ++i;
        }
        v.push_back(str.substr(begin, i - begin));
    }

    for (const auto& token : v)
        std::cout << token << '\n';
    return 0;
}

Output:

12345
Hello
World
"This is a group"

Note, however, that this code is only a demonstration and does not handle all cases. For example, if your input contains only one double quote, the search for the closing quote runs to the end of the string, so everything from that quote onward is left unsplit.
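
The unbalanced-quote caveat can be made explicit with a small character-by-character scanner. The following is a minimal Python sketch, not a port of the C++ code above; the function name split_quoted and the decision to raise ValueError on an unterminated quote are assumptions made for illustration.

```python
def split_quoted(text):
    """Split on spaces, keeping double-quoted groups (quotes included) intact."""
    tokens = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes   # toggle state on every double quote
            current.append(ch)
        elif ch == ' ' and not in_quotes:
            if current:                 # a space outside quotes ends the token
                tokens.append(''.join(current))
                current = []
        else:
            current.append(ch)          # includes spaces inside quotes
    if in_quotes:
        # Assumption: treat an unterminated quote as an error instead of
        # silently returning the rest of the string as one token.
        raise ValueError("unbalanced double quote in input")
    if current:
        tokens.append(''.join(current))
    return tokens


print(split_quoted('12345 Hello World "This is a group"'))
# ['12345', 'Hello', 'World', '"This is a group"']
```

Whether an unterminated quote should be an error or simply swallow the rest of the line depends on where the input comes from; raising makes the malformed case visible.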

Regular Expression to split on spaces unless in quotes

No options required

Regex:

\w+|"[\w\s]*"

C#:

Regex regex = new Regex(@"\w+|""[\w\s]*""");

Or if you need to exclude " characters:

Regex
    .Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));

Splitting string on spaces unless in double quotes but double quotes can have a preceding string attached

We can do this using a formal pattern matcher. The secret sauce of the answer below is the little-used Matcher#appendReplacement method. We pause at each match and append a custom replacement for anything appearing inside a pair of double quotes. The custom method removeSpaces() strips all whitespace from each quoted term, so the final split on whitespace only sees the spaces between terms.

public static String removeSpaces(String input) {
    return input.replaceAll("\\s+", "");
}

String input = "abc test=\"x y z\" magic=\" hello \" hola";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer("");
while (m.find()) {
    m.appendReplacement(sb, "\"" + removeSpaces(m.group(1)) + "\"");
}
m.appendTail(sb);

String[] parts = sb.toString().split("\\s+");
for (String part : parts) {
    System.out.println(part);
}

abc
test="xyz"
magic="hello"
hola
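
For comparison, the same two-step idea (collapse the whitespace inside each quoted term, then split on what is left) can be sketched in Python, where re.sub accepts a function as the replacement. This is an illustrative sketch only; the names remove_spaces and split_keeping_quoted are assumptions, and like the Java version it changes the contents of the quoted terms.

```python
import re


def remove_spaces(match):
    # Rebuild the quoted term with all whitespace stripped from its contents.
    return '"' + re.sub(r'\s+', '', match.group(1)) + '"'


def split_keeping_quoted(text):
    # Step 1: collapse whitespace inside each "..." pair.
    collapsed = re.sub(r'"(.*?)"', remove_spaces, text)
    # Step 2: the remaining whitespace only separates tokens.
    return collapsed.split()


print(split_keeping_quoted('abc test="x y z" magic=" hello " hola'))
# ['abc', 'test="xyz"', 'magic="hello"', 'hola']
```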


The big caveat here, as the comments above hinted, is that we are really using a regex engine as a rudimentary parser. To see where my solution fails fast, just remove one of the quotes from a quoted term. But if you are sure your input is well formed, as in the example you have shown us, this answer might work for you.

How to split on white spaces not between quotes?

\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)

You can split on this regex, which uses a lookahead to match only whitespace that is followed by balanced quotes. See the demo:

https://regex101.com/r/5I209k/4

Or, if the string mixes quote types:

https://regex101.com/r/5I209k/7
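
To see how the lookahead works, here is a simplified Python sketch. It is an assumption-laden variant of the pattern above, not the same regex: it handles double quotes only and uses non-capturing groups, because Python's re.split also returns the text of any capturing groups. The name split_outside_quotes is hypothetical.

```python
import re

# Match whitespace only when the rest of the string contains an even number
# of double quotes, i.e. the whitespace is not inside an open quoted section.
OUTSIDE_QUOTES = re.compile(r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(text):
    return OUTSIDE_QUOTES.split(text)


print(split_outside_quotes('one "two two" three'))
# ['one', '"two two"', 'three']
```

Because the lookahead rescans to the end of the string at every whitespace character, this approach is best kept for short inputs.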

Split a string by spaces -- preserving quoted substrings -- in Python

You want split, from the built-in shlex module.

>>> import shlex
>>> shlex.split('this is "a test"')
['this', 'is', 'a test']

This should do exactly what you want.

If you want to preserve the quotation marks, then you can pass the posix=False kwarg.

>>> shlex.split('this is "a test"', posix=False)
['this', 'is', '"a test"']
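
One thing to be aware of: shlex.split raises ValueError when a quote is never closed. A minimal sketch of guarding against that follows; the wrapper name safe_split and the choice to return None are assumptions for illustration.

```python
import shlex


def safe_split(text):
    """Return the tokens, or None if the quoting is unbalanced."""
    try:
        return shlex.split(text)
    except ValueError:
        # Raised by shlex on an unterminated quote ("No closing quotation").
        return None


print(safe_split('this is "a test"'))   # ['this', 'is', 'a test']
print(safe_split('this is "a test'))    # None
```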

Regex for splitting a string using space when not surrounded by single or double quotes

I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:

[^\s"']+|"([^"]*)"|'([^']*)'

I added the capturing groups because you don't want the quotes in the list.

This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (regexMatcher.group(1) != null) {
        // Add double-quoted string without the quotes
        matchList.add(regexMatcher.group(1));
    } else if (regexMatcher.group(2) != null) {
        // Add single-quoted string without the quotes
        matchList.add(regexMatcher.group(2));
    } else {
        // Add unquoted word
        matchList.add(regexMatcher.group());
    }
}

If you don't mind having the quotes in the returned list, you can use much simpler code:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
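
The capturing-group logic carries over to other regex engines. As a hedged illustration, here is the quote-stripping variant sketched in Python with the same pattern; the names TOKEN and tokens are assumptions, and the groups are compared against None so that an empty quoted string ("") is still kept as an empty token.

```python
import re

# Same three alternatives: bare word, "double-quoted", 'single-quoted'.
TOKEN = re.compile(r"""[^\s"']+|"([^"]*)"|'([^']*)'""")


def tokens(text):
    result = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            result.append(m.group(1))   # double-quoted, quotes stripped
        elif m.group(2) is not None:
            result.append(m.group(2))   # single-quoted, quotes stripped
        else:
            result.append(m.group())    # unquoted word
    return result


print(tokens("one \"two two\" 'three three' four"))
# ['one', 'two two', 'three three', 'four']
```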

javascript split string by space, but ignore space in quotes (notice not to split by the colon too)

s = 'Time:"Last 7 Days" Time:"Last 30 Days"'
s.match(/(?:[^\s"]+|"[^"]*")+/g)

// -> ['Time:"Last 7 Days"', 'Time:"Last 30 Days"']

Explained:

(?:           # non-capturing group
  [^\s"]+     # anything that's not a space or a double-quote
  |           # or…
  "           # opening double-quote
    [^"]*     # …followed by zero or more characters that are not a double-quote
  "           # …closing double-quote
)+            # each match is one or more of the things described in the group

Turns out, to fix your original expression, you just need to add a + on the group:

str.match(/(".*?"|[^"\s]+)+(?=\s*|\s*$)/g)
//                        ^ here.
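
The effect of the + on the group is easiest to see by running the pattern with and without it. The sketch below uses Python's re.findall purely for illustration; the function names are hypothetical.

```python
import re

s = 'Time:"Last 7 Days" Time:"Last 30 Days"'


def match_without_plus(text):
    # Each alternative is a separate match, so the prefix is cut off.
    return re.findall(r'[^\s"]+|"[^"]*"', text)


def match_with_plus(text):
    # Repeating the group glues adjacent pieces into one match.
    return re.findall(r'(?:[^\s"]+|"[^"]*")+', text)


print(match_without_plus(s))
# ['Time:', '"Last 7 Days"', 'Time:', '"Last 30 Days"']
print(match_with_plus(s))
# ['Time:"Last 7 Days"', 'Time:"Last 30 Days"']
```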

