Regular Expression to Split on Spaces Unless in Quotes

Regular Expression to split on spaces unless in quotes

No options required

Regex:

\w+|"[\w\s]*"

C#:

Regex regex = new Regex(@"\w+|""[\w\s]*""");

Or if you need to exclude " characters:

Regex.Matches(input, @"(?<match>\w+)|""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));
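
To see what this pattern does and does not handle, here is a minimal JavaScript sketch. The helper name tokenizeWords and the sample inputs are made up for illustration. Since \w only covers letters, digits and underscores, punctuation outside quotes breaks a word apart, and punctuation inside quotes stops the quoted alternative from matching at all:

```javascript
// Hypothetical helper: applies the same pattern as above, \w+|"[\w\s]*".
function tokenizeWords(input) {
  // match() with the g flag returns every full match, or null when there is none.
  return input.match(/\w+|"[\w\s]*"/g) ?? [];
}

// The hyphen is not a word character, so "four-five" becomes two tokens.
console.log(tokenizeWords('one "two three" four-five'));

// The comma inside the quotes defeats "[\w\s]*", so the quoted text is split too.
console.log(tokenizeWords('say "hi, there" now'));
```

If your input can contain punctuation, one of the negated-character-class patterns further down is probably a better fit.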

Regex for splitting a string using space when not surrounded by single or double quotes

I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:

[^\s"']+|"([^"]*)"|'([^']*)'

I added the capturing groups because you don't want the quotes in the list.

This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (regexMatcher.group(1) != null) {
        // Add double-quoted string without the quotes
        matchList.add(regexMatcher.group(1));
    } else if (regexMatcher.group(2) != null) {
        // Add single-quoted string without the quotes
        matchList.add(regexMatcher.group(2));
    } else {
        // Add unquoted word
        matchList.add(regexMatcher.group());
    }
}

If you don't mind having the quotes in the returned list, you can use much simpler code:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
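
For comparison, the same idea can be sketched in JavaScript. This is only an illustration of the capturing-group logic, not part of the Java answer; the function name splitUnquoted and the sample string are assumptions:

```javascript
// Hypothetical helper mirroring the Java loop with capturing groups.
function splitUnquoted(subject) {
  const regex = /[^\s"']+|"([^"]*)"|'([^']*)'/g;
  const matchList = [];
  for (const m of subject.matchAll(regex)) {
    // m[1] is the double-quoted body, m[2] the single-quoted body,
    // and m[0] the whole match (used for unquoted words).
    matchList.push(m[1] ?? m[2] ?? m[0]);
  }
  return matchList;
}

console.log(splitUnquoted('This is "a quoted phrase" and \'another one\' here'));
```

Using ?? rather than || keeps an empty quoted string ("") in the list as an empty element instead of falling through to the whole match.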

Split string by spaces but ignore spaces in quotation marks

I don't think String.prototype.split alone can do what you want: splitting on a pattern will most likely leave empty strings in the resulting array, and that is just for the string you gave. If you need a general solution to your problem, I believe split won't work at all.

If your goal is to produce the same result irrespective of the actual string, I'd suggest you use a combination of String.prototype.match, [].map and String.prototype.replace as shown:

Code:

var
  /* The string. */
  string = 'apples bananas "apples and bananas" pears "apples and bananas and pears"',

  /* The regular expression. */
  regex = /"[^"]+"|[^\s]+/g,

  /* Use 'map' and 'replace' to discard the surrounding quotation marks. */
  result = string.match(regex).map(e => e.replace(/"(.+)"/, "$1"));

console.log(result);

How to split on white spaces not between quotes?

\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)

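You can split on this regex. Its lookahead accepts a whitespace character only when every quote after it is part of a matched pair, which means the whitespace is not inside quotes. See the demo: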

https://regex101.com/r/5I209k/4

Or, if the string mixes quote types:

https://regex101.com/r/5I209k/7
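
As a rough illustration of the lookahead technique, here is a JavaScript sketch. Note that it is a simplified variant and not the exact regex above: it handles double quotes only, so no capturing group or backreference is needed (in JavaScript, split would otherwise splice the captured quote character into the result), and it uses \s+ so that runs of spaces don't produce empty strings. The function name is made up:

```javascript
// Hypothetical helper: split on whitespace that is followed by an even
// number of double quotes, which means the whitespace is outside quotes.
function splitOutsideDoubleQuotes(input) {
  return input.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/);
}

console.log(splitOutsideDoubleQuotes('a "b c" d'));
```

The quotes stay in the result with this approach, and the lookahead rescans the rest of the string for every candidate space, so it can get slow on long inputs.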

javascript split string by space, but ignore space in quotes (notice not to split by the colon too)

s = 'Time:"Last 7 Days" Time:"Last 30 Days"'
s.match(/(?:[^\s"]+|"[^"]*")+/g)

// -> ['Time:"Last 7 Days"', 'Time:"Last 30 Days"']

Explained:

(?:            # non-capturing group
  [^\s"]+      # anything that's not a space or a double-quote
  |            # or…
  "            # opening double-quote
    [^"]*      # …followed by zero or more characters that are not a double-quote
  "            # …closing double-quote
)+             # each match is one or more of the things described in the group

Turns out, to fix your original expression, you just need to add a + on the group:

str.match(/(".*?"|[^"\s]+)+(?=\s*|\s*$)/g)
//                        ^ here.

Javascript split by spaces but not those in quotes

You could approach it slightly differently and use a Regular Expression to split where spaces are followed by word characters and a colon (rather than a space that's not in a quoted part):

var str = 'a:0 b:1 moo:"foo bar" c:2',
arr = str.split(/ +(?=[\w]+\:)/g);
/* [a:0, b:1, moo:"foo bar", c:2] */


What's this Regex doing?

It looks for a literal match on the space character, then uses a Positive Lookahead to assert that the next part can be matched:

[\w]+ = match any word character [a-zA-Z0-9_] between one and unlimited times.

\: = match the : character once (backslash escaped).

g = global modifier - don't return on first match (split ignores this flag, so it is optional here).
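
One thing to keep in mind with this approach, shown in a small sketch below (the helper name and the second input are made up for illustration): the split never looks at the quotes, so it relies on quoted values not containing a space followed by something that looks like a key.

```javascript
// Hypothetical helper wrapping the split described above.
function splitOnKeys(str) {
  return str.split(/ +(?=\w+:)/);
}

// Works as intended when quoted values contain no "word:" sequences.
console.log(splitOnKeys('a:0 b:1 moo:"foo bar" c:2'));

// Here "y:" inside the quotes looks like a key, so the quoted value is split.
console.log(splitOnKeys('a:"x y:z" b:1'));
```

If your values can contain colons, one of the quote-aware patterns elsewhere on this page is likely safer.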


Regex for splitting String, using whitespace except if inside quotes and brackets

I think you can use the regex: \([^\)]+?[\)]|[\""].+?[\""]|[^ ]+

This is basically your regex with one more alternative, which starts at an opening bracket and matches everything up to the closing bracket. The remaining alternatives are the ones you had already defined (matching the text in quotation marks, and words with a space as the delimiter).

The demo can be seen here: https://regex101.com/r/L0sC4U/1

*Note that regex101 is a good resource for understanding regular expressions (check the debug view), and you can easily post examples from it here for future questions.
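
A minimal JavaScript sketch of how this could be applied. I have written `[\)]` as `\)` and `[\""]` as `"`, which should match the same characters; the function name and the sample input are assumptions:

```javascript
// Hypothetical helper: collect bracketed groups, quoted groups and plain words.
function splitKeepingGroups(input) {
  // Alternatives are tried left to right: (…) first, then "…", then a run of non-spaces.
  return input.match(/\([^)]+?\)|".+?"|[^ ]+/g) ?? [];
}

console.log(splitKeepingGroups('foo (bar baz) "qux quux" end'));
```

The brackets and quotes are kept in the returned tokens, and nested brackets are not handled.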

Split string on spaces except words in quotes

You can use:

$string = 'Some of "this string is" in quotes';
$arr = preg_split('/("[^"]*")|\h+/', $string, -1,
PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r ( $arr );

Output:

Array
(
    [0] => Some
    [1] => of
    [2] => "this string is"
    [3] => in
    [4] => quotes
)

RegEx Breakup

("[^"]*")    # match quoted text and group it so that it can be used in output
             # using the PREG_SPLIT_DELIM_CAPTURE option
|            # regex alternation
\h+          # match 1 or more horizontal whitespace characters
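
The same split-and-capture trick can be sketched in JavaScript, where split also returns captured groups. This is only an approximation of the PHP call: JavaScript has no \h, so [ \t]+ stands in for horizontal whitespace, and filter(Boolean) plays the role of PREG_SPLIT_NO_EMPTY. The function name is made up:

```javascript
// Hypothetical JavaScript analogue of the preg_split call above.
function splitLikePregSplit(input) {
  return input
    // The captured quoted text is spliced into the result array by split().
    .split(/("[^"]*")|[ \t]+/)
    // Drop the undefined captures and empty strings that split() leaves behind.
    .filter(Boolean);
}

console.log(splitLikePregSplit('Some of "this string is" in quotes'));
```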

Split string on spaces except for in quotes, but include incomplete quotes

You can use

var re = /"([^"]*)"|\S+/g;

By using \S (= [^\s]) we just drop the " from the negated character class, so an unpaired quote can still be matched as part of a token.
By placing the "([^"]*)" pattern before \S+, we make sure quoted substrings are matched whole rather than torn apart at their spaces. This should work if the string contains well-paired quoted substrings and only the last quote is unpaired.

Demo:

var re = /"([^"]*)"|\S+/g; 

var str = 'sdfj "sdfjjk';

document.body.innerHTML = JSON.stringify(str.match(re));
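
If you also want to drop the quotes from the complete pairs, the capturing group can be used through matchAll. A small sketch that runs outside the browser too (the function name and the second sample string are assumptions):

```javascript
// Hypothetical helper: same regex as above, returning the quoted body when a
// full pair matched and the raw token (including a dangling quote) otherwise.
function splitAllowingOpenQuote(str) {
  const re = /"([^"]*)"|\S+/g;
  return Array.from(str.matchAll(re), (m) => m[1] ?? m[0]);
}

console.log(splitAllowingOpenQuote('sdfj "sdfjjk'));
console.log(splitAllowingOpenQuote('a "b c" d "e f'));
```

In the second call the unpaired quote stays attached to e, and f becomes its own token, since there is no closing quote to group them.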

