Regular Expression to split on spaces unless in quotes
No options required
Regex:
\w+|"[\w\s]*"
C#:
Regex regex = new Regex(@"\w+|""[\w\s]*""");
Or if you need to exclude " characters:
Regex
    .Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));
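The character classes in this pattern are deliberately narrow: only \w and \s. A short JavaScript sketch (the sample strings are invented for illustration) shows what that means in practice:

```javascript
// Same pattern as the first C# snippet, with the g flag so match() returns every token.
const pattern = /\w+|"[\w\s]*"/g;

// Quoted text made only of word characters and whitespace stays together (quotes included).
const plain = 'one "two three" four'.match(pattern);
console.log(plain);

// A comma inside the quotes is outside [\w\s], so the quoted alternative fails
// and the words are picked up one by one instead.
const punctuated = 'say "hello, world" now'.match(pattern);
console.log(punctuated);
```

If quoted text may contain punctuation, a negated class such as [^"]* is the more forgiving choice.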
Regex for splitting a string using space when not surrounded by single or double quotes
I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:
[^\s"']+|"([^"]*)"|'([^']*)'
I added the capturing groups because you don't want the quotes in the list.
This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (regexMatcher.group(1) != null) {
        // Add double-quoted string without the quotes
        matchList.add(regexMatcher.group(1));
    } else if (regexMatcher.group(2) != null) {
        // Add single-quoted string without the quotes
        matchList.add(regexMatcher.group(2));
    } else {
        // Add unquoted word
        matchList.add(regexMatcher.group());
    }
}
If you don't mind having the quotes in the returned list, you can use much simpler code:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
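The same group-selection logic carries over to other regex engines. A minimal JavaScript sketch, assuming a runtime that supports matchAll and the ?? operator (the function name splitQuoted is hypothetical):

```javascript
function splitQuoted(subject) {
  const regex = /[^\s"']+|"([^"]*)"|'([^']*)'/g;
  const matchList = [];
  for (const m of subject.matchAll(regex)) {
    // m[1]: double-quoted contents, m[2]: single-quoted contents, m[0]: unquoted word.
    // ?? only falls through on undefined, so an empty quoted string "" is kept as ''.
    matchList.push(m[1] ?? m[2] ?? m[0]);
  }
  return matchList;
}

console.log(splitQuoted(`This is "a test" of 'single quotes' here`));
```

Using ?? rather than || matters here: an empty pair of quotes is a legitimate (empty) token and should not fall back to the whole match.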
Split string by spaces but ignore spaces in quotation marks
I don't think what you want can be achieved with String.prototype.split alone, because it will most likely leave empty strings in the resulting array, and that is just for the string you gave. If you need a general solution to your problem, I believe split won't work at all.
If your goal is to produce the same result irrespective of the actual string, I'd suggest you use a combination of String.prototype.match, [].map and String.prototype.replace, as shown:
Code:
var
    /* The string. */
    string = 'apples bananas "apples and bananas" pears "apples and bananas and pears"',
    /* The regular expression. */
    regex = /"[^"]+"|[^\s]+/g,
    /* Use 'map' and 'replace' to discard the surrounding quotation marks. */
    result = string.match(regex).map(e => e.replace(/"(.+)"/, "$1"));
console.log(result);
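One caveat with the snippet above: match returns null when the string contains no tokens at all (an empty or whitespace-only string, for example), and calling map on null throws. A hedged variant with a fallback, wrapped in a hypothetical helper named splitKeepingQuoted:

```javascript
function splitKeepingQuoted(string) {
  const regex = /"[^"]+"|[^\s]+/g;
  // match() yields null when nothing matches, so fall back to an empty array
  // before mapping; the replace call is the same one used in the answer.
  return (string.match(regex) || []).map(e => e.replace(/"(.+)"/, "$1"));
}

console.log(splitKeepingQuoted('apples "apples and bananas" pears'));
console.log(splitKeepingQuoted(''));
```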
How to split on white spaces not between quotes?
\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)
You can split on this regex, which uses a lookahead. See the demo:
https://regex101.com/r/5I209k/4
Or, if the tick types are mixed:
https://regex101.com/r/5I209k/7
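In JavaScript specifically, String.prototype.split also inserts the value of every capturing group into the result, and this pattern needs group 1 for the \1 backreference. The sketch below, which assumes the pattern keeps exactly one capturing group, retains only the even-indexed entries (the actual pieces); splitOutsideQuotes is an illustrative name.

```javascript
function splitOutsideQuotes(text) {
  const re = /\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)/;
  // split() returns piece, group 1, piece, group 1, ... so the odd-indexed
  // entries are group values (a quote character or undefined) and are dropped.
  return text.split(re).filter((_, i) => i % 2 === 0);
}

console.log(splitOutsideQuotes('a "b c" d'));
```

Without the filter, the raw result of split for this input also contains a stray quote character and an undefined entry between the pieces.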
javascript split string by space, but ignore space in quotes (notice not to split by the colon too)
s = 'Time:"Last 7 Days" Time:"Last 30 Days"'
s.match(/(?:[^\s"]+|"[^"]*")+/g)
// -> ['Time:"Last 7 Days"', 'Time:"Last 30 Days"']
Explained:
(?:         # non-capturing group
  [^\s"]+   # anything that's not a space or a double-quote
  |         # or…
  "         # opening double-quote
    [^"]*   # …followed by zero or more characters that are not a double-quote
  "         # …closing double-quote
)+          # each match is one or more of the things described in the group
Turns out, to fix your original expression, you just need to add a + on the group:
str.match(/(".*?"|[^"\s]+)+(?=\s*|\s*$)/g)
//                        ^ here.
Javascript split by spaces but not those in quotes
You could approach it slightly differently and use a Regular Expression to split where spaces are followed by word characters and a colon (rather than a space that's not in a quoted part):
var str = 'a:0 b:1 moo:"foo bar" c:2',
arr = str.split(/ +(?=[\w]+\:)/g);
/* [a:0, b:1, moo:"foo bar", c:2] */
What's this Regex doing?
It looks for a literal match on one or more space characters, then uses a positive lookahead to assert that the next part can be matched:
[\w]+ = match any word character [a-zA-Z0-9_] between one and unlimited times.
\: = match the : character once (backslash escaped).
g = global modifier - don't return on first match.
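The lookahead trades quote-awareness for simplicity, so it relies on every token having the key:value shape and on quoted values not containing that shape themselves. A small sketch of where the assumption breaks (the sample strings are invented for illustration):

```javascript
const splitter = / +(?=[\w]+\:)/g;

// A bare word without a colon is not a split point, so it stays glued to the previous token.
const withBareWord = 'a:0 plain b:1'.split(splitter);
console.log(withBareWord);

// A "word:" sequence inside quotes looks like a new key, so the quoted value is cut in two.
const withColonInQuotes = 'moo:"foo bar:1" c:2'.split(splitter);
console.log(withColonInQuotes);
```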
Regex for splitting String, using whitespace except if inside quotes and brackets
I think you can use the regex: \([^\)]+?[\)]|[\""].+?[\""]|[^ ]+
This is basically your regex with one more alternative, which starts at an opening bracket and matches everything up to the closing bracket. The rest of the regex consists of the alternatives you had already defined (matching the characters inside quotation marks, and words with a space as the delimiter).
The demo can be seen here: https://regex101.com/r/L0sC4U/1
Note that regex101 is a good resource for understanding regular expressions (check the debug view), and you can easily post examples from it in future questions.
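As a quick check of how the three alternatives interact, here is a JavaScript sketch. It assumes [\""] is meant as a single double-quote character and writes [\)] as \); the sample strings are made up.

```javascript
// JavaScript spelling of the pattern above: bracketed run, quoted run, or run of non-spaces.
const tokenRe = /\([^)]+?\)|".+?"|[^ ]+/g;

const tokens = 'foo (bar baz) "qux quux" end'.match(tokenRe);
console.log(tokens);

// The bracket alternative only applies when a token starts with "(".
const midToken = 'f(x y) z'.match(tokenRe);
console.log(midToken);
```

In the second string the token begins with f, so the last alternative wins and the text is still split at the space inside the brackets.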
Split string on spaces except words in quotes
You can use:
$string = 'Some of "this string is" in quotes';
$arr = preg_split('/("[^"]*")|\h+/', $string, -1,
PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
print_r ( $arr );
Output:
Array
(
[0] => Some
[1] => of
[2] => "this string is"
[3] => in
[4] => quotes
)
RegEx Breakup
("[^"]*")  # match quoted text and group it so that it can be used in output using
           # PREG_SPLIT_DELIM_CAPTURE option
|          # regex alternation
\h+        # match 1 or more horizontal whitespace
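JavaScript's split has no flags like PREG_SPLIT_NO_EMPTY, but it does include capture-group values in its output, so a similar effect can be sketched by filtering. Here [ \t]+ stands in for \h+ (an approximation, since \h also covers other horizontal whitespace), and splitLikePreg is a made-up name.

```javascript
function splitLikePreg(string) {
  // The captured quoted text plays the role of PREG_SPLIT_DELIM_CAPTURE;
  // filter(Boolean) drops empty strings and undefined, much like PREG_SPLIT_NO_EMPTY.
  return string.split(/("[^"]*")|[ \t]+/).filter(Boolean);
}

console.log(splitLikePreg('Some of "this string is" in quotes'));
```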
Split string on spaces except for in quotes, but include incomplete quotes
You can use
var re = /"([^"]*)"|\S+/g;
By using \S (= [^\s]) we just drop the " from the negated character class.
By placing the "([^"]*)" pattern before \S+, we make sure substrings in quotes are not torn apart, because the quoted alternative is tried first. This should work if the string contains well-paired quoted substrings and the last is unpaired.
Demo:
var re = /"([^"]*)"|\S+/g;
var str = 'sdfj "sdfjjk';
document.body.innerHTML = JSON.stringify(str.match(re));
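Outside a browser, the behaviour with an unpaired quote can be checked with the sketch below. It additionally uses the capturing group to drop the quotes from well-paired substrings, while the dangling one keeps its quote character; the helper name tokens is hypothetical.

```javascript
function tokens(str) {
  const re = /"([^"]*)"|\S+/g;
  // m[1] is defined only when the quoted alternative matched;
  // an unpaired quote falls through to \S+ and is returned with its quote.
  return Array.from(str.matchAll(re), m => m[1] ?? m[0]);
}

console.log(tokens('sdfj "sdfjjk'));
console.log(tokens('a "b c" "d'));
```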