Split a String by Commas But Ignore Commas Within Double-Quotes Using JavaScript

Split a string by commas but ignore commas within double-quotes using Javascript

Here's what I would do.

var str = 'a, b, c, "d, e, f", g, h';
var arr = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g);

/* will match:
    (
      ".*?"       a double quote, as few characters as possible, then a double quote
      |           OR
      [^",\s]+    1 or more characters other than double quotes, commas or whitespace
    )
    (?=           FOLLOWED BY
      \s*,        0 or more whitespace characters and a comma
      |           OR
      \s*$        0 or more whitespace characters and then the end of the string
    )
*/

// match() returns null when there are no matches; falling back to an empty
// array keeps the loop below from throwing
arr = arr || [];
for (var i = 0; i < arr.length; i++) console.log('arr[' + i + '] =', arr[i]);
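As a minimal sketch, the same regex can be wrapped in a reusable function. The name splitOutsideQuotes and the optional quote-stripping step are our own additions, not part of the answer above. The last call shows a limitation of the pattern: unquoted fields may not contain whitespace, so only the last word of an unquoted multi-word field is kept.

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function splitOutsideQuotes(str, stripQuotes = false) {
  // match() returns null when nothing matches, so fall back to []
  const fields = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g) || [];
  // Quoted fields always start and end with '"'; optionally drop those quotes.
  return stripQuotes
    ? fields.map(f => (f.startsWith('"') ? f.slice(1, -1) : f))
    : fields;
}

console.log(splitOutsideQuotes('a, b, c, "d, e, f", g, h'));
// [ 'a', 'b', 'c', '"d, e, f"', 'g', 'h' ]
console.log(splitOutsideQuotes('a, b, c, "d, e, f", g, h', true));
// [ 'a', 'b', 'c', 'd, e, f', 'g', 'h' ]
console.log(splitOutsideQuotes('foo bar, baz'));
// [ 'bar', 'baz' ]  (unquoted fields cannot contain spaces)
```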

Javascript/RegEx: Split a string by commas but ignore commas within double-quotes

If your engine supports lookbehind, you can also match the optional whitespace between two consecutive commas, so that empty fields are kept:

"[^"]*"|[^\s,'"]+(?:\s+[^\s,'"]+)*|(?<=,)\s*(?=,)


const regex = /"[^"]*"|[^\s,'"]+(?:\s+[^\s,'"]+)*|(?<=,)\s*(?=,)/g;

[
  `'a,b,c,d,e'`,
  `'a,b,"c,d", e'`,
  `'a,,"c,d", e'`,
  ` xz a,, b, c, "d, e, f", g, h`,
  `'a, ,"c,d", e'`,
].forEach(s =>
  console.log(s.match(regex))
);
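A small sketch (the helper name matchFields is hypothetical) that applies the same pattern to a few rows without the surrounding single quotes. It assumes a runtime with lookbehind support, such as a current Node.js or browser. Only an empty field between two commas is returned; a leading or trailing empty field is not, and a field made of spaces comes back as those spaces.

```javascript
// Same pattern as above; the lookbehind (?<=,) needs ES2018+ support.
const fieldRegex = /"[^"]*"|[^\s,'"]+(?:\s+[^\s,'"]+)*|(?<=,)\s*(?=,)/g;

// Hypothetical helper: returns [] instead of null when nothing matches.
function matchFields(row) {
  return row.match(fieldRegex) || [];
}

console.log(matchFields('a,,"c,d", e'));  // [ 'a', '', '"c,d"', 'e' ]
console.log(matchFields('a, ,"c,d", e')); // [ 'a', ' ', '"c,d"', 'e' ]
console.log(matchFields('x y, z'));       // [ 'x y', 'z' ]
console.log(matchFields(',a,'));          // [ 'a' ]  (no leading/trailing empty field)
```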

How can I split by commas while ignoring any comma that's inside quotes?

Update:

I think the final one-line version should be:

var cells = (rows[i] + ',').split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/).slice(1).reduce((a, b) => (a.length > 0 && a[a.length - 1].length < 4) ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]] : [...a, [b]], []).map(e => e.reduce((a, b) => a !== undefined ? a : b, undefined))

Or, formatted more readably:

var cells = (rows[i] + ',')
  .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
  .slice(1)
  .reduce(
    (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
      ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
      : [...a, [b]],
    [],
  )
  .map(
    e => e.reduce(
      (a, b) => a !== undefined ? a : b, undefined,
    ),
  )
;

This is rather long, but still looks purely functional. Let me explain it:

First, the regular expression. Each field you want falls into one of three cases:

  1. *?([^",]+?) *?,, which is a string containing neither " nor a comma, followed by a ,. Spaces right before the comma are left out of the capture, but leading spaces end up inside it, because the leading *? is lazy.
  2. " *?(.+?)" *?,, which is a string enclosed in double quotes, optionally followed by spaces, then a ,. The capture holds the text between the quotes.
  3. ( *?),, which is zero or more spaces followed by a , (an empty field).

Splitting on a non-capturing group that alternates between these three therefore gets us most of the way to the answer.

Recall that when you split with a regular expression, the resulting array consists of:

  1. The strings between the separator matches
  2. The values of all capturing groups in each separator match (undefined for any group that did not take part in that match)
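A quick illustration of both points, using a made-up input: each capturing group of the separator contributes one entry per match, and a group that did not take part in that match contributes undefined.

```javascript
// Two alternative capturing groups: (\d) and (-).
// Matching "1" fills group 1 only; matching "-" fills group 2 only.
const parts = 'x1y-z'.split(/(\d)|(-)/);
console.log(parts); // [ 'x', '1', undefined, 'y', undefined, '-', 'z' ]
```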

In our case, the separator matches cover the whole string, so the strings between them are all empty, except the last field, which no separator consumes because no , follows it. The resulting array therefore looks like:

  1. An empty string
  2. Three values for the three capturing groups of the first separator match (one string, two undefined)
  3. An empty string
  4. Three values for the three capturing groups of the second separator match
  5. ...
  6. An empty string
  7. The last field, on its own

So why not simply add a , at the end, so that the pattern becomes perfectly regular? That is where (rows[i] + ',') comes from.

The resulting array then consists of capturing-group values separated by empty strings, with one more empty string at the very end. After removing the first empty string with .slice(1), the values come in groups of 4: [ 1st capturing group, 2nd capturing group, 3rd capturing group, empty string ].

The reduce block does exactly that, collecting them into groups of 4:

  .reduce(
    (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
      ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
      : [...a, [b]],
    [],
  )

Finally, take the first element of each group that is not undefined. A capturing group that did not take part in the match appears as undefined, and only one of the three alternatives can match at a time, so exactly one of the three captures in each group is defined, and it is precisely the desired field:

  .map(
    e => e.reduce(
      (a, b) => a !== undefined ? a : b, undefined,
    ),
  )

This completes the solution.
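The same steps can be sketched as a plain loop instead of reduce and map. This is a hedged rewrite, not the answer's original code: splitRow is a hypothetical name, and, like the original, it relies on every field matching one of the three patterns.

```javascript
// Hypothetical helper: the same split / slice / group-by-4 / first-defined
// steps as the chain above, written as a plain loop.
const FIELD_SEPARATOR = /(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/;

function splitRow(row) {
  // Append a comma so the last field is followed by one as well, then drop
  // the leading empty string.
  const parts = (row + ',').split(FIELD_SEPARATOR).slice(1);
  const cells = [];
  for (let i = 0; i < parts.length; i += 4) {
    // Each group of 4 is [group 1, group 2, group 3, ""]; keep the first
    // value that is not undefined, as the .map(... reduce ...) step does.
    cells.push(parts.slice(i, i + 4).find(v => v !== undefined));
  }
  return cells;
}

console.log(splitRow('a,b,"c, d",,e')); // [ 'a', 'b', 'c, d', '', 'e' ]
```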


I think the following should suffice:

var cells = rows[i].split(/([^",]+?|".+?") *, */).filter(e => e)

or if you don't want the quotes:

var cells = rows[i].split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e)
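As written, both variants expect a comma after every field they recognize, so a row whose last field is quoted, such as a,"b, c", comes back as [ 'a', '"', 'b', 'c"' ]. A minimal sketch of one workaround, borrowing the (rows[i] + ',') trick used above, is shown below; splitCells is a hypothetical name. Empty fields are still not handled, as the last call shows.

```javascript
// Hypothetical helper: the quote-dropping variant above, with a comma
// appended so that a quoted last field is followed by one too.
function splitCells(row) {
  return (row + ',')
    .split(/(?:([^",]+?)|"(.+?)") *, */)
    .filter(e => e); // removes the empty strings and undefined captures
}

console.log(splitCells('a, b,"c, d",e')); // [ 'a', 'b', 'c, d', 'e' ]
console.log(splitCells('a,"b, c"'));      // [ 'a', 'b, c' ]
console.log(splitCells('a,,b'));          // [ 'a', ',', 'b' ]  (empty fields still break it)
```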

Javascript: Splitting a string by comma but ignoring commas in quotes

> str.match(/('[^']+'|[^,]+)/g)
["A", "B", "C", "E", "'F,G,bb'", "H", "'I9,I8'", "J", "K"]

Although this does what you asked for, you may not have accounted for corner cases such as:

  • 'bob\'s' is a string where ' is escaped
  • a,',c
  • a,,b
  • a,b,
  • ,a,b
  • a,b,'
  • ',a,b
  • ',a,b,c,'

Some of the above are handled correctly by this; others are not. I highly recommend that people use a library that has thought this through, to avoid things such as security vulnerabilities or subtle bugs, now or in the future (if you expand your code, or if other people use it).
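To see which of the corner cases above the pattern handles, it can simply be run over each of them. The inputs are written as JavaScript string literals, so the backslash in 'bob\'s' is doubled.

```javascript
// Corner cases from the list above, as JavaScript string literals.
const cornerCaseRe = /('[^']+'|[^,]+)/g;
const cases = ["'bob\\'s'", "a,',c", 'a,,b', 'a,b,', ',a,b', "a,b,'", "',a,b", "',a,b,c,'"];

for (const s of cases) {
  console.log(`${JSON.stringify(s)} -> ${JSON.stringify(s.match(cornerCaseRe))}`);
}
// "'bob\\'s'" -> ["'bob\\'","s'"]
// "a,',c" -> ["a","'","c"]
// "a,,b" -> ["a","b"]
// "a,b," -> ["a","b"]
// ",a,b" -> ["a","b"]
// "a,b,'" -> ["a","b","'"]
// "',a,b" -> ["'","a","b"]
// "',a,b,c,'" -> ["',a,b,c,'"]
```

Empty fields simply disappear, a stray quote becomes a field of its own, and the escaped quote in 'bob\'s' ends the quoted section early, splitting the value in two.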


Explanation of the RegEx:

  • ('[^']+'|[^,]+) - means match either '[^']+' or [^,]+
  • '[^']+' means quote...one-or-more non-quotes...quote.
  • [^,]+ means one-or-more non-commas

Note: by consuming the quoted string before the unquoted string, we make the parsing of the unquoted string case easier.
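A runnable version of the call shown earlier. The input string is an assumption, reconstructed from the output in that example, since the question's string is not quoted here; the quote-stripping step is an optional extra.

```javascript
// Assumed input, reconstructed from the output shown above.
const str = "A,B,C,E,'F,G,bb',H,'I9,I8',J,K";

// Quoted sections are tried first, so their commas are not treated as separators.
const tokens = str.match(/('[^']+'|[^,]+)/g);
console.log(tokens);

// Optional extra step: strip the surrounding single quotes.
const unquoted = tokens.map(t => t.replace(/^'(.*)'$/, '$1'));
console.log(unquoted); // [ 'A', 'B', 'C', 'E', 'F,G,bb', 'H', 'I9,I8', 'J', 'K' ]
```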

Split based on commas but ignore commas within double-quotes

You mentioned that you tried to split a 'string' variable, so I assume you left out the appropriate quotes. Is the following helpful, assuming the double quotes are balanced?

import re  # the built-in re module handles this pattern; no third-party package needed

line = """ "DATA", "LT", "0.40", "1.25", "Sentence, which contain,
commas", "401", "", "MN", "", "", "", "", "" """

l = re.findall(r'"([^"]*)"', line)

print(l)

Prints:

['DATA', 'LT', '0.40', '1.25', 'Sentence, which contain, \ncommas', '401', '', 'MN', '', '', '', '', '']
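For comparison, a sketch of the same approach in JavaScript, using String.prototype.matchAll with the same pattern and collecting the first capture group of each match. Like the Python version, it assumes balanced, unescaped double quotes.

```javascript
// Same input as the Python example (the quoted field spans two lines).
const line = ` "DATA", "LT", "0.40", "1.25", "Sentence, which contain,
commas", "401", "", "MN", "", "", "", "", "" `;

// matchAll yields one match per quoted field; m[1] is the text between the quotes.
const values = [...line.matchAll(/"([^"]*)"/g)].map(m => m[1]);
console.log(values);
```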

Split string on comma and ignore comma in double quotes

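I think you can use the regex ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$) from the thread "Splitting on comma outside quotes". It matches a comma only when the rest of the string contains an even number of double quotes, which, assuming the quotes are balanced, means the comma is not inside a quoted section.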

You can test the pattern here: http://regexr.com/3cddl

Java code example:

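import java.util.Arrays;

// The class name is arbitrary; the original answer showed only main().
public class SplitOutsideQuotes {
    public static void main(String[] args) {
        String txt = "0, 2, 23131312,\"This, is a message\", 1212312";
        // Split on commas followed by an even number of double quotes
        System.out.println(Arrays.toString(txt.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")));
    }
}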
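The same lookahead works unchanged in JavaScript, since String.prototype.split accepts a regular expression. A minimal sketch, assuming balanced, unescaped double quotes; note that the lookahead rescans the rest of the string at every comma, so the cost can grow quadratically with the line length.

```javascript
// Split on a comma only when an even number of double quotes follows it.
const QUOTE_AWARE_COMMA = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/;

const row = '0, 2, 23131312,"This, is a message", 1212312';
console.log(row.split(QUOTE_AWARE_COMMA));
// [ '0', ' 2', ' 23131312', '"This, is a message"', ' 1212312' ]
```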

