How to Split a String by Commas Except Inside Parenthesis, Using a Regular Expression

Regex split by comma not inside parenthesis (.NET)

This PCRE regex - (\((?:[^()]++|(?1))*\))(*SKIP)(*F)|, - uses recursion. .NET does not support recursion, but the same thing can be done with a balancing construct. Of the two PCRE verbs used, (*SKIP) and (*FAIL), only (*FAIL) has an equivalent: it can be written as (?!), which causes an unconditional failure at the place where it stands. .NET does not support skipping a match at a specific position and resuming the search from that failed position.

I suggest replacing all commas that are not inside nested parentheses with some temporary value, and then splitting the string with that value:

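using System.Text.RegularExpressions;

// Balanced (...) groups are put back unchanged; top-level commas become the placeholder.
var s = Regex.Replace(text, @"\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|(,)", m =>
    m.Groups[1].Success ? "___temp___" : m.Value);
var results = s.Split("___temp___");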

Details

  • \((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\) - a pattern that matches nested parentheses:

    • \( - a ( char
    • (?>[^()]+|(?<o>)\(|(?<-o>)\))* - 0 or more occurrences of

      • [^()]+| - 1+ chars other than ( and ) or
      • (?<o>)\(| - a ( and a value is pushed on to the Group "o" stack
      • (?<-o>)\) - a ) and a value is popped from the Group "o" stack
    • (?(o)(?!)) - a conditional construct that fails the match if Group "o" stack is not empty
    • \) - a ) char
  • | - or
  • (,) - Group 1: a comma

Only the comma captured in Group 1 is replaced with the temporary substring, because the match evaluator checks m.Groups[1].Success. When the parenthesized alternative matches instead, the match is put back unchanged.
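For comparison, the same placeholder idea can be sketched in Python. The standard re module has neither balancing groups nor recursion, so this sketch assumes the parentheses are never nested. The function name split_with_placeholder and the NUL sentinel are illustrative choices, not part of the answer above.

```python
import re

# Assumption: the sentinel character never occurs in the input text.
SENTINEL = "\x00"

# Either a non-nested (...) group, or a comma captured in group 1.
PATTERN = re.compile(r"\([^()]*\)|(,)")

def split_with_placeholder(text):
    """Split on commas outside parentheses (one nesting level only)."""
    marked = PATTERN.sub(
        # Replace only the commas captured in group 1; keep (...) as is.
        lambda m: SENTINEL if m.group(1) else m.group(0),
        text,
    )
    return marked.split(SENTINEL)

print(split_with_placeholder("a,b(c,d),e"))  # ['a', 'b(c,d)', 'e']
```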

How to split by commas that are not within parentheses?

Use a negative lookahead to match only the commas that are not inside parentheses. Splitting the input string on the matched commas will give you the desired output.

,\s*(?![^()]*\))

DEMO

>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
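One thing worth checking is how this lookahead behaves with nested parentheses. It only inspects the next parenthesis character after the comma, so it appears to be reliable for a single nesting level only. A small sketch with two made-up inputs (both strings are illustrative):

```python
import re

pattern = re.compile(r",\s*(?![^()]*\))")

flat = "a, b (c, d), e"          # one level of parentheses
nested = "a,s(d,f(4,5)),g,h"     # two levels of parentheses

print(pattern.split(flat))    # ['a', 'b (c, d)', 'e']
print(pattern.split(nested))  # ['a', 's(d', 'f(4,5))', 'g', 'h']
```

In the nested input, the comma after d is followed by ( before any ), so the lookahead wrongly treats it as a top-level comma.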

How do I split a string by commas except inside parenthesis, using a regular expression?

To deal with nested parentheses, you can use:

txt = "a,s(d,f(4,5)),g,h"
pattern = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)')
puts txt.scan(pattern).map &:first

pattern details:

(                        # first capturing group
    (?:                  # open a non-capturing group
        [^,(]+           # all characters except , and (
      |                  # or
        (                # open the second capturing group
            \(           # (
            (?>          # open an atomic group
                [^()]+   # all characters except parentheses
              |          # or
                \g<-1>   # the last capturing group (you can also write \g<2>)
            )*           # close the atomic group and repeat it
            \)           # )
        )                # close the second capturing group
    )+                   # close the non-capturing group and repeat it
)                        # close the first capturing group

The second capturing group describes the nested parentheses, which can contain characters that are not parentheses or the capturing group itself. It's a recursive pattern.

Inside the pattern, you can refer to a capture group by its number (\g<2> for the second capturing group), by its relative position (\g<-1> is the first group to the left of the current position in the pattern), or by its name if you use named capturing groups.

Note: You can allow unpaired parentheses if you add |[()] before the end of the non-capturing group. Then a,b(,c will give you ['a', 'b(', 'c']

regular expression to split a string with comma outside parentheses with more than one level python

With the PyPI regex module, you can use code like this:

import regex
s = "eq(Firstname,test),eq(Lastname,ltest),OR(eq(ContactID,12345),eq(ContactID,123456))"
for x in regex.split(r"(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,", s):
    # split() also returns the capturing group, which is None for every comma match
    if x is not None:
        print(x)

Output:

eq(Firstname,test)
eq(Lastname,ltest)
OR(eq(ContactID,12345),eq(ContactID,123456))

Details:

  • (\((?:[^()]++|(?1))*\)) - Group 1 capturing a string between nested paired parentheses
  • (*SKIP)(*F) - the match is skipped and the next match is searched for from the failure position
  • | - or
  • , - a comma.

How to split string while ignoring portion in parentheses?

Instead of focusing on what you do not want, it's often easier to express what you do want as a regular expression, and to match that with a global regex:

var str = "bibendum, morbi, non, quam (nec, dui, luctus), rutrum, nulla";
str.match(/[^,(]+(?:\([^)]*\))?/g) // the simple one
str.match(/[^,\s]+(?:\s+\([^)]*\))?/g) // not matching whitespaces
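The same "match what you want" idea can be sketched in Python with re.findall, reusing the second pattern above unchanged. Note that this pattern assumes each item is a single word, optionally followed by whitespace and one non-nested parenthesized part.

```python
import re

text = "bibendum, morbi, non, quam (nec, dui, luctus), rutrum, nulla"

# A run of non-comma, non-space characters, optionally followed by
# whitespace and a single (...) group without nesting.
items = re.findall(r"[^,\s]+(?:\s+\([^)]*\))?", text)

print(items)
# ['bibendum', 'morbi', 'non', 'quam (nec, dui, luctus)', 'rutrum', 'nulla']
```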

Split string by comma, but ignore commas within brackets

This regex works on your example:

,(?=[^,]+?:)

Here, we use a positive lookahead to find commas that are followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are searching for. Of course, if the keys are allowed to contain commas, this would have to be adapted a little further.
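As a quick illustration, here is a sketch of the lookahead in Python. The input string is hypothetical (the original example is not shown here); it assumes key:value pairs whose keys contain no commas.

```python
import re

# Hypothetical input: each key is followed by a colon, values may contain commas.
text = "x:[1,2],y:[3,4],z:5"

# Split only on commas that are directly followed by "<key>:".
parts = re.split(r",(?=[^,]+?:)", text)

print(parts)  # ['x:[1,2]', 'y:[3,4]', 'z:5']
```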

Java: splitting a comma-separated string but ignoring commas in parentheses

The simplest solution, in my opinion, is to process the input string char-by-char:

public static List<String> split(String input) {
    int nParens = 0;
    int start = 0;
    List<String> result = new ArrayList<>();
    for(int i=0; i<input.length(); i++) {
        switch(input.charAt(i)) {
        case ',':
            if(nParens == 0) {
                result.add(input.substring(start, i));
                start = i+1;
            }
            break;
        case '(':
            nParens++;
            break;
        case ')':
            nParens--;
            if(nParens < 0)
                throw new IllegalArgumentException("Unbalanced parenthesis at offset #"+i);
            break;
        }
    }
    if(nParens > 0)
        throw new IllegalArgumentException("Missing closing parenthesis");
    result.add(input.substring(start));
    return result;
}

Example:

split("one,two,3,(4,five),six,(seven),(8,9,ten),eleven,(twelve,13,14,fifteen)") ->
[one, two, 3, (4,five), six, (seven), (8,9,ten), eleven, (twelve,13,14,fifteen)]

As a free bonus, this solution also handles nested parentheses:

split("one,two,3,(4,(five,six),seven),eight") ->
[one, two, 3, (4,(five,six),seven), eight]

It also checks that the parentheses are balanced (every opening parenthesis has a corresponding closing one).
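For readers working in Python, the same depth-counting approach might look roughly like this. It is a direct port of the Java method above; the name split_by_depth and the use of ValueError are choices made for this sketch.

```python
def split_by_depth(text):
    """Split on commas at parenthesis depth 0; raise on unbalanced input."""
    parts = []
    depth = 0   # number of currently open parentheses
    start = 0   # index where the current piece begins
    for i, ch in enumerate(text):
        if ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Unbalanced parenthesis at offset {i}")
    if depth > 0:
        raise ValueError("Missing closing parenthesis")
    parts.append(text[start:])
    return parts

print(split_by_depth("one,two,3,(4,(five,six),seven),eight"))
# ['one', 'two', '3', '(4,(five,six),seven)', 'eight']
```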

Regex : Split on comma , but exclude commas within parentheses and quotes(Both single & Double)

You can use this regex:

String input = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\"";
String[] toks = input.split(
    ",(?=(([^']*'){2})*[^']*$)(?=(([^\"]*\"){2})*[^\"]*$)(?![^()]*\\))" );
for (String tok: toks)
    System.out.printf("<%s>%n", tok);

Output:

<5>
<(5,5)>
<C'A,B'>
<'A,B'>
<',B'>
<'A,'>
<"A,B">
<C"A,B">

Explanation:

,                         # Match a literal comma
(?=(([^']*'){2})*[^']*$)  # Lookahead to ensure the comma is followed by an even number of '
(?=(([^"]*"){2})*[^"]*$)  # Lookahead to ensure the comma is followed by an even number of "
(?![^()]*\))              # Negative lookahead to ensure the comma is not followed by a )
                          # with only non-parenthesis characters in between
                          # (written as \\) inside the Java string literal)
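The same regex can be tried in Python as a rough sketch. One difference matters here: re.split() returns the text of capturing groups along with the pieces, so the groups inside the lookaheads are made non-capturing. The constant names below are illustrative only.

```python
import re

# Lookaheads use non-capturing groups so re.split() returns only the pieces.
EVEN_SINGLE_QUOTES = r"(?=(?:(?:[^']*'){2})*[^']*$)"
EVEN_DOUBLE_QUOTES = r'(?=(?:(?:[^"]*"){2})*[^"]*$)'
NOT_BEFORE_CLOSING_PAREN = r"(?![^()]*\))"

SPLITTER = re.compile(
    "," + EVEN_SINGLE_QUOTES + EVEN_DOUBLE_QUOTES + NOT_BEFORE_CLOSING_PAREN
)

text = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\""
tokens = SPLITTER.split(text)
for token in tokens:
    print(f"<{token}>")
```

Like the Java version, this assumes quotes are balanced and parentheses are not nested.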

Split string by commas except when in bracket

Try using the pattern ,(?!\S\)|\()

Example:

import re

b = ['hi, this(me,(you)) , hello(a,b)', 'hi, this(me,(you))']
for i in b:
    print(re.split(r',(?!\S\)|\()', i))

Output:

['hi', ' this(me,(you)) ', ' hello(a,b)']
['hi', ' this(me,(you))']
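It may help to see what this lookahead actually tests: a comma is kept only when it is immediately followed by a single non-space character and ), or by (. The sketch below uses two made-up inputs to show that longer arguments inside parentheses are still split.

```python
import re

pattern = re.compile(r",(?!\S\)|\()")

print(pattern.split("hello(a,b)"))    # ['hello(a,b)']
print(pattern.split("hello(ab,cd)"))  # ['hello(ab', 'cd)']
```

So this pattern fits inputs shaped like the example above, but it is not a general test for "inside parentheses".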

