How to Split by Commas That Are Not Within Parentheses

How to split by commas that are not within parentheses?

Use a negative lookahead to match only the commas that are not inside parentheses. The \s* also consumes any whitespace that follows the comma. Splitting the input string on the matched commas gives the desired output.

,\s*(?![^()]*\))


>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']

Split string by comma, but ignore commas within brackets

This regex works on your example:

,(?=[^,]+?:)

Here, a positive lookahead matches only commas that are followed by one or more non-comma characters and then a colon. This finds the <comma><key> pattern you are searching for. If keys are allowed to contain commas, the pattern would need to be adapted further.
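The question's original input is not reproduced here, so the following is a minimal sketch on a hypothetical string of key:value pairs (the sample data is an assumption) to show which commas the lookahead accepts:

```python
import re

# Hypothetical input: each key is followed by ':' and values may contain commas.
s = "a:[1,2,3],b:foo,c:[4,5]"

# Split only on commas followed by "<non-comma characters>:".
parts = re.split(r',(?=[^,]+?:)', s)
print(parts)
# ['a:[1,2,3]', 'b:foo', 'c:[4,5]']
```

The commas inside the square brackets are skipped because the text after them reaches another comma (or the end of the string) before any colon.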


Split by comma if the comma is not between brackets, while allowing characters outside the brackets within the same comma split

You may use this regex with a lookahead for split:

>>> import re
>>> s = "aa,bb,(cc,dd),m(ee,ff)"
>>> print(re.split(r',(?![^()]*\))', s))
['aa', 'bb', '(cc,dd)', 'm(ee,ff)']


RegEx Details:

  • ,: Match a comma
  • (?![^()]*\)): A negative lookahead that keeps us from matching a comma inside (...) by asserting that no ) follows after 0 or more non-parenthesis characters.
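Because the lookahead only inspects the text up to the next parenthesis, it assumes parentheses are not nested. A short sketch (the nested sample string is an assumption, not from the question) shows both the flat case and where the pattern stops working:

```python
import re

pattern = r',(?![^()]*\))'

# Flat parentheses: commas inside (...) are left alone.
flat = re.split(pattern, "aa,bb,(cc,dd),m(ee,ff)")
print(flat)
# ['aa', 'bb', '(cc,dd)', 'm(ee,ff)']

# Nested parentheses: the comma after "b" is followed by "(", not ")",
# so the lookahead does not see that it is inside an outer group.
nested = re.split(pattern, "a,f(b,(c),d),e")
print(nested)
# ['a', 'f(b', '(c),d)', 'e']
```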

Split string by comma if not within square brackets or parentheses

You can use the following regex with the global flag.

,(?![^\(\[]*[\]\)])

It is inspired by https://stackoverflow.com/a/9030062/1630604.
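As a quick check of this pattern, here is a minimal Python sketch on a made-up input (the sample string is an assumption). Python's re.split already splits on every match, so no global flag is needed there:

```python
import re

# Hypothetical input mixing square brackets and parentheses.
s = "a,b[c,d],e(f,g),h"

# Split on commas that are not followed by a closing ] or )
# before any opening ( or [.
parts = re.split(r',(?![^\(\[]*[\]\)])', s)
print(parts)
# ['a', 'b[c,d]', 'e(f,g)', 'h']
```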

Regex split by comma not inside parenthesis (.NET)

This PCRE regex - (\((?:[^()]++|(?1))*\))(*SKIP)(*F)|, - uses recursion, which .NET does not support, but the same thing can be done with a balancing construct. Of the PCRE verbs (*SKIP) and (*FAIL), only (*FAIL) can be written as (?!) (it causes an unconditional failure at the place where it stands); .NET does not support skipping a match at a specific position and resuming the search from that failed position.

I suggest replacing all commas that are not inside nested parentheses with some temporary value, and then splitting the string with that value:

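// Replace only top-level commas (Group 1); balanced (...) groups are matched and kept as-is.
var s = Regex.Replace(text, @"\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|(,)", m =>
    m.Groups[1].Success ? "___temp___" : m.Value);
// Split on the temporary marker.
var results = s.Split("___temp___");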

Details

  • \((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\) - a pattern that matches nested parentheses:

    • \( - a ( char
    • (?>[^()]+|(?<o>)\(|(?<-o>)\))* - 0 or more occurrences of

      • [^()]+| - 1+ chars other than ( and ) or
      • (?<o>)\(| - a ( and a value is pushed on to the Group "o" stack
      • (?<-o>)\) - a ) and a value is popped from the Group "o" stack
    • (?(o)(?!)) - a conditional construct that fails the match if Group "o" stack is not empty
    • \) - a ) char
  • | - or
  • (,) - Group 1: a comma

Only the comma captured in Group 1 is replaced with a temp substring since the m.Groups[1].Success check is performed in the match evaluator part.
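The same goal, splitting only on commas at nesting depth zero, can also be sketched without a regex. The Python function below is an illustrative assumption (the name split_top_level is hypothetical and not part of the answer). It is not a translation of the balancing-group pattern, and it may treat unbalanced parentheses differently.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside (possibly nested) parentheses."""
    parts = []
    current = []
    depth = 0  # number of currently open parentheses
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            # Clamp at zero so a stray ")" does not produce a negative depth.
            depth = max(depth - 1, 0)
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_top_level("a,f(b,(c),d),e"))
# ['a', 'f(b,(c),d)', 'e']
```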

explode commas but ignore commas within brackets php

We can make a slight correction to your current regex splitting logic by using the following pattern:

,(?![^(]+\))

This says to split on a comma, but only if that comma does not occur inside a term in parentheses. It works by using a negative lookahead to check that we do not see a ) without first seeing an opening (, which would imply that the comma is inside a (...) term.

$string = "Beer - Domestic,Food - Snacks (chips,dips,nuts),Beer - Imported,UNCATEGORIZED";
$keywords = preg_split("/,(?![^(]+\))/", $string);
print_r($keywords);

This prints:

Array
(
    [0] => Beer - Domestic
    [1] => Food - Snacks (chips,dips,nuts)
    [2] => Beer - Imported
    [3] => UNCATEGORIZED
)
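For comparison, the same pattern can be tried in Python. This sketch reuses the string from the PHP example, and the edge-case input "f(a,),b" is an assumption added for illustration. It also shows one consequence of using + rather than * in the lookahead: a comma directly before the closing parenthesis is still split on.

```python
import re

string = "Beer - Domestic,Food - Snacks (chips,dips,nuts),Beer - Imported,UNCATEGORIZED"

# Same pattern as in the preg_split call.
keywords = re.split(r',(?![^(]+\))', string)
print(keywords)
# ['Beer - Domestic', 'Food - Snacks (chips,dips,nuts)', 'Beer - Imported', 'UNCATEGORIZED']

# With "+", at least one character must sit between the comma and ")",
# so a trailing comma inside the parentheses is treated as a split point.
with_plus = re.split(r',(?![^(]+\))', "f(a,),b")
with_star = re.split(r',(?![^(]*\))', "f(a,),b")
print(with_plus)
# ['f(a', ')', 'b']
print(with_star)
# ['f(a,)', 'b']
```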

how to split string into array on commas but ignore commas in parentheses

You may use ,(?![^\(]*[\)]) with a list comprehension:

import re

s = '''
a VARCHAR(20),
b FLOAT, c FLOAT,
d NUMBER(38,0), e NUMBER(38,0)
'''

[i.strip() for i in re.split(r',(?![^\(]*[\)])', s)]
# ['a VARCHAR(20)', 'b FLOAT', 'c FLOAT', 'd NUMBER(38,0)', 'e NUMBER(38,0)']

