Regex split by comma not inside parenthesis (.NET)
This PCRE regex uses recursion:
(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,
.NET does not support recursion, but the same thing can be done with a balancing construct. Of the two PCRE verbs used here, (*SKIP) and (*FAIL), only (*FAIL) has a .NET equivalent: it can be written as (?!), which causes an unconditional failure at the position where it stands. .NET has no way to skip a match at a specific position and resume the search from that failed position.
I suggest replacing all commas that are not inside nested parentheses with some temporary value, and then splitting the string with that value:
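var s = Regex.Replace(text, @"\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|(,)", m =>
    m.Groups[1].Success ? "___temp___" : m.Value);
// The string[] overload of Split is available on both .NET Framework and .NET (Core)
var results = s.Split(new[] { "___temp___" }, StringSplitOptions.None);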
Details
\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\) - a pattern that matches nested parentheses:
  \( - a ( char
  (?>[^()]+|(?<o>)\(|(?<-o>)\))* - 0 or more occurrences of
    [^()]+| - 1+ chars other than ( and ), or
    (?<o>)\(| - a ( char, and a value is pushed onto the Group "o" stack, or
    (?<-o>)\) - a ) char, and a value is popped from the Group "o" stack
  (?(o)(?!)) - a conditional construct that fails the match if the Group "o" stack is not empty
  \) - a ) char
| - or
(,) - Group 1: a comma
Only the comma captured in Group 1 is replaced with the temp substring, since the m.Groups[1].Success check is performed in the match evaluator.
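The "match what should be skipped, capture what should be replaced" idea can also be sketched in Python. This is only an illustration under assumptions: Python's built-in re module has no balancing groups, so the sketch handles a single level of parentheses, and the name split_top_level and the \x00 sentinel are made up for the example.

```python
import re

SENTINEL = "\x00"  # assumed never to occur in the input

def split_top_level(text):
    # First alternative consumes a whole "(...)" group (one level only);
    # second alternative captures a comma that lies outside such a group.
    pattern = re.compile(r"\([^()]*\)|(,)")
    marked = pattern.sub(
        lambda m: SENTINEL if m.group(1) is not None else m.group(0),
        text,
    )
    return marked.split(SENTINEL)

print(split_top_level("a,b(c,d),e"))  # ['a', 'b(c,d)', 'e']
```

As in the C# version, only matches where Group 1 participated are replaced; parenthesized groups are matched and put back unchanged.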
How to split by commas that are not within parentheses?
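Use a negative lookahead to match all the commas that are not inside parentheses. Splitting the input string on the matched commas gives the desired output. Note that this handles only a single level of parentheses: the lookahead merely checks that the next parenthesis after the comma is not a closing one.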
,\s*(?![^()]*\))
DEMO
>>> import re
>>> s = "Water, Titanium Dioxide (CI 77897), Black 2 (CI 77266), Iron Oxides (CI 77491, 77492, 77499), Ultramarines (CI 77007)"
>>> re.split(r',\s*(?![^()]*\))', s)
['Water', 'Titanium Dioxide (CI 77897)', 'Black 2 (CI 77266)', 'Iron Oxides (CI 77491, 77492, 77499)', 'Ultramarines (CI 77007)']
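A short check of where the lookahead approach stops working may help. The two input strings below are made up for illustration; the pattern is the one shown above.

```python
import re

pattern = re.compile(r",\s*(?![^()]*\))")

flat = "a, b (c, d), e"       # one level of parentheses
nested = "a,b(c,(d),e),f"     # two levels of parentheses

print(pattern.split(flat))    # ['a', 'b (c, d)', 'e']
print(pattern.split(nested))  # ['a', 'b(c', '(d),e)', 'f']
```

In the nested case the comma after c is followed by an opening parenthesis, so the lookahead does not see a closing one and the string is split inside the outer group.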
How do I split a string by commas except inside parenthesis, using a regular expression?
To deal with nested parentheses, you can use:
txt = "a,s(d,f(4,5)),g,h"
pattern = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)')
puts txt.scan(pattern).map &:first
pattern details:
(                       # first capturing group
    (?:                 # open a non-capturing group
        [^,(]+          # all characters except , and (
      |                 # or
        (               # open the second capturing group
            \(          # (
            (?>         # open an atomic group
                [^()]+  # all characters except parentheses
              |         # OR
                \g<-1>  # the last capturing group (you can also write \g<2>)
            )*          # close the atomic group
            \)          # )
        )               # close the second capturing group
    )+                  # close the non-capturing group and repeat it
)                       # close the first capturing group
The second capturing group describes nested parentheses, which can contain characters that are not parentheses or the capturing group itself. It's a recursive pattern.
Inside the pattern, you can refer to a capture group by its number (\g<2> for the second capturing group), by its relative position (\g<-1> is the first group to the left of the current position in the pattern), or by its name if you use named capturing groups.
Notice: You can allow single parentheses if you add |[()] before the end of the non-capturing group. Then a,b(,c will give you ['a', 'b(', 'c'].
regular expression to split a string with comma outside parentheses with more than one level python
With the PyPI regex module, you can use code like this:
import regex
s = "eq(Firstname,test),eq(Lastname,ltest),OR(eq(ContactID,12345),eq(ContactID,123456))"
for x in regex.split(r"(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,", s):
    if x is not None:
        print(x)
Output:
eq(Firstname,test)
eq(Lastname,ltest)
OR(eq(ContactID,12345),eq(ContactID,123456))
Details:
(\((?:[^()]++|(?1))*\)) - Group 1 capturing a string between nested paired parentheses
(*SKIP)(*F) - the match is skipped and the next match is searched for from the failure position
| - or
, - a comma.
How to split string while ignoring portion in parentheses?
Instead of focusing on what you do not want, it's often easier to express what you do want as a regular expression, and to match that with a global regex:
var str = "bibendum, morbi, non, quam (nec, dui, luctus), rutrum, nulla";
str.match(/[^,(]+(?:\([^)]*\))?/g) // the simple one
str.match(/[^,\s]+(?:\s+\([^)]*\))?/g) // not matching whitespaces
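The "match what you want" approach carries over to other regex flavors. As a rough sketch, here is the second pattern used with Python's re.findall on the same input string; like the JavaScript version, it assumes no nested parentheses.

```python
import re

text = "bibendum, morbi, non, quam (nec, dui, luctus), rutrum, nulla"

# An item is a run of non-comma, non-space characters, optionally followed
# by whitespace and one parenthesized group without nested parentheses.
items = re.findall(r"[^,\s]+(?:\s+\([^)]*\))?", text)
print(items)
# ['bibendum', 'morbi', 'non', 'quam (nec, dui, luctus)', 'rutrum', 'nulla']
```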
Split string by comma, but ignore commas within brackets
This regex works on your example:
,(?=[^,]+?:)
Here, we use a positive lookahead to look for commas followed by one or more non-comma characters and then a colon. This correctly finds the <comma><key> pattern you are searching for. Of course, if the keys are allowed to contain commas, this would have to be adapted a little further.
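A small sketch of how this lookahead behaves. The input string is hypothetical, since the question's original example is not shown here; it assumes keys followed by a colon and a bracketed list.

```python
import re

# Hypothetical input: keys followed by a colon and a bracketed list.
text = "a:[1,2],b:[3,4]"

# Split only on commas that are followed by a key and a colon.
parts = re.split(r",(?=[^,]+?:)", text)
print(parts)  # ['a:[1,2]', 'b:[3,4]']
```

The commas inside the brackets are left alone because no colon appears before the next comma or the end of the string.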
Java: splitting a comma-separated string but ignoring commas in parentheses
The simplest solution, in my opinion, is to process the input string char-by-char:
import java.util.ArrayList;
import java.util.List;

public static List<String> split(String input) {
    int nParens = 0;   // current parenthesis nesting depth
    int start = 0;     // start index of the current item
    List<String> result = new ArrayList<>();
    for (int i = 0; i < input.length(); i++) {
        switch (input.charAt(i)) {
        case ',':
            // Only a comma at depth 0 ends the current item
            if (nParens == 0) {
                result.add(input.substring(start, i));
                start = i + 1;
            }
            break;
        case '(':
            nParens++;
            break;
        case ')':
            nParens--;
            if (nParens < 0)
                throw new IllegalArgumentException("Unbalanced parenthesis at offset #" + i);
            break;
        }
    }
    if (nParens > 0)
        throw new IllegalArgumentException("Missing closing parenthesis");
    result.add(input.substring(start));
    return result;
}
Example:
split("one,two,3,(4,five),six,(seven),(8,9,ten),eleven,(twelve,13,14,fifteen)") ->
[one, two, 3, (4,five), six, (seven), (8,9,ten), eleven, (twelve,13,14,fifteen)]
As a free bonus, this solution also counts nested parentheses if necessary:
split("one,two,3,(4,(five,six),seven),eight") ->
[one, two, 3, (4,(five,six),seven), eight]
Also it checks whether parentheses are balanced (every open parenthesis has the corresponding closing one).
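For comparison, the same depth-counting idea can be sketched in Python. This is a straightforward port of the Java method above; the function name split_top_level_commas is chosen for the example only.

```python
def split_top_level_commas(text):
    """Split text on commas that are not enclosed in parentheses."""
    parts = []
    depth = 0   # current parenthesis nesting depth
    start = 0   # start index of the current item
    for i, ch in enumerate(text):
        if ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Unbalanced parenthesis at offset {i}")
    if depth > 0:
        raise ValueError("Missing closing parenthesis")
    parts.append(text[start:])
    return parts

print(split_top_level_commas("one,two,3,(4,(five,six),seven),eight"))
# ['one', 'two', '3', '(4,(five,six),seven)', 'eight']
```

Because it only tracks a counter, this handles any nesting depth in a single pass and needs no regex engine features.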
Regex : Split on comma , but exclude commas within parentheses and quotes(Both single & Double)
You can use this regex:
String input = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\"";
String[] toks = input.split(
",(?=(([^']*'){2})*[^']*$)(?=(([^\"]*\"){2})*[^\"]*$)(?![^()]*\\))" );
for (String tok: toks)
System.out.printf("<%s>%n", tok);
Output:
<5>
<(5,5)>
<C'A,B'>
<'A,B'>
<',B'>
<'A,'>
<"A,B">
<C"A,B">
Explanation:
,                         # Match a literal comma
(?=(([^']*'){2})*[^']*$)  # Lookahead to ensure the comma is followed by an even number of '
(?=(([^"]*"){2})*[^"]*$)  # Lookahead to ensure the comma is followed by an even number of "
(?![^()]*\))              # Negative lookahead to ensure the comma is not followed by a )
                          # with only non-parenthesis characters in between
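The same lookaheads can be tried in Python as a rough equivalent of the Java snippet. One adjustment is assumed here: the groups are made non-capturing, because Python's re.split would otherwise insert the captured group contents into the result list.

```python
import re

# Same lookaheads as above, with non-capturing groups so that re.split
# does not insert group contents into the result list.
pattern = re.compile(
    r""",(?=(?:(?:[^']*'){2})*[^']*$)"""
    r"""(?=(?:(?:[^"]*"){2})*[^"]*$)"""
    r"""(?![^()]*\))"""
)

text = "5,(5,5),C'A,B','A,B',',B','A,',\"A,B\",C\"A,B\""
for token in pattern.split(text):
    print(f"<{token}>")
```

Like the Java version, this assumes quotes are balanced and parentheses are not nested.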
Split string by commas except when in bracket
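Try using the pattern ,(?!\S\)|\(), which skips a comma when it is followed by a single non-space character and a ), or directly by a (.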
Ex:
import re
b = ['hi, this(me,(you)) , hello(a,b)', 'hi, this(me,(you))']
for i in b:
    print(re.split(r',(?!\S\)|\()', i))
Output:
['hi', ' this(me,(you)) ', ' hello(a,b)']
['hi', ' this(me,(you))']
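It may be worth checking how far this pattern stretches. The sketch below uses two made-up inputs to show that it depends on exactly one character sitting between an inner comma and the closing parenthesis.

```python
import re

pattern = re.compile(r",(?!\S\)|\()")

# Works when the text between an inner comma and ")" is one character.
print(pattern.split("hello(a,b),x"))    # ['hello(a,b)', 'x']

# Splits inside the parentheses once that text is longer.
print(pattern.split("hello(ab,cd),x"))  # ['hello(ab', 'cd)', 'x']
```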