PHP: Split String on Comma, But Not When Between Braces or Quotes

PHP: split string on comma, but NOT when between braces or quotes?

Instead of a preg_split, do a preg_match_all:

$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO"; 

preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $str, $matches);

print_r($matches);

will print:

Array
(
[0] => Array
(
[0] => AAA
[1] => BBB
[2] => (CCC,DDD)
[3] => 'EEE'
[4] => 'FFF,GGG'
[5] => ('HHH','III')
[6] => (('JJJ','KKK'), LLL, (MMM,NNN))
[7] => OOO
)

)

The regex \((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+ can be divided in three parts:

  1. \((?:[^()]|(?R))+\), which matches balanced pairs of parentheses
  2. '[^']*' matching a quoted string
  3. [^(),\s]+ which matches any char-sequence not consisting of '(', ')', ',' or white-space chars

PHP: Split a string by comma(,) but ignoring anything inside square brackets?

Yes, with a regex: select all commas, but ignore those inside square brackets.

/[,]+(?![^\[]*\])/g

https://regexr.com/3qudi
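
The pattern above is written with the /g flag as used on regexr; PHP's preg_* functions do not accept that flag. A minimal sketch of how it could be used from PHP with preg_split (the sample string is made up for illustration, and the brackets are assumed to be balanced and not nested):

```php
<?php
// Sample input, invented for this example.
$str = "one,two,[three,four],five";

// Same pattern as above, minus the /g flag. A comma is a separator only
// if no "]" can be reached from it without passing a "[" first.
$parts = preg_split('/[,]+(?![^\[]*\])/', $str);

print_r($parts);
// [0] => one, [1] => two, [2] => [three,four], [3] => five
```

Note that [,]+ treats a run of consecutive commas as a single separator.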

PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets)

You can do that more easily, as long as the parentheses are not nested:

preg_match_all('/[^(,\s]+|\([^)]+\)/', $str, $matches);

But it would be better if you use a real parser. Maybe something like this:

$str = 'one, two, three, (four, (five, six), (ten)), seven';
$buffer = '';
$stack = array();
$depth = 0; // current parenthesis nesting level
$len = strlen($str);
for ($i=0; $i<$len; $i++) {
    $char = $str[$i];
    switch ($char) {
        case '(':
            $depth++;
            break;
        case ',':
            // a top-level comma ends the current item
            if (!$depth) {
                if ($buffer !== '') {
                    $stack[] = $buffer;
                    $buffer = '';
                }
                continue 2;
            }
            break;
        case ' ':
            // skip spaces outside parentheses
            if (!$depth) {
                continue 2;
            }
            break;
        case ')':
            if ($depth) {
                $depth--;
            } else {
                // unmatched closing parenthesis: flush it with the buffer
                $stack[] = $buffer.$char;
                $buffer = '';
                continue 2;
            }
            break;
    }
    $buffer .= $char;
}
if ($buffer !== '') {
    $stack[] = $buffer;
}
var_dump($stack);
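
Since the page title also mentions quotes, here is a rough sketch of how the same depth-counting idea might be extended to single-quoted runs. The function name splitTopLevel and the sample string are hypothetical, not from the answer above, and escaped quotes are not handled:

```php
<?php
// Hypothetical helper: splits on top-level commas, treating (...) nesting
// and '...' quoted runs as opaque. Escaped quotes are NOT supported.
function splitTopLevel(string $str): array
{
    $parts = [];
    $buffer = '';
    $depth = 0;        // parenthesis nesting level
    $inQuote = false;  // true while inside '...'
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $char = $str[$i];
        if ($char === "'") {
            $inQuote = !$inQuote;
        } elseif (!$inQuote) {
            if ($char === '(') {
                $depth++;
            } elseif ($char === ')' && $depth > 0) {
                $depth--;
            } elseif ($char === ',' && $depth === 0) {
                // a comma outside quotes and parentheses ends the item
                $parts[] = trim($buffer);
                $buffer = '';
                continue;
            }
        }
        $buffer .= $char;
    }
    if (trim($buffer) !== '') {
        $parts[] = trim($buffer);
    }
    return $parts;
}

$result = splitTopLevel("AAA, 'FFF,GGG', (('JJJ','KKK'), LLL), OOO");
print_r($result);
```

Unlike the loop above, this sketch trims each item instead of dropping every top-level space, so an item such as "two words" keeps its inner space.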

Split a string at comma character but ignore if said character is nested inside parentheses

You can use the preg_split() function for this. It splits the string on a regex pattern, so the pattern can match commas while ignoring those that sit between parentheses.

This code works for your example:

<?php

$string = 'v70, 790, v50 (v40, v44), v22';
$pattern = '/,(?![^(]*\)) /';
$splitString = preg_split($pattern, $string);

Output of $splitString looks like:

array (size=4)
0 => string 'v70' (length=3)
1 => string '790' (length=3)
2 => string 'v50 (v40, v44)' (length=14)
3 => string 'v22' (length=3)

explode commas but ignore commas within brackets php

We can make a slight correction to your current regex splitting logic by using the following pattern:

,(?![^(]+\))

This says to split on a comma, but only if that comma does not occur inside a term in parentheses. It works by using a negative lookahead to check that we do not see a ) without first seeing an opening (, which would imply that the comma is inside a (...) term.

$string = "Beer - Domestic,Food - Snacks (chips,dips,nuts),Beer - Imported,UNCATEGORIZED";
$keywords = preg_split("/,(?![^(]+\))/", $string);
print_r($keywords);

This prints:

Array
(
[0] => Beer - Domestic
[1] => Food - Snacks (chips,dips,nuts)
[2] => Beer - Imported
[3] => UNCATEGORIZED
)
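
One caveat: the lookahead only checks for a ) before the next (, so it assumes a single level of parentheses. The sketch below uses a made-up nested input to show where it breaks, and one possible alternative that combines a recursive group with the (*SKIP)(*F) verbs. That alternative is my own assumption, not part of the answer above:

```php
<?php
// Made-up input with nested parentheses.
$string = "a,f(b,(c,d),e),g";

// The lookahead pattern from above splits at the comma after "b",
// because the next character is "(" and [^(]+ cannot match there.
$naive = preg_split('/,(?![^(]+\))/', $string);
print_r($naive);   // a | f(b | (c,d),e) | g

// Possible alternative: match a whole balanced (...) group, recursing
// via (?1), then discard it with (*SKIP)(*F) so only top-level commas
// are left to split on.
$nested = preg_split('~(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|,~', $string);
print_r($nested);  // a | f(b,(c,d),e) | g
```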

Exploding string by comma outside parentheses

Following the conversation here, I wrote a parser to solve this problem. It is quite ugly, but it does the job (at least within some limitations). For completeness, in case anybody else runs into the same question, I am posting it here:

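function full($sql){
    // Find the outermost SELECT ... FROM case-insensitively, so the
    // original casing of the column expressions is preserved.
    $selectPos = stripos($sql, "SELECT ");
    $fromPos = strripos($sql, " FROM ");
    if($selectPos === false || $fromPos === false) return NULL;
    $start = $selectPos + 7;
    // The third argument of substr() is a length, not an end position.
    $def = substr($sql, $start, $fromPos - $start);
    $raw = explode(",", $def);
    $elements = array();
    $rem = array();
    foreach($raw as $elm){
        // Re-join the pieces until the parentheses balance again.
        array_push($rem, $elm);
        $txt = implode(",", $rem);
        if(substr_count($txt, "(") - substr_count($txt, ")") == 0){
            array_push($elements, trim($txt));
            $rem = array();
        }
    }
    return $elements;
}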

When feeding it with the following string

SELECT first, second, to_char(my,(big, and, fancy),house) as bigly, (SELECT myVar,foo from z) as super, export(mastermind and others) as aloah FROM table

it returns

Array ( [0] => first [1] => second [2] => to_char(my,(big, and, fancy),house) as bigly [3] => (SELECT myVar,foo from z) as super [4] => export(mastermind and others) as aloah ) 

split by comma inside braces except another braces inside braces

Code:

$sqls = array(
"CREATE TABLE notes(id INTEGER,code DECIMAL (4,2),PRIMARY KEY (id))",
"CREATE TABLE notes(id INTEGER,code TEXT)"
);

foreach($sqls as $sql){
if(preg_match_all("/(?:^.+?\(|,)(?:\K[\w ]+(?:\([\S].*?\))?)/", $sql,$matches)){
echo "<pre>";
var_export($matches[0]);
echo "</pre>";
}
}

Output:

// first $matches...
array(
0 => 'id INTEGER',
1 => 'code DECIMAL (4,2)',
2 => 'PRIMARY KEY (id)'
)
// second $matches...
array(
0 => 'id INTEGER',
1 => 'code TEXT'
)

Regex Breakdown:

(?:^.+?\(|,)          #group everything from the start to 1st parenthesis or a comma
(?:\K[\w ]+ #\K means "only retain text from this point", group words and spaces
(?:\([\S].*?\))? #optionally group parenthetical text
)

Using \K makes a capture group unnecessary, and preg_match_all returns the desired strings as the full matches in the first subarray. The benefit is a $matches array that is half the size of one produced with a capture group.
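
To make the size difference concrete, here is a small sketch comparing \K with a capture group. The input string and variable names are made up for illustration:

```php
<?php
$str = "a,b,c";

// With \K: the comma is matched but dropped from the reported match.
preg_match_all('/,\K\w+/', $str, $withK);

// With a capture group: the full match still contains the comma, and the
// wanted text lands in a second subarray.
preg_match_all('/,(\w+)/', $str, $withGroup);

print_r($withK);      // one subarray:  [0] => (b, c)
print_r($withGroup);  // two subarrays: [0] => (",b", ",c"), [1] => (b, c)
```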

Split a string by commas but ignore commas within double-quotes using Javascript

Here's what I would do.

var str = 'a, b, c, "d, e, f", g, h';
var arr = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g);

/* will match:

    (
".*?" double quotes + anything but double quotes + double quotes
| OR
[^",\s]+ 1 or more characters excl. double quotes, comma or spaces of any kind
)
(?= FOLLOWED BY
\s*, 0 or more empty spaces and a comma
| OR
\s*$ 0 or more empty spaces and nothing else (end of string)
)

*/
arr = arr || [];
// this will prevent JS from throwing an error in
// the below loop when there are no matches
for (var i = 0; i < arr.length; i++) console.log('arr['+i+'] =',arr[i]);

regexp to split a string using comma(,) delimiter but ignore if the comma is in curly braces{,}

I see two possibilities (that don't crash with a long string):

The first with preg_match_all:

$pattern = '~
(?:
\G(?!\A), # contiguous to the previous match, not at the start of the string
| # OR
\A ,?? # at the start of the string or after the first match when
# it is empty
)\K # discard characters on the left from match result
[^{,]*+ # all that is not a { or a ,
(?:
{[^}]*}? [^{,]* # a string enclosed between curly brackets until a , or a {
# or an unclosed opening curly bracket until the end
)*+
~sx';

if (preg_match_all($pattern, $str, $m))
print_r($m[0]);

The second with preg_split and backtracking control verbs to avoid parts enclosed between curly brackets (shorter, but less efficient with long strings):

$pattern = '~{[^}]*}?(*SKIP)(*F)|,~';
print_r(preg_split($pattern, $str));

(*F) forces the pattern to fail and (*SKIP) forces the regex engine to skip parts already matched when the pattern fails.

The weakness of this last approach is that the pattern starts with an alternation. This means that for each character that is not a { or a ,, the two branches of the alternation are tested (for nothing). However, you can improve the pattern with the S (study) modifier (note that as of PHP 7.3 this modifier has no effect):

$pattern = '~{[^}]*}?(*SKIP)(*F)|,~S';

or you can write it without an alternation, like this:

$pattern = '~[{,](?:(?<={)[^}]*}?(*SKIP)(*F))?~';

This way, positions with a { or a , are located first, using a faster algorithm than the normal walk of the regex engine.
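
For reference, a minimal runnable sketch of the preg_split variant with (*SKIP)(*F). The sample string is made up, and the braces are assumed not to be nested:

```php
<?php
// Made-up sample input.
$str = "a,{b,c},d,{e,f,g}";

// A {...} block is matched first and then thrown away by (*SKIP)(*F),
// so only the commas outside of braces remain as split points.
$parts = preg_split('~{[^}]*}?(*SKIP)(*F)|,~', $str);

print_r($parts);
// [0] => a, [1] => {b,c}, [2] => d, [3] => {e,f,g}
```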

Regex - how to split string by commas, omitting commas in brackets

You can use this lookaround-based regex:

$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";

$arr = preg_split("/\s*,\s*(?![^][]*\])(?=(?:(?:[^']*'){2})*[^']*$)/", $str);

print_r( $arr );

There are 2 lookarounds used in this regex:

  • (?![^][]*\]) - Asserts comma is not inside [...]
  • (?=(?:(?:[^']*'){2})*[^']*$) - Asserts comma is not inside '...'

PS: This is assuming we don't have unbalanced/nested/escaped quotes and brackets.

Output:

Array
(
[0] => myTemplate
[1] => testArr => [1868,1869,1870]
[2] => testInteger => 3
[3] => testString => 'test, can contain a comma'
)
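
The quote lookahead can also be tried in isolation. It accepts a comma only when an even number of ' characters follows it up to the end of the string, which means the comma itself is outside any quoted run. A small sketch with a made-up input, under the same assumption of balanced, unescaped quotes:

```php
<?php
// Made-up sample input.
$str = "a,'b,c',d";

// Split on commas that are followed by an even number of single quotes.
$parts = preg_split("/,(?=(?:(?:[^']*'){2})*[^']*$)/", $str);

print_r($parts);
// [0] => a, [1] => 'b,c', [2] => d
```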

