PHP: split string on comma, but NOT when between braces or quotes?
Instead of preg_split, use preg_match_all:
$str = "AAA, BBB, (CCC,DDD), 'EEE', 'FFF,GGG', ('HHH','III'), (('JJJ','KKK'), LLL, (MMM,NNN)) , OOO";
preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $str, $matches);
print_r($matches);
will print:
Array
(
[0] => Array
(
[0] => AAA
[1] => BBB
[2] => (CCC,DDD)
[3] => 'EEE'
[4] => 'FFF,GGG'
[5] => ('HHH','III')
[6] => (('JJJ','KKK'), LLL, (MMM,NNN))
[7] => OOO
)
)
The regex \((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+ can be divided into three parts:
\((?:[^()]|(?R))+\) which matches balanced pairs of parentheses
'[^']*' which matches a quoted string
[^(),\s]+ which matches any character sequence not containing '(', ')', ',' or white-space characters
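The three alternatives can be wrapped in a small helper. This is only a sketch: the function name splitTopLevel and the short test string are my own and not part of the answer above. It returns the full matches, or an empty array if preg_match_all reports an error.

```php
<?php
// Hypothetical helper (name chosen for illustration only).
// Returns every top-level item: a balanced (...) group, a quoted string,
// or a run of characters that are not parentheses, commas or whitespace.
function splitTopLevel(string $input): array
{
    $pattern = "/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/";
    if (preg_match_all($pattern, $input, $matches) === false) {
        return []; // preg_match_all returns false on a regex error
    }
    return $matches[0];
}

print_r(splitTopLevel("AAA, (BBB,(CCC,DDD)), 'EEE,FFF'"));
// Items: AAA / (BBB,(CCC,DDD)) / 'EEE,FFF'
```

Note that (?R) recurses into the whole pattern, which is what lets the first alternative match nested groups.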
PHP: Split a string by comma(,) but ignoring anything inside square brackets?
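Yes, with a regex: match every comma except those inside square brackets.
/[,]+(?![^\[]*\])/g
(The trailing g is the global flag used by JavaScript and regexr; leave it out in PHP, where preg_split and preg_match_all already find every match.)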
https://regexr.com/3qudi
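In PHP the same pattern could be used with preg_split, roughly as below. The helper name splitOutsideSquareBrackets and the sample input are assumptions of mine, and the trim step is an addition to drop the spaces left around each piece. Like the pattern itself, this only copes with a single level of brackets.

```php
<?php
// Hypothetical helper: split on commas that are not inside [...].
// The lookahead fails when a ']' follows with no '[' in between,
// which means the comma sits inside a bracket pair.
function splitOutsideSquareBrackets(string $input): array
{
    $parts = preg_split('/[,]+(?![^\[]*\])/', $input);
    if ($parts === false) {
        return [];
    }
    return array_map('trim', $parts);
}

print_r(splitOutsideSquareBrackets("a, [b, c], d"));
// Items: a / [b, c] / d
```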
PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets)
You can do that more easily (though this simple pattern only handles one level of parentheses):
preg_match_all('/[^(,\s]+|\([^)]+\)/', $str, $matches);
But it would be better to use a real parser. Maybe something like this:
$str = 'one, two, three, (four, (five, six), (ten)), seven';
$buffer = '';
$stack = array();
$depth = 0;
$len = strlen($str);
for ($i=0; $i<$len; $i++) {
    $char = $str[$i];
    switch ($char) {
        case '(':
            $depth++;
            break;
        case ',':
            if (!$depth) {
                if ($buffer !== '') {
                    $stack[] = $buffer;
                    $buffer = '';
                }
                continue 2;
            }
            break;
        case ' ':
            if (!$depth) {
                continue 2;
            }
            break;
        case ')':
            if ($depth) {
                $depth--;
            } else {
                $stack[] = $buffer.$char;
                $buffer = '';
                continue 2;
            }
            break;
    }
    $buffer .= $char;
}
if ($buffer !== '') {
    $stack[] = $buffer;
}
var_dump($stack);
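The same depth-counting idea can be packaged as a function. This is a sketch with a few choices of my own that differ from the loop above: the name splitByDepth is hypothetical, fields are trimmed instead of dropping top-level spaces, and empty fields are kept.

```php
<?php
// Hypothetical helper: split on a delimiter only at parenthesis depth 0.
function splitByDepth(string $input, string $delimiter = ','): array
{
    $parts = [];
    $buffer = '';
    $depth = 0;
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
        } elseif ($char === $delimiter && $depth === 0) {
            // Top-level delimiter: close the current field.
            $parts[] = trim($buffer);
            $buffer = '';
            continue;
        }
        $buffer .= $char;
    }
    $parts[] = trim($buffer); // last field
    return $parts;
}

print_r(splitByDepth('one, two, (four, (five, six), (ten)), seven'));
// Items: one / two / (four, (five, six), (ten)) / seven
```

Because it counts depth rather than pattern-matching, nesting of any depth works without recursion in the regex engine.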
Split a string at comma character but ignore if said character is nested inside parentheses
You can use preg_split() for this. It splits the string on a regex pattern that matches a comma only when that comma is not between parentheses.
This code works for your example:
<?php
$string = 'v70, 790, v50 (v40, v44), v22';
$pattern = '/,(?![^(]*\)) /';
$splitString = preg_split($pattern, $string);
The output of $splitString looks like:
array (size=4)
0 => string 'v70' (length=3)
1 => string '790' (length=3)
2 => string 'v50 (v40, v44)' (length=14)
3 => string 'v22' (length=3)
explode commas but ignore commas within brackets php
We can make a slight correction to your current regex splitting logic by using the following pattern:
,(?![^(]+\))
This says to split on a comma, but only if that comma does not occur inside a term in parentheses. It works by using a negative lookahead to check that we do not see a ) without first seeing an opening (, which would imply that the comma is inside a (...) term.
$string = "Beer - Domestic,Food - Snacks (chips,dips,nuts),Beer - Imported,UNCATEGORIZED";
$keywords = preg_split("/,(?![^(]+\))/", $string);
print_r($keywords);
This prints:
Array
(
[0] => Beer - Domestic
[1] => Food - Snacks (chips,dips,nuts)
[2] => Beer - Imported
[3] => UNCATEGORIZED
)
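One thing worth knowing about this lookahead: it only looks for the next ) with no ( before it, so it holds for a single level of parentheses but not for nested ones. A small sketch to illustrate (the helper name splitOutsideParens and both inputs are mine):

```php
<?php
// Hypothetical helper wrapping the lookahead-based split.
function splitOutsideParens(string $input): array
{
    $parts = preg_split("/,(?![^(]+\))/", $input);
    return $parts === false ? [] : $parts;
}

// One level of parentheses: the comma inside (...) is left alone.
print_r(splitOutsideParens("a,f(b,c),d"));
// Items: a / f(b,c) / d

// Nested parentheses: the comma after "b" is followed by "(" before any ")",
// so the lookahead does not protect it and the string is split mid-group.
print_r(splitOutsideParens("a,f(b,(c),d),e"));
// Items: a / f(b / (c),d) / e
```

For nested input, a recursive pattern or a small depth-counting parser is the safer choice.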
Exploding string by comma outside parentheses
Following the conversation here, I did write a parser to solve this problem. It is quite ugly, but it does the job (at least within some limitations). For completeness (if anybody else might run into the same question), I post it here:
function full($sql){
    // Locate the outermost SELECT ... FROM case-insensitively, keeping the original casing.
    $start = stripos($sql, "SELECT ");
    $end = strripos($sql, " FROM ");
    if($start === false || $end === false || $end < $start + 7) return NULL;
    // The third argument of substr() is a length, so subtract the start offset as well.
    $def = substr($sql, $start + 7, $end - $start - 7);
    $raw = explode(",", $def);
    $elements = array();
    $rem = array();
    foreach($raw as $elm){
        array_push($rem, $elm);
        $txt = implode(",", $rem);
        // A chunk is complete once its parentheses balance.
        if(substr_count($txt, "(") - substr_count($txt, ")") == 0){
            array_push($elements, trim($txt));
            $rem = array();
        }
    }
    return $elements;
}
When feeding it with the following string
SELECT first, second, to_char(my,(big, and, fancy),house) as bigly, (SELECT myVar,foo from z) as super, export(mastermind and others) as aloah FROM table
it returns
Array ( [0] => first [1] => second [2] => to_char(my,(big, and, fancy),house) as bigly [3] => (SELECT myVar,foo from z) as super [4] => export(mastermind and others) as aloah )
split by comma inside braces except another braces inside braces
Code:
$sqls = array(
"CREATE TABLE notes(id INTEGER,code DECIMAL (4,2),PRIMARY KEY (id))",
"CREATE TABLE notes(id INTEGER,code TEXT)"
);
foreach($sqls as $sql){
if(preg_match_all("/(?:^.+?\(|,)(?:\K[\w ]+(?:\([\S].*?\))?)/", $sql,$matches)){
echo "<pre>";
var_export($matches[0]);
echo "</pre>";
}
}
Output:
// first $matches...
array(
0 => 'id INTEGER',
1 => 'code DECIMAL (4,2)',
2 => 'PRIMARY KEY (id)'
)
// second $matches...
array(
0 => 'id INTEGER',
1 => 'code TEXT'
)
Regex Breakdown:
(?:^.+?\(|,) #group everything from the start to 1st parenthesis or a comma
(?:\K[\w ]+ #\K means "only retain text from this point", group words and spaces
(?:\([\S].*?\))? #optionally group parenthetical text
)
Using \K removes the need for a capture group, so preg_match_all returns the desired strings as full matches in the first subarray. The benefit is a $matches array that is half the size of one produced with a capture group.
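To make the \K point concrete, here is a sketch comparing it with an equivalent pattern that uses a capture group. The capture-group pattern and the variable names are my own, written only for comparison.

```php
<?php
$sql = "CREATE TABLE notes(id INTEGER,code DECIMAL (4,2),PRIMARY KEY (id))";

// With \K: the wanted text is the full match, so only $withK[0] is filled.
preg_match_all('/(?:^.+?\(|,)\K[\w ]+(?:\(\S.*?\))?/', $sql, $withK);

// With a capture group (assumed equivalent): $withGroup[0] holds the full
// matches including the leading text, $withGroup[1] holds the wanted text.
preg_match_all('/(?:^.+?\(|,)([\w ]+(?:\(\S.*?\))?)/', $sql, $withGroup);

var_export($withK[0]);
echo "\n";
var_export(count($withK));     // 1 subarray
echo "\n";
var_export(count($withGroup)); // 2 subarrays
```

Both approaches give the same column definitions; \K just avoids the extra subarray.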
Split a string by commas but ignore commas within double-quotes using Javascript
Here's what I would do.
var str = 'a, b, c, "d, e, f", g, h';
var arr = str.match(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g);
/* will match:
(
".*?" double quotes + anything but double quotes + double quotes
| OR
[^",\s]+ 1 or more characters excl. double quotes, comma or spaces of any kind
)
(?= FOLLOWED BY
\s*, 0 or more empty spaces and a comma
| OR
\s*$ 0 or more empty spaces and nothing else (end of string)
)
*/
arr = arr || [];
// this will prevent JS from throwing an error in
// the below loop when there are no matches
for (var i = 0; i < arr.length; i++) console.log('arr['+i+'] =',arr[i]);
regexp to split a string using comma(,) delimiter but ignore if the comma is in curly braces{,}
I see two possibilities (that don't crash with a long string):
The first with preg_match_all:
$pattern = '~
(?:
\G(?!\A), # contiguous to the previous match, not at the start of the string
| # OR
\A ,?? # at the start of the string or after the first match when
# it is empty
)\K # discard characters on the left from match result
[^{,]*+ # all that is not a { or a ,
(?:
{[^}]*}? [^{,]* # a string enclosed between curly brackets until a , or a {
# or an unclosed opening curly bracket until the end
)*+
~sx';
if (preg_match_all($pattern, $str, $m))
print_r($m[0]);
The second with preg_split and backtracking control verbs to avoid parts enclosed between curly brackets (shorter, but less efficient with long strings):
$pattern = '~{[^}]*}?(*SKIP)(*F)|,~';
print_r(preg_split($pattern, $str));
(*F) forces the pattern to fail and (*SKIP) forces the regex engine to skip parts already matched when the pattern fails.
The weakness of this last approach is that the pattern starts with an alternation. This means that for each character that is not a { or a , the two branches of the alternation are tested (for nothing). However, you can improve the pattern with the S (study) modifier (note that as of PHP 7.3 this modifier has no effect):
$pattern = '~{[^}]*}?(*SKIP)(*F)|,~S';
or you can write it without an alternation, like this:
$pattern = '~[{,](?:(?<={)[^}]*}?(*SKIP)(*F))?~';
In this way, positions with a { or a , are found first with a faster algorithm than the normal walk of the regex engine.
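For a quick check of the (*SKIP)(*F) variant, here is a small sketch. The helper name splitOutsideBraces and the sample string are assumptions of mine, since the question's original $str is not shown here.

```php
<?php
// Hypothetical helper: split on commas, skipping anything inside {...}.
// A brace group is matched first, then (*SKIP)(*F) discards it and tells
// the engine to resume after it, so only top-level commas are delimiters.
function splitOutsideBraces(string $input): array
{
    $parts = preg_split('~{[^}]*}?(*SKIP)(*F)|,~', $input);
    return $parts === false ? [] : $parts;
}

print_r(splitOutsideBraces('a,{b,c},d'));
// Items: a / {b,c} / d
```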
Regex - how to split string by commas, omitting commas in brackets
You can use this lookaround based regex:
$str = "myTemplate, testArr => [1868,1869,1870], testInteger => 3, testString => 'test, can contain a comma'";
$arr = preg_split("/\s*,\s*(?![^][]*\])(?=(?:(?:[^']*'){2})*[^']*$)/", $str);
print_r( $arr );
There are 2 lookarounds used in this regex:
(?![^][]*\]) - Asserts comma is not inside [...]
(?=(?:(?:[^']*'){2})*[^']*$) - Asserts comma is not inside '...'
PS: This is assuming we don't have unbalanced/nested/escaped quotes and brackets.
Output:
Array
(
[0] => myTemplate
[1] => testArr => [1868,1869,1870]
[2] => testInteger => 3
[3] => testString => 'test, can contain a comma'
)
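The second lookahead can be tried on its own to see how the quote check works: it succeeds only when an even number of ' characters remains up to the end of the string, which means the comma is outside any quoted part. A sketch under the same "balanced, unescaped quotes" assumption (the helper name splitOutsideQuotes and the input are mine):

```php
<?php
// Hypothetical helper: split on commas that are outside '...'.
// (?:[^']*'){2} consumes quotes in pairs; if the rest of the string can be
// consumed that way, the comma is not inside a quoted section.
function splitOutsideQuotes(string $input): array
{
    $parts = preg_split("/,(?=(?:(?:[^']*'){2})*[^']*$)/", $input);
    return $parts === false ? [] : $parts;
}

print_r(splitOutsideQuotes("a,'b,c',d"));
// Items: a / 'b,c' / d
```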