How to Include the Split Delimiter in Results for preg_split()

How do I include the split delimiter in results for preg_split()?

Here you go:

preg_split('/([^.:!?]+[.:!?]+)/', 'good:news.everyone!', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

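How it works: The pattern turns every chunk of text, together with its trailing punctuation, into a delimiter. Because the whole pattern sits inside a capturing group, the PREG_SPLIT_DELIM_CAPTURE constant makes preg_split() return those delimiters in the array. With that flag alone, you get an array like: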

array (
0 => '',
1 => 'good:',
2 => '',
3 => 'news.',
4 => '',
5 => 'everyone!',
6 => '',
);

To get rid of the empty values, use PREG_SPLIT_NO_EMPTY. To combine two or more of these constants, we use the bitwise | operator. The result:

array (
0 => 'good:',
1 => 'news.',
2 => 'everyone!'
);
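One caveat is easier to see in code: text the pattern does not match is not lost, it simply comes back as an ordinary piece. The sketch below reuses the pattern from this answer on two sample strings of my own (they are assumptions, not taken from the question).

```php
<?php
// Same pattern as above: a run of non-punctuation followed by its punctuation.
$pattern = '/([^.:!?]+[.:!?]+)/';
$flags   = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Assumed sample input: the last chunk has no trailing punctuation,
// so it never becomes a delimiter and is returned as the leftover piece.
$noFinalMark = preg_split($pattern, 'good:news.everyone', -1, $flags);
print_r($noFinalMark); // good: / news. / everyone

// Assumed sample input: repeated punctuation stays attached to its chunk.
$repeated = preg_split($pattern, 'wait...what?!', -1, $flags);
print_r($repeated); // wait... / what?!
```

So the approach keeps working when the input does not end with one of the punctuation characters.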

preg_split - split by white space and by chosen character but keep the character in array

The problem is that I want to keep that comma in the array.

Then just use the flag PREG_SPLIT_DELIM_CAPTURE

PREG_SPLIT_DELIM_CAPTURE

If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.

http://php.net/manual/en/function.preg-split.php

So you will split it like this

$split = preg_split('/(,)\s|\s/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

You can test it here

https://3v4l.org/Eq8uS

For the limit argument, -1 (or 0) means "no limit", so that is the value to pass when you only want to reach the flags argument. Older examples often pass null there because it reads as "nothing", but passing null to this int parameter is deprecated as of PHP 8.1, so -1 is the safer choice.
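To make the effect of the capture group concrete, here is a minimal sketch on a sample string of my own (the input is an assumption, not the asker's data). Only the comma is inside parentheses, so only the comma is kept; plain whitespace is dropped.

```php
<?php
// Assumed sample input.
$string = 'one, two three';

// "(,)\s" captures the comma and swallows the space after it;
// the plain "\s" alternative splits on whitespace without capturing anything.
$split = preg_split('/(,)\s|\s/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($split); // one / , / two / three
```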

I am trying to split/explode/preg_split a string but I want to keep the delimiter

You can use preg_match_all like so:

$matches = array();
preg_match_all('/(\/block\/[0-9]+\/page\/[0-9]+)/', '/block/2/page/2/block/3/page/4', $matches);
var_dump( $matches[0]);

Output:

array(2) {
[0]=>
string(15) "/block/2/page/2"
[1]=>
string(15) "/block/3/page/4"
}


Edit: This is the best I could do with preg_split.

$array = preg_split('#(/block/)#', '/block/2/page/2/block/3/page/4', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// $array alternates delimiter, content: glue each pair back together.
$result = array();
for( $i = 0, $count = count( $array); $i < $count; $i += 2)
{
    $result[] = $array[$i] . $array[$i + 1];
}

It's not worth the overhead to use a regular expression if you still need to loop to prepend the delimiter. Just use explode and prepend the delimiter yourself:

$delimiter = '/block/';
$results = array();
foreach( explode( $delimiter, '/block/2/page/2/block/3/page/4') as $entry)
{
    if( !empty( $entry))
    {
        $results[] = $delimiter . $entry;
    }
}


Final Edit: Solved! Here is the solution using one regex, preg_split, and PREG_SPLIT_DELIM_CAPTURE

$regex = '#(/block/(?:\w+/?)+(?=/block/))#';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
var_dump( preg_split( $regex, '/block/2/page/2/block/3/page/4', -1, $flags));
var_dump( preg_split( $regex, '/block/2/page/2/order/title/sort/asc/block/3/page/4', -1, $flags));

Output:

array(2) {
[0]=>
string(15) "/block/2/page/2"
[1]=>
string(15) "/block/3/page/4"
}
array(2) {
[0]=>
string(36) "/block/2/page/2/order/title/sort/asc"
[1]=>
string(15) "/block/3/page/4"
}


Regex (preg_split): how do I split based on a delimiter, excluding delimiters included in a pair of quotes?

You can use the following.

$text = '1 2 3 4/5/6 "7/8 9" 10';
$results = preg_split('~"[^"]*"(*SKIP)(*F)|[ /]+~', $text);
print_r($results);

Explanation:

The left side of the alternation matches a whole quoted string and then throws it away: (*F) forces that branch to fail, and (*SKIP) tells the regex engine not to retry the match from any position inside the text it just consumed. The right side matches a run of one or more spaces or forward slashes, which at that point can only occur outside quotes.

Output

Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => "7/8 9"
[7] => 10
)
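The same (*SKIP)(*F) idea carries over to other delimiters. As a small sketch, here it is with a comma as the delimiter on a made-up CSV-like line (the input and variable names are my own assumptions):

```php
<?php
// Assumed sample input: the middle field contains a comma inside quotes.
$line = 'a,"b,c",d';

// Left branch: consume a quoted string, then fail and skip past it.
// Right branch: a comma, which can now only be one outside quotes.
$fields = preg_split('~"[^"]*"(*SKIP)(*F)|,~', $line);

print_r($fields); // a / "b,c" / d
```

Note that the quotes stay part of the field, and escaped quotes inside a field are not handled by this pattern.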

Split a string just before each occurrence of 3 specific delimiters

Try this:

$ar = preg_split('/(\$[^#]+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
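The question's input and its three delimiters are not shown here, so the following is only a sketch under assumptions: the delimiter characters ($, # and @) and the sample string are hypothetical. It shows a different, commonly used way to split just before a delimiter, a zero-width lookahead, so that each delimiter stays at the start of its piece.

```php
<?php
// Hypothetical delimiters and input, chosen only for illustration.
$str = 'a#b$c@d';

// (?=...) matches the empty position right before a delimiter character,
// so nothing is consumed and the delimiter starts the next piece.
$pieces = preg_split('/(?=[#$@])/', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($pieces); // a / #b / $c / @d
```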

preg_split : splitting a string according to a very specific pattern

Here's an attempt with preg_match:

$pattern = "/^([^\[]+)\[([^\]]+)\]\s+\(([^,]+),\s+([^,]+),\s+([^,]+),\s+([^,]+)\)\s+(.+)$/i";
$string = "CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION";
preg_match($pattern, $string, $keywords);
array_shift($keywords);
print_r($keywords);

Output:

Array
(
[0] => CADAVRES
[1] => FILM
[2] => Canada : Québec
[3] => Érik Canuel
[4] => 2009
[5] => long métrage
[6] => FICTION
)


Regex breakdown:

^                                 anchor to start of string
(                                 begin capture group 1
  [^\[]+                          one or more non-left-bracket characters
)                                 end capture group 1
\[                                literal left bracket
(                                 begin capture group 2
  [^\]]+                          one or more non-right-bracket characters
)                                 end capture group 2
\]                                literal right bracket
\s+                               one or more whitespace characters
\(                                literal opening parenthesis
(                                 begin capture group 3
  [^,]+                           one or more non-comma characters
)                                 end capture group 3
,\s+                              literal comma followed by one or more whitespace characters
([^,]+),\s+([^,]+),\s+([^,]+)     repeats of the above (capture groups 4 to 6)
\)                                literal closing parenthesis
\s+                               one or more whitespace characters
(                                 begin capture group 7
  .+                              everything else
)                                 end capture group 7
$                                 anchor to end of string

This assumes your structure is static and is not particularly pretty, but it should be robust to delimiters creeping into fields where they're not supposed to be. A title containing a : or , seems plausible and would break a "split on these delimiters anywhere"-type solution. For example,

"Matrix:, Trilogy()   [FILM, reviewed: good]    (Canada() :   Québec  ,  \t Érik Canuel , ): 2009 ,   long ():():[][]métrage) FICTIO  , [(:N";

correctly parses as:

Array
(
[0] => Matrix:, Trilogy()
[1] => FILM, reviewed: good
[2] => Canada() : Québec
[3] => Érik Canuel
[4] => ): 2009
[5] => long ():():[][]métrage
[6] => FICTIO , [(:N
)


Additionally, if your parenthesized comma region is variable length, you might want to extract that first and parse it, then handle the rest of the string.
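A rough sketch of that two-step idea, assuming the outer layout stays "title [type] (comma list) rest" and that commas inside the parentheses only ever separate fields. The array keys and variable names are my own, purely for illustration.

```php
<?php
$string = 'CADAVRES [FILM] (Canada : Québec, Érik Canuel, 2009, long métrage) FICTION';

$record = null;
// Step 1: pull out the four outer parts; group 3 is the whole parenthesized region.
if (preg_match('/^([^\[]+)\[([^\]]+)\]\s+\((.*)\)\s+(.+)$/u', $string, $m)) {
    $record = [
        'title'    => trim($m[1]),
        'type'     => $m[2],
        // Step 2: split the region on commas, however many fields it has.
        'details'  => preg_split('/\s*,\s*/u', $m[3]),
        'category' => $m[4],
    ];
}

print_r($record);
```

Unlike the single fixed pattern, this version gives up the robustness to stray commas inside the parentheses, which is the price of allowing a variable number of fields.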

php preg_split(). Not the right pattern and converts period to comma?

The biggest part of the solution is from @Rarst in this post.

I've ended up with this code:

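function calcCssNewValue( $string, $scale ){

    // Split e.g. "12.5px" into the number and the unit (at most two pieces).
    $stringArray = preg_split( '/([a-zA-Z]+)/', $string, 2, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );

    // Force "." as the decimal point and no thousands separator.
    $returnVal = number_format( ( (float) $stringArray[0] * $scale ), 2, '.', '' );

    if ( count( $stringArray ) > 1 ) {
        // Re-attach the unit.
        $returnVal .= $stringArray[1];
    }

    return $returnVal;

}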

The $stringArray is by @Rarst.

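For $returnVal I added number_format() and forced the decimal separator to an actual point. Without it, the decimal point was changed to a comma, but only when doing math. I don't know the exact cause; a likely one is the locale, since before PHP 8.0 converting a float to a string followed the LC_NUMERIC locale setting.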

use preg_split but keep delimiter

Use a zero-width assertion (a lookbehind here):

$result = preg_split('~(?<=\.)\s~', $text, -1, PREG_SPLIT_NO_EMPTY);

or you can use the \K feature, which drops everything matched so far from the reported match, so only the whitespace after the dot acts as the delimiter:

$result = preg_split('~\.\K\s~', $text, -1, PREG_SPLIT_NO_EMPTY);
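A quick way to convince yourself that the two regex variants behave the same is to run both on one string. The sample text below is my own assumption:

```php
<?php
// Assumed sample input.
$text = 'First one. Second one. Third.';

// Lookbehind: split on whitespace that is preceded by a dot.
$viaLookbehind = preg_split('~(?<=\.)\s~', $text, -1, PREG_SPLIT_NO_EMPTY);

// \K: match the dot, then reset the match start so only the whitespace is removed.
$viaK = preg_split('~\.\K\s~', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($viaLookbehind); // First one. / Second one. / Third.
var_dump($viaLookbehind === $viaK);
```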

Without regex (if whitespaces are only spaces, and if the last dot is not followed by a space):

$chunks = explode('. ', $text);
$last = array_pop($chunks);
$result = array_map(function ($i) { return $i . '.'; }, $chunks);
$result[] = $last;

or better:

$result = explode(' #&§', strtr($text, ['. '=>'. #&§']));

PHP preg_split keeping delimiter in a different element

This will get you pretty close

 $page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';

print_r(preg_split('/(\[\[[^\]]+\]\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));

The thing to remember is that this is the delimiter (\[\[[^\]]+\]\])

Output:

Array
(
[0] => the quick brown fox
[1] => [[random text here]]
[2] => and then
[3] => [[a different text here]]
)


When I say pretty close, I do mean really pretty close...

The regex is pretty straightforward: match two [, then anything but a ], then two ]. That makes our delimiter, which we then capture. The no-empty flag is nice too.

Enjoy!

UPDATE

but it fails on " here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text"...Note the "[]" under the 'columns'

To handle that you will need a recursive regex pattern using (?R), like this:

$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';

print_r(preg_split('/(\[(?:[^\[\]]|(?R))*\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));

Output:

Array
(
[0] => here is my table
[1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
[2] => and this is more text
[3] => [someother bracket] //single bracket capture
)


I won't pretend: this is at the edge of my knowledge of regex. I should note that this matches single brackets, not specifically double ones. You could try something like /(\[(\[(?:[^\[\]]|(?2))*\])\])/ where (?2) is like (?R) but recurses into a specific capture group. This matches only [[ ... ]] while keeping the inner nesting. The issue is that the capture is then duplicated, so you wind up with this:

Array
(
[0] => here is my table
[1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
[2] => [{"widget":"table","id":"1","title": "Views Table", "columns": []}]
[3] => and this is more text [someother bracket]
)

Notice how it doesn't capture [someother bracket], but it captures the other one twice. There may be a way around that, but I can't think of it.
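One possible way around the duplicated capture, offered as a sketch rather than a definitive fix: keep a single capture group that recurses into itself with (?1), and put a lookahead in front of it that only lets a match start at [[. This pattern is my own variation, not part of the original answer. It only checks that the opening is doubled, so something like [[a] b] would also match.

```php
<?php
$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';

// (?=\[\[) applies only at the top level; (?1) re-enters group 1 for nested brackets.
$pattern = '/(?=\[\[)(\[(?:[^\[\]]|(?1))*\])/';

$parts = preg_split($pattern, $page_content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

With this input the [[ ... ]] block appears once and [someother bracket] stays inside the trailing text piece.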

Whether or not capturing single bracket pairs is an issue, I don't know.

But I have used this before, mainly for matching pairs of " or ( ), and it's the same concept.

The only other solution would be to make a lexer/parser for it; I have some examples of how to do that on my GitHub account. Regex (by itself) is not well suited to nested elements. Almost any regex solution will fail on nesting.
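To give an idea of what the non-regex route can look like, here is a minimal hand-written scanner that tracks bracket depth. It is a sketch, not the code from the GitHub account mentioned above; the function name splitOnDoubleBrackets is hypothetical, and it deliberately ignores brackets that appear inside quoted JSON strings.

```php
<?php
// Hypothetical helper: returns plain text chunks and balanced [[...]] blocks in order.
function splitOnDoubleBrackets(string $text): array
{
    $parts  = [];
    $buffer = '';
    $len    = strlen($text);
    $i      = 0;

    while ($i < $len) {
        if (substr($text, $i, 2) === '[[') {
            // Walk forward, counting depth, until the block that starts here closes.
            $depth = 0;
            $j     = $i;
            while ($j < $len) {
                if ($text[$j] === '[') {
                    $depth++;
                } elseif ($text[$j] === ']' && --$depth === 0) {
                    break;
                }
                $j++;
            }
            if ($j < $len) {
                // Balanced block found: flush pending text, then store the block.
                if ($buffer !== '') {
                    $parts[] = $buffer;
                    $buffer  = '';
                }
                $parts[] = substr($text, $i, $j - $i + 1);
                $i = $j + 1;
                continue;
            }
        }
        // Ordinary character (or an unbalanced "[["): keep it as plain text.
        $buffer .= $text[$i];
        $i++;
    }

    if ($buffer !== '') {
        $parts[] = $buffer;
    }
    return $parts;
}

$demo = splitOnDoubleBrackets('a [[x [] y]] b [c]');
print_r($demo); // "a " / "[[x [] y]]" / " b [c]"
```

A real lexer would also need to understand string literals so that a ] inside a JSON value does not change the depth.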

PHP preg_split delimiter pattern, split at character chain

If it can only be ,a, and ,a,,a,, then this should be enough:

preg_split("/(,a,)+/", $str);
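For illustration, here is that call on a sample string of my own (the input is an assumption). Since PREG_SPLIT_DELIM_CAPTURE is not set, the parentheses only group the chain for the + quantifier and the delimiters are dropped.

```php
<?php
// Assumed sample input: one single ",a," and one doubled ",a,,a," chain.
$str = 'x,a,y,a,,a,z';

// (,a,)+ treats one or more consecutive ",a," as a single delimiter.
$parts = preg_split("/(,a,)+/", $str);

print_r($parts); // x / y / z
```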

