Split String and Keep Delimiter in Sequence

Split a string with delimiters but keep the delimiters in the result in C#

If you want each delimiter to be its own element in the result, wrap it in a capturing group and use Regex.Split, which includes captured text in the returned array, e.g.:

string input = "plum-pear";
string pattern = "(-)";

string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
    Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'

So if you want to split a mathematical formula, you can use the following regex:

@"([*()\^\/]|(?<!E)[\+\-])" 

The negative lookbehind (?<!E) ensures that constants like 1E-02 stay intact instead of being split into 1E, - and 02.

So:

Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")

Yields (plus an empty string between the adjacent delimiters ) and ^, which you can filter out):

  • 10E-02
  • *
  • x
  • +
  • sin
  • (
  • x
  • )
  • ^
  • 2
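
For comparison, here is a minimal Java sketch of the same idea. Java's String.split does not return captured groups, so this version walks the matches with a Matcher instead. The class and method names (FormulaTokenizer, tokenize) are made up for illustration, and empty strings between adjacent delimiters are skipped on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaTokenizer {
    // Same delimiters as the regex above: operators and parentheses,
    // plus a sign only when it is not preceded by 'E'.
    private static final Pattern DELIMITER = Pattern.compile("[*()^/]|(?<!E)[+-]");

    // Hypothetical helper: returns operands and delimiters in their original order.
    static List<String> tokenize(String formula) {
        List<String> tokens = new ArrayList<>();
        Matcher m = DELIMITER.matcher(formula);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                // Text between the previous delimiter and this one.
                tokens.add(formula.substring(last, m.start()));
            }
            tokens.add(m.group()); // the delimiter itself
            last = m.end();
        }
        if (last < formula.length()) {
            tokens.add(formula.substring(last)); // trailing operand
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [10E-02, *, x, +, sin, (, x, ), ^, 2]
        System.out.println(tokenize("10E-02*x+sin(x)^2"));
    }
}
```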

How to split a string, but also keep the delimiters?

You can use lookahead and lookbehind, which are features of regular expressions.

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));

And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty position immediately after a ; or immediately before a ;, so the string is cut on both sides of every delimiter.

EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to store them in a variable whose name says what the expression does. You can even put placeholders (e.g. %1$s) in the pattern and use Java's String.format to replace them with the actual string you need; for example:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    ...
}
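
One thing to watch with the placeholder approach is a delimiter that is itself a regex metacharacter (such as + or .). A possible way around that, sketched below, is to pass the delimiter through Pattern.quote before formatting it into the pattern. The class and method names (SplitKeepingDelimiter, splitKeeping) are only illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitKeepingDelimiter {
    // Same template as above: cut after the delimiter or before it.
    private static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: treats the delimiter as literal text, not as a regex.
    static String[] splitKeeping(String input, String delimiter) {
        String quoted = Pattern.quote(delimiter);
        return input.split(String.format(WITH_DELIMITER, quoted));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("a;b;c;d", ";"))); // [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(splitKeeping("1.5+2", "+")));   // [1.5, +, 2]
    }
}
```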

split string with two delimiters and keep order and delimiter

I guess you could use preg_replace() to kind of "format" the string first, and insert some item delimiter that's not used elsewhere and so safe to use? In this case I'm using \t as the inserted delimiter.

$formatted_text = preg_replace('/ ?([-*]) /', "\t$1", $text);
$items_with_one_empty_in_front = explode("\t", $formatted_text);
var_dump($items_with_one_empty_in_front);

array(6) {
  [0]=>
  string(0) ""
  [1]=>
  string(4) "*aaa"
  [2]=>
  string(4) "-bbb"
  [3]=>
  string(4) "-ccc"
  [4]=>
  string(4) "*ddd"
  [5]=>
  string(4) "*eee"
}

You could then do like this:

foreach (array_slice($items_with_one_empty_in_front, 1) as $i => $item) {
    if ($item[0] == '*') {
        echo "$i - Negative: ".substr($item, 1);
    }
    else if ($item[0] == '-') {
        echo "$i - Positive: ".substr($item, 1);
    }
}

Version 2:

$parts = explode(" ", $text);
$opwords = [
    '*' => 'Negative',
    '-' => 'Positive'
];
$i = 1;
while ($parts) {
    $op = array_shift($parts);
    $term = array_shift($parts);
    echo $i++ . " - " . $opwords[$op] . ": ". $term . "\n";
}

Split String by Delimiter and Include Delimiter - Common Lisp

The problem lies in the end condition of the do* loop. When the variable i reaches the end of the string, the loop exits, but there is still a current-word that has not yet been added to words. When the end condition is met, you need to add x to current-word and then current-word to words before exiting the loop:

(defun split-string-with-delimiter (string delimiter)
  "Splits a string into a list of strings, with the delimiter still
in the resulting list."
  (let ((words nil)
        (current-word (make-adjustable-string "")))
    (do* ((i 0 (+ i 1))
          (x (char string i) (char string i)))
         ((>= (+ i 1) (length string))
          (progn (vector-push-extend x current-word)
                 (push current-word words)))
      (if (eql delimiter x)
          (unless (string= "" current-word)
            (push current-word words)
            (push (string delimiter) words)
            (setf current-word (make-adjustable-string "")))
          (vector-push-extend x current-word)))
    (nreverse words)))

However, note that this version is still buggy in that if the last character of string is a delimiter, this will be included into the last word, i.e. (split-string-with-delimiter "a.bc.def." #\.) => ("a" "." "bc" "." "def.")
I'll let you add this check.

In any case, you might want to make this more efficient by looking ahead for delimiter and extracting all the characters between the current i and the next delimiter at once as one single substring.
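
To illustrate the look-ahead idea, here is a rough sketch in Java rather than Lisp: it jumps to the next delimiter with indexOf and copies the whole run in between as one substring. The names (DelimiterSplitter, split) are invented for the example. Unlike the loop above, this sketch emits every delimiter, including a trailing one and consecutive ones, which is an assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitter {
    // Hypothetical helper: splits text on delimiter and keeps each delimiter as its own element.
    static List<String> split(String text, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Look ahead for the next delimiter instead of scanning char by char.
            int next = text.indexOf(delimiter, start);
            if (next < 0) {
                next = text.length(); // no more delimiters: take the rest
            }
            if (next > start) {
                parts.add(text.substring(start, next)); // the word, copied in one step
            }
            if (next < text.length()) {
                parts.add(String.valueOf(delimiter)); // the delimiter itself
            }
            start = next + 1;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a.bc.def.", '.')); // [a, ., bc, ., def, .]
    }
}
```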

Split String into Array keeping delimiter/separator in Swift

Suppose you are splitting the string by a separator called separator, you can do the following:

let result = yourString.components(separatedBy: separator) // first split
    .flatMap { [$0, separator] } // add the separator after each split
    .dropLast() // remove the last separator added
    .filter { $0 != "" } // remove empty strings

For example:

let result = " Hello World ".components(separatedBy: " ").flatMap { [$0, " "] }.dropLast().filter { $0 != "" }
print(result) // [" ", "Hello", " ", "World", " "]

How can I split a string in Java and retain the delimiters?

str.split("(?=[:;])")

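This will give you the desired array, only with an empty first item (on Java 7 and earlier; since Java 8, a zero-width match at the start of the input no longer produces a leading empty string). And: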

str.split("(?=\\b[:;])")

This will give the array without the empty first item.

  • The key here is the (?=X) which is a zero-width positive lookahead (non-capturing construct) (see regex pattern docs).
  • [:;] means "either ; or :"
  • \b is word-boundary - it's there in order not to consider the first : as delimiter (since it is the beginning of the sequence)
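
A small sketch of the \b variant, assuming a made-up input that starts with a delimiter (the original str is not shown) and an illustrative method name splitBeforeDelimiters. Each delimiter stays attached to the token that follows it:

```java
import java.util.Arrays;

public class LookaheadSplit {
    // Hypothetical helper: cuts before every ':' or ';' that directly follows a word character.
    static String[] splitBeforeDelimiters(String str) {
        return str.split("(?=\\b[:;])");
    }

    public static void main(String[] args) {
        String str = ":alpha;beta:gamma"; // assumed sample input
        // prints [:alpha, ;beta, :gamma]
        System.out.println(Arrays.toString(splitBeforeDelimiters(str)));
    }
}
```

The leading : is not preceded by a word character, so \b does not match there and no cut is made at position 0.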

Splitting string with character sequence as a delimiter

Faster than using String.split is Pattern.split: i.e., precompile the pattern and store that for subsequent use. If you use the same pattern all the time, and do a lot of splitting using that pattern, it may be worth putting that pattern into a static field or something.

Also, if your pattern contains no regex metacharacters, you can pass in Pattern.LITERAL when creating the pattern. This is something you can't do with String.split. :-P
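
A minimal sketch of both points, assuming a hypothetical "||" separator (which would need escaping as a regex) and an illustrative class name:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplitter {
    // Compiled once and reused; Pattern.LITERAL makes "||" plain text rather than regex alternation.
    private static final Pattern SEPARATOR = Pattern.compile("||", Pattern.LITERAL);

    // Hypothetical helper: splits a line on the literal separator.
    static String[] fields(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("a||b||c"))); // [a, b, c]
    }
}
```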

string.split but keeping sequential matches

Note: This answer only addresses the parts of the question preceding its "update" paragraph, because it was written before the question was edited.

string.Split will produce n-1 empty parts for n consecutive separator characters. Since you want it to produce n empty parts instead, you are one tilde short wherever several of them occur consecutively. Add the "missing" tildes as follows before you perform the Split:

// using System.Text.RegularExpressions;

const string input = "~abc~~~123~~~hijkl~9";
string[] parts = Regex.Replace(input, "~~+", "$0~").Split('~');
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

