Regex Split String But Keep Separators

How to keep the delimiters of Regex.Split?

Just put the pattern into a capture-group, and the matches will also be included in the result.

string[] result = Regex.Split("123.456.789", @"(\.)");


{ "123", ".", "456", ".", "789" }

This also works for many other languages:

  • JavaScript: "123.456.789".split(/(\.)/g)
  • Python: re.split(r"(\.)", "123.456.789")
  • Perl: split(/(\.)/g, "123.456.789")

(Not Java though)
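As a rough Python sketch of the same idea, the helper below wraps an arbitrary pattern in a capture group before splitting. The name split_keep is hypothetical (it is not a library function), and the sketch assumes the pattern has no capture groups of its own, since any extra groups would also show up in the result.

```python
import re

def split_keep(pattern, text):
    """Split text on pattern and keep each separator as its own item.

    Hypothetical helper: it wraps the pattern in one capture group so that
    re.split returns the matched separators along with the pieces.
    """
    return re.split("(" + pattern + ")", text)

parts = split_keep(r"\.", "123.456.789")
print(parts)           # ['123', '.', '456', '.', '789']
print("".join(parts))  # 123.456.789 (joining the pieces restores the input)
```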

How to split string with Regex.Split and keep all separators?

You need a pattern with a lookahead only:

\s+(?=delim1|delim2)

The \s+ matches one or more whitespace characters (since your string contains whitespace). If the whitespace may be absent, use \s* (but then you will need to remove empty entries from the result). If the delimiters must be whole words, use \b word boundaries: \s+(?=\b(?:delim1|delim2)\b).

In C#:

addrArr = Regex.Split(inputText, string.Format(@"\s+(?={0})", string.Join("|", delimeters)));

If the delimiters can contain special regex metacharacters, you will need to run Regex.Escape on your delimiters list.

A C# demo:

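using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
var inputText = "substring1 delim1 substring2 delim2 substr3";
var delimeters = new List<string> { "delim1", "delim2" };
// Escape each delimiter, then split on whitespace that precedes one of them.
var addrArr = Regex.Split(inputText,
    string.Format(@"\s+(?={0})", string.Join("|", delimeters.Select(Regex.Escape))));
Console.WriteLine(string.Join("\n", addrArr));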
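For comparison, here is a minimal Python sketch of the same lookahead split. The function name split_before is an assumption made for illustration; re.escape plays the role of Regex.Escape.

```python
import re

def split_before(text, delimiters):
    """Split on whitespace that is directly followed by one of the delimiters."""
    # re.escape guards against delimiters that contain regex metacharacters.
    alternatives = "|".join(re.escape(d) for d in delimiters)
    return re.split(r"\s+(?=(?:" + alternatives + "))", text)

result = split_before("substring1 delim1 substring2 delim2 substr3",
                      ["delim1", "delim2"])
print(result)
# ['substring1', 'delim1 substring2', 'delim2 substr3']
```

The whitespace before each delimiter is consumed by the split, while the delimiter itself stays at the start of the next piece because the lookahead does not consume it.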

Python RE library String Split but keep the delimiters/separators as part of the next string

If you are using Python 3.7+, you can split on zero-length matches using re.split and a positive lookahead:

import re

string = 'a+0b-2a+b-b'
re.split(r'(?=[+-])', string)

# ['a', '+0b', '-2a', '+b', '-b']


Javascript and regex: split string and keep the separator

I had a similar but slightly different problem. Anyway, here are examples of the different scenarios for where to keep the delimiter.

"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]

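Warning: The fourth only works when the items between separators are single characters, because (?!、) matches at every position that is not followed by the separator, so longer items are split into individual characters. ConnorsFan presents an alternative: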

// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);
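The same match-instead-of-split approach can be sketched in Python with re.findall. The function name tokenize_path is hypothetical.

```python
import re

def tokenize_path(path):
    """Return the pieces of path, each directory keeping its trailing slash."""
    # Either a run of non-slash characters with an optional slash after it,
    # or a lone slash (for example a leading or doubled one).
    return re.findall(r"[^/]+/?|/", path)

tokens = tokenize_path("Animation/rawr/javascript.js")
print(tokens)
# ['Animation/', 'rawr/', 'javascript.js']
```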

Regex split string but keep separators

Use zero-length matching lookarounds; you want to split on

(?=\[)|(?<=\])

That is, anywhere we assert a match of a literal [ ahead, or a match of a literal ] behind.

As a C# string literal, this is

@"(?=\[)|(?<=\])"

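Example in Java

System.out.println(java.util.Arrays.toString("abc[s1]def[s2][s3]ghi".split("(?=\\[)|(?<=\\])")));
// prints "[abc, [s1], def, [s2], [s3], ghi]"

System.out.println(java.util.Arrays.toString("abc;def;ghi;".split("(?<=;)")));
// prints "[abc;, def;, ghi;]"

System.out.println(java.util.Arrays.toString("OhMyGod".split("(?=(?!^)[A-Z])")));
// prints "[Oh, My, God]"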

Dart - Split String with Regex and keep the delimiter

You're close. Try:

var re = RegExp(r'(?=<link=".*?">)|(?<=</link>)');

It has two differences from your RegExp:

  • It swaps the (?= and (?<= because you want a split before a <link...>, so you want a lookahead for that, and after a </link>, so a lookbehind for that.
  • I added the ? to ".*?", because otherwise it could potentially match until a later " on the same line, instead of the first one. Your example didn't have that, but better safe than sorry.

With that, you get the strings:

  1. "This is first text "
  2. "<link=\"\">First Hello</link>"
  3. "\nThis is the second text "
  4. "<link=\"\">Second</link>"
  5. "\n"

If you don't want the newlines to be included, you should probably remove them first.

If you want to combine the \n with the </link>, you can change the RegExp to

var re = RegExp(r'(?=<link=".*?">)|(?<=</link>\n*(?<=\n))');

That gives you:

  1. "This is first text "
  2. "<link=\"\">First Hello</link>\n"
  3. "This is the second text "
  4. "<link=\"\">Second</link>\n"
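For comparison, here is a Python sketch of the first pattern above (Python 3.7+ is needed for splitting on zero-length matches). The input text is an assumption, reconstructed by concatenating the listed pieces. Note that Python's standard re module only accepts fixed-width lookbehinds, so the second, variable-length variant would need a different approach there.

```python
import re

# Input reconstructed from the pieces listed above (an assumption).
text = ('This is first text <link="">First Hello</link>\n'
        'This is the second text <link="">Second</link>\n')

# Split before an opening <link="..."> tag or right after a closing </link>.
LINK_BOUNDARY = re.compile(r'(?=<link=".*?">)|(?<=</link>)')

pieces = LINK_BOUNDARY.split(text)
for piece in pieces:
    print(repr(piece))
```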

How to split string but keep delimiters in java?

From your input string and expected results, I infer that you want to split the string according to three rules:

  • Split at a point that is preceded and followed by a colon
  • Split at a point that is preceded by a space and followed by a colon
  • Split at a point that is preceded by a colon and followed by a space

Hence you can use this regex, with one alternative for each of the three cases above.

(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )

Java code,

String s = "Hello, :smile::hearth: world!";
System.out.println(Arrays.toString(s.split("(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )")));

Prints like your expected output,

[Hello, , :smile:, :hearth:,  world!]
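The same three-alternative pattern can be tried in Python (3.7+ for zero-length splits). This is only a sketch for comparison; the constant name BOUNDARY is made up for the example.

```python
import re

s = "Hello, :smile::hearth: world!"

# Zero-width split points; the three alternatives mirror the three rules:
# colon|colon, space|colon, colon|space.
BOUNDARY = r"(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )"

pieces = re.split(BOUNDARY, s)
print(pieces)
# ['Hello, ', ':smile:', ':hearth:', ' world!']
```

Because every split point is zero-width, no characters are consumed and joining the pieces gives back the original string.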

Alternatively, if you can match the text rather than split it, the regex is much simpler:

:[^:]+:|\S+

Java code,

String s = "Hello, :smile::hearth: world!";
Pattern p = Pattern.compile(":[^:]+:|\\S+");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group()); // print each matched token
}

How to Split string but keep delimiter at the start

Instead of a lookbehind, you need to use a lookahead for splitting:

(?=,)

You want to split at a position where the next character is a comma, which calls for a lookahead assertion. A lookbehind assertion, by contrast, splits where the previous character is a comma, that is, after the comma rather than before it.

String text = "1,2,3,4,5,6";
var split = Regex.Split(text, @"(?=,)");
//=> ["1", ",2", ",3", ",4", ",5", ",6"]
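To make the lookahead versus lookbehind difference concrete, here is a small Python sketch (3.7+) that runs both on the same input. The variable names are illustrative only.

```python
import re

text = "1,2,3,4,5,6"

# Lookahead: split where the NEXT character is a comma,
# so each comma starts a piece.
before_comma = re.split(r"(?=,)", text)
print(before_comma)  # ['1', ',2', ',3', ',4', ',5', ',6']

# Lookbehind: split where the PREVIOUS character is a comma,
# so each comma ends a piece.
after_comma = re.split(r"(?<=,)", text)
print(after_comma)   # ['1,', '2,', '3,', '4,', '5,', '6']
```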

How to split a string with a regex of 2 delimiters, but keep the delimiters in Python?

Don't use re.split(), use re.findall() with a regexp that matches each sub-expression.

import re

s = "+2x-10+5"
result = re.findall(r'[-+]\w+', s)
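One caveat with the pattern above is that a first term without a sign would not match at all. A possible variation, offered as a sketch rather than as part of the original answer, makes the sign optional; the function name signed_terms is hypothetical.

```python
import re

def signed_terms(expr):
    """Return the terms of expr, each keeping its leading + or - if present."""
    # The sign is optional so that an unsigned first term is not dropped.
    return re.findall(r"[-+]?\w+", expr)

print(signed_terms("+2x-10+5"))  # ['+2x', '-10', '+5']
print(signed_terms("2x-10+5"))   # ['2x', '-10', '+5']
```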

How to split a string, but also keep the delimiters?

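You can use lookahead and lookbehind, which are features of regular expressions.

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));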

And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty string just before or just after a ;.

EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to create a variable whose name describes what the regular expression does. You can even put placeholders (e.g. %1$s) in it and use Java's String.format to replace the placeholders with the actual string you need to use; for example:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    // aEach is now {"a", ";", "b", ";", "c", ";", "d"}
}


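The named-template idea carries over to other languages. Below is a minimal Python sketch (3.7+) under the same assumptions; split_with_delimiter is a hypothetical helper, and the call to re.escape is an addition that is not in the Java version.

```python
import re

# Template mirroring WITH_DELIMITER above; {0} is filled with the delimiter.
WITH_DELIMITER = r"(?<={0})|(?={0})"

def split_with_delimiter(text, delimiter):
    """Split text around delimiter, keeping the delimiter as its own item."""
    # re.escape is an addition here so metacharacters such as "." stay literal.
    pattern = WITH_DELIMITER.format(re.escape(delimiter))
    return re.split(pattern, text)

print(split_with_delimiter("a;b;c;d", ";"))
# ['a', ';', 'b', ';', 'c', ';', 'd']
```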