Split a String with Delimiters But Keep the Delimiters in the Result in C#

If you want each delimiter to appear as its own element in the result, use Regex.Split with the delimiter wrapped in a capturing group, e.g.:

string input = "plum-pear";
string pattern = "(-)";

string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
    Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'

So if you want to split a mathematical formula into tokens, you can use the following regex:

@"([*()\^\/]|(?<!E)[\+\-])" 

The (?<!E) lookbehind ensures that constants in exponent notation such as 1E-02 stay intact instead of being split into 1E, - and 02.

So:

Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")

Yields (omitting the empty string that Regex.Split returns between the adjacent delimiters ) and ^):

  • 10E-02
  • *
  • x
  • +
  • sin
  • (
  • x
  • )
  • ^
  • 2
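Because Regex.Split emits an empty string wherever two delimiters are adjacent (here between ) and ^), a small wrapper can filter those entries out. The following is a minimal sketch; the FormulaTokenizer class and Tokenize method are illustrative names, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormulaTokenizer
{
    // Same pattern as in the answer: operators and parentheses are captured so
    // they stay in the output; + and - are not split on when preceded by 'E'.
    private const string Pattern = @"([*()\^\/]|(?<!E)[\+\-])";

    // Hypothetical helper: split, then drop the empty strings that appear
    // between adjacent delimiters.
    public static string[] Tokenize(string formula) =>
        Regex.Split(formula, Pattern)
             .Where(token => token.Length > 0)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokenize("10E-02*x+sin(x)^2")));
        // 10E-02 | * | x | + | sin | ( | x | ) | ^ | 2
    }
}
```

Note that the lookbehind only checks for an uppercase E, so a lowercase exponent such as 1e-02 would still be split.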

How to Split string but keep delimiter at the start

Instead of a lookbehind, you need to use a lookahead for splitting:

(?=,)

What you want is to split at a position where the next character is a comma, which calls for a lookahead assertion. A lookbehind assertion, on the other hand, matches at a position where the previous character is a comma, so it splits after the comma rather than before it.

Code:

String text = "1,2,3,4,5,6";
var split = Regex.Split(text, @"(?=,)");
//=> ["1", ",2", ",3", ",4", ",5", ",6"]
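To make the difference between the two assertions concrete, here is a small side-by-side sketch. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Lookahead: split at positions where the NEXT character is a comma,
    // so each comma stays at the start of the following piece.
    public static string[] SplitBefore(string text) => Regex.Split(text, @"(?=,)");

    // Lookbehind: split at positions where the PREVIOUS character is a comma,
    // so each comma stays at the end of the preceding piece.
    public static string[] SplitAfter(string text) => Regex.Split(text, @"(?<=,)");

    public static void Main()
    {
        const string text = "1,2,3";
        Console.WriteLine(string.Join(" | ", SplitBefore(text))); // 1 | ,2 | ,3
        Console.WriteLine(string.Join(" | ", SplitAfter(text)));  // 1, | 2, | 3
    }
}
```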

Split a string on multiple delimiters and keep them in the output

Try this,

private char[] alphabets = {'A','B','C', 'D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};    

var input = "AB123456789C123412341234B123";
var result = input.SplitAndKeep(alphabets).ToList();
//=> ["A", "B", "123456789", "C", "123412341234", "B", "123"]

public static class Extensions
{
    public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
    {
        int start = 0, index;
        while ((index = s.IndexOfAny(delims, start)) != -1)
        {
            if (index - start > 0)
                yield return s.Substring(start, index - start);
            yield return s.Substring(index, 1);
            start = index + 1;
        }
        if (start < s.Length)
        {
            yield return s.Substring(start);
        }
    }
}

Can you split a string and keep the split char(s)?

You can use Regex.Split with a pattern that doesn't consume delimiter characters:

var pattern = @"(?=\+)";

var ans = Regex.Split(src, pattern);

This will create an empty entry if there is a leading + as there is an implied split before the +.

You could use LINQ to remove the empty entries if they aren't wanted:

var ans2 = Regex.Split(src, pattern).Where(s => !String.IsNullOrEmpty(s)).ToArray();

Alternatively, you could use Regex.Matches to extract each + together with the text that follows it (any text before the first + is not returned):

var ans3 = Regex.Matches(src, @"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();
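The two approaches behave differently at the start of the input, which the following sketch tries to show. The sample inputs and the PlusSplitDemo names are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlusSplitDemo
{
    // Lookahead split with empty entries removed (needed when src starts with '+').
    public static string[] SplitKeepingPlus(string src) =>
        Regex.Split(src, @"(?=\+)").Where(s => s.Length > 0).ToArray();

    // Regex.Matches variant: returns only pieces that start with '+'.
    public static string[] MatchPlusTerms(string src) =>
        Regex.Matches(src, @"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", SplitKeepingPlus("+a+b")));  // +a | +b
        Console.WriteLine(string.Join(" | ", SplitKeepingPlus("x+a+b"))); // x | +a | +b
        Console.WriteLine(string.Join(" | ", MatchPlusTerms("x+a+b")));   // +a | +b
    }
}
```

If text before the first + matters, the Regex.Split variant is the safer choice of the two.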

How to split a string and keep the delimiters?

Simple: replace them first. I'll use "|" as the marker for readability, but you may want to use something more exotic that cannot occur in the text.

// this part could be made a little smarter and more flexible.    
// So, just the basic idea:
Text = Text.Replace(". ", ". |").Replace("? ", "? |").Replace("! ", "! |");

if (Text.Contains("|"))
    return Text.Split('|', StringSplitOptions.RemoveEmptyEntries);

I also wonder about the else return new string[0]; branch, which seems odd. Assuming that you want to return the input string when there are no delimiters, you should just remove the if/else construct: Split then returns a one-element array containing the whole input.
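As one possible way to make this "smarter", the marker character can be avoided entirely by splitting on a zero-width lookbehind. This is a different technique from the Replace approach above, sketched here under the assumption that sentences end in ". ", "? " or "! "; the SentenceSplitter name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Hypothetical helper: splits right after ". ", "? " or "! " without
    // consuming them, so each sentence keeps its punctuation and trailing space.
    public static string[] SplitSentences(string text) =>
        Regex.Split(text, @"(?<=[.?!] )")
             .Where(s => s.Length > 0) // text ending in ". " would otherwise add an empty entry
             .ToArray();

    public static void Main()
    {
        foreach (var sentence in SplitSentences("Hi there. How are you? Fine!"))
            Console.WriteLine("'{0}'", sentence);
        // 'Hi there. '
        // 'How are you? '
        // 'Fine!'
    }
}
```

Unlike the Replace version, this does not break when the text itself contains a "|".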

C# split large string with Regex.Split. Must keep delimiters

You can use

var text = "Artículo 1. This is a test that includes : 1) Sample text 2) Sample text";
var result = Regex.Split(text, @"(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)", RegexOptions.None);
Console.WriteLine(string.Join("\n", result));
// => Artículo 1. This is a test that includes :
// => 1) Sample text
// => 2) Sample text

The regex is

(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)

It matches

  • (?!^) - a location other than start of string
  • \s+ - 1+ whitespaces (if you use \s*, you will need to add .Where(x => !string.IsNullOrEmpty(x)) after the Regex.Split call)
  • (?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b) - a location that is immediately followed with
    • \bArtículo\s+[0-9]+\.| - whole word Artículo, 1+ whitespaces, 1+ ASCII digits, and a ., or
    • [a-z]\)| - a lowercase ASCII letter and ), or
    • [1-9]\d?\)| - a non-zero digit, then an optional digit and a ), or
    • \bPárrafo\b - a whole word Párrafo.

How to split string with Regex.Split and keep all separators?

You need a pattern with a lookahead only:

\s+(?=delim1|delim2)

The \s+ will match one or more whitespace characters (since your string contains whitespace). If there may be no whitespace, use \s* (but then you will need to remove empty entries from the result). If these delimiters must be whole words, use \b word boundaries: \s+(?=\b(?:delim1|delim2)\b).

In C#:

addrArr = Regex.Split(inputText, string.Format(@"\s+(?={0})", string.Join("|", delimeters)));

If the delimiters can contain special regex metacharacters, you will need to run Regex.Escape on your delimiters list.

A C# demo:

var inputText = "substring1 delim1 substring2 delim2 substr3";
var delimeters = new List<string> { "delim1", "delim2" };
var addrArr = Regex.Split(inputText,
    string.Format(@"\s+(?={0})", string.Join("|", delimeters.Select(Regex.Escape))));
Console.WriteLine(string.Join("\n", addrArr));
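The whole-word variant mentioned above can be folded into a small helper. This is only a sketch: DelimiterSplitter, SplitBefore and the wholeWords flag are illustrative names, and the \b option assumes every delimiter begins and ends with a word character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Hypothetical helper: splits on whitespace that precedes any delimiter.
    // Delimiters are escaped so regex metacharacters are treated literally.
    public static string[] SplitBefore(string input, IEnumerable<string> delimiters,
                                       bool wholeWords = false)
    {
        string alternation = string.Join("|", delimiters.Select(d => Regex.Escape(d)));
        string lookahead = wholeWords ? $@"\b(?:{alternation})\b" : $"(?:{alternation})";
        return Regex.Split(input, $@"\s+(?={lookahead})");
    }

    public static void Main()
    {
        var delims = new[] { "delim1", "delim2" };

        Console.WriteLine(string.Join(" | ",
            SplitBefore("substring1 delim1 substring2 delim2 substr3", delims)));
        // substring1 | delim1 substring2 | delim2 substr3

        Console.WriteLine(string.Join(" | ", SplitBefore("x delim10 y delim1 z", delims)));
        // x | delim10 y | delim1 z

        Console.WriteLine(string.Join(" | ",
            SplitBefore("x delim10 y delim1 z", delims, wholeWords: true)));
        // x delim10 y | delim1 z
    }
}
```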

How can I split a string by multiple delimiters and keep the delimiters?

Use Regex.Split with a pattern enclosed in a capturing group.

If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.

For example:

var s = "abc({";
var results = Regex.Split(s, @"(\()")
    .Where(m => !string.IsNullOrEmpty(m))
    .ToList();
Console.WriteLine(string.Join(", ", results));
// => abc, (, {

The (\() regex matches the ( symbol and captures it into capturing group 1, so the captured text is also included in the resulting string list.
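The same idea extends to several delimiters by joining them into one alternation inside the capturing group. The sketch below is one possible way to do it; CapturingSplitter and SplitKeepingDelimiters are hypothetical names, and sorting by length is an added precaution so that a longer delimiter is tried before a shorter one it contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapturingSplitter
{
    // Hypothetical helper: builds "(d1|d2|...)" from escaped delimiters and
    // removes the empty strings that appear between adjacent delimiters.
    public static List<string> SplitKeepingDelimiters(string input, params string[] delimiters)
    {
        // Longer delimiters first, so that "==" is matched before "=".
        string pattern = "(" + string.Join("|",
            delimiters.OrderByDescending(d => d.Length)
                      .Select(d => Regex.Escape(d))) + ")";

        return Regex.Split(input, pattern)
                    .Where(s => s.Length > 0)
                    .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SplitKeepingDelimiters("abc({", "(", "{")));
        // abc, (, {
        Console.WriteLine(string.Join(", ", SplitKeepingDelimiters("a==b=c", "=", "==")));
        // a, ==, b, =, c
    }
}
```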

How to keep the delimiters of Regex.Split?

Just put the pattern into a capture-group, and the matches will also be included in the result.

string[] result = Regex.Split("123.456.789", @"(\.)");

Result:

{ "123", ".", "456", ".", "789" }

This also works for many other languages:

  • JavaScript: "123.456.789".split(/(\.)/g)
  • Python: re.split(r"(\.)", "123.456.789")
  • Perl: split(/(\.)/g, "123.456.789")

(Not Java though)


