Split a string with delimiters but keep the delimiters in the result in C#
If you want each delimiter to be returned as its own element, wrap it in a capturing group and use Regex.Split, e.g.:
string input = "plum-pear";
string pattern = "(-)";
string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'
So if you want to split a mathematical formula, you can use the following regex:
@"([*()\^\/]|(?<!E)[\+\-])"
The negative lookbehind (?<!E) keeps constants like 1E-02 intact instead of splitting them into 1E, - and 02.
So:
Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")
Yields the following, plus one empty string between ) and ^ because those two delimiters are adjacent:
10E-02
*
x
+
sin
(
x
)
^
2
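Regex.Split returns an empty string wherever two delimiters sit next to each other (or one sits at the very start or end of the input). A minimal sketch of a wrapper that drops those entries, assuming you never need them; the names FormulaTokenizer and Tokenize are hypothetical and not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class FormulaTokenizer
{
    // Operators and parentheses sit inside a capturing group, so Regex.Split keeps them.
    // (?<!E) stops a sign that directly follows 'E' (as in 1E-02) from being split off.
    private const string Pattern = @"([*()\^\/]|(?<!E)[\+\-])";

    public static string[] Tokenize(string formula)
    {
        return Regex.Split(formula, Pattern)
                    .Where(token => token.Length > 0) // drop empties between adjacent delimiters
                    .ToArray();
    }
}
```

Calling FormulaTokenizer.Tokenize("10E-02*x+sin(x)^2") then gives exactly the ten tokens listed above.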
How to Split string but keep delimiter at the start
Instead of a lookbehind, you need to use a lookahead for splitting:
(?=,)
You want to split at a position where the next character is a comma, which is what a lookahead assertion does. A lookbehind assertion, on the other hand, splits at a position where the previous character is a comma, so it splits after the comma rather than before it.
Code:
String text = "1,2,3,4,5,6";
var split = Regex.Split(text, @"(?=,)");
//=> ["1", ",2", ",3", ",4", ",5", ",6"]
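To make the lookahead/lookbehind difference concrete, here is a small side-by-side sketch. The class and method names (CommaSplitDemo, KeepAtStart, KeepAtEnd) are made up for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CommaSplitDemo
{
    // Lookahead: split at the position just before each comma,
    // so the comma stays at the start of the following piece.
    public static string[] KeepAtStart(string text) => Regex.Split(text, @"(?=,)");

    // Lookbehind: split at the position just after each comma,
    // so the comma stays at the end of the preceding piece.
    public static string[] KeepAtEnd(string text) => Regex.Split(text, @"(?<=,)");
}
```

For "1,2,3", KeepAtStart gives "1", ",2", ",3", while KeepAtEnd gives "1,", "2,", "3".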
Split a string on multiple delimiters and keep them in the output
Try this:
char[] alphabets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();
var input = "AB123456789C123412341234B123";
var result = input.SplitAndKeep(alphabets).ToList();
// result: "A", "B", "123456789", "C", "123412341234", "B", "123"
public static class Extensions
{
public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
{
int start = 0, index;
while ((index = s.IndexOfAny(delims, start)) != -1)
{
if (index - start > 0)
yield return s.Substring(start, index - start);
yield return s.Substring(index, 1);
start = index + 1;
}
if (start < s.Length)
{
yield return s.Substring(start);
}
}
}
Can you split a string and keep the split char(s)?
You can use Regex.Split with a pattern that doesn't consume delimiter characters:
var pattern = @"(?=\+)";
var ans = Regex.Split(src, pattern);
This will create an empty entry if there is a leading +, as there is an implied split before the +.
You could use LINQ to remove the empty entries if they aren't wanted:
var ans2 = Regex.Split(src, pattern).Where(s => !String.IsNullOrEmpty(s)).ToArray();
Alternatively, you could use Regex.Matches to extract the full matching patterns:
var ans3 = Regex.Matches(src, @"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();
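One thing to watch out for: the two approaches only agree when the input starts with +. Any text before the first + is kept by the split but is not matched by \+[^+]*, so the Matches version drops it. A small sketch to compare them (PlusSplitDemo, BySplit and ByMatches are hypothetical names used only here):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration only.
public static class PlusSplitDemo
{
    // Lookahead split with the empty entries removed.
    public static string[] BySplit(string src) =>
        Regex.Split(src, @"(?=\+)").Where(s => !string.IsNullOrEmpty(s)).ToArray();

    // Match-based extraction: returns only the pieces that start with '+'.
    public static string[] ByMatches(string src) =>
        Regex.Matches(src, @"\+[^+]*").Cast<Match>().Select(m => m.Value).ToArray();
}
```

With "+a+b" both return "+a", "+b". With "a+b+c", BySplit returns "a", "+b", "+c" but ByMatches returns only "+b", "+c".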
How to split a string and keep the delimiters?
Simple, replace them first. I'll use the "|" for readability but you may want to use something more exotic.
// this part could be made a little smarter and more flexible.
// So, just the basic idea:
Text = Text.Replace(". ", ". |").Replace("? ", "? |").Replace("! ", "! |");
if (Text.Contains("|"))
return Text.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries); // char[] overload also compiles on .NET Framework
And I wonder about the else return new string[0]; branch, that seems odd. Assuming that you want to return the input string when there are no delimiters, you should just remove the if/else construct.
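If you'd rather not pick a placeholder character at all, the same idea can probably be done in one step with a lookbehind. This swaps the Replace/Split pair for Regex.Split and is only a sketch; SentenceSplitter and SplitSentences are made-up names, and it assumes the same three delimiters (". ", "? ", "! ") as above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class SentenceSplitter
{
    public static string[] SplitSentences(string text)
    {
        // (?<=[.?!] ) matches the empty position right after ". ", "? " or "! ",
        // so nothing is consumed and each piece keeps its punctuation and space.
        return Regex.Split(text, @"(?<=[.?!] )")
                    .Where(s => s.Length > 0) // a match at the very end yields a trailing ""
                    .ToArray();
    }
}
```

For "Hi. How are you? Fine!" this returns "Hi. ", "How are you? " and "Fine!". A string without any of the delimiters comes back as a single-element array, and a literal | in the text is no longer a problem.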
C# split large string with Regex.Split. Must keep delimiters
You can use
var text = "Artículo 1. This is a test that includes : 1) Sample text 2) Sample text";
var result = Regex.Split(text, @"(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)", RegexOptions.None);
Console.WriteLine(string.Join("\n", result));
// => Artículo 1. This is a test that includes :
// => 1) Sample text
// => 2) Sample text
The regex is
(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)
It matches
- (?!^) - a location other than start of string
- \s+ - 1+ whitespaces (if you use \s*, you will need to add .Where(x => !string.IsNullOrEmpty(x)) after the Regex.Split call)
- (?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b) - a location that is immediately followed with
  - \bArtículo\s+[0-9]+\. - whole word Artículo, 1+ whitespaces, 1+ ASCII digits, and a ., or
  - [a-z]\) - a lowercase ASCII letter and ), or
  - [1-9]\d?\) - a non-zero digit, then an optional digit and a ), or
  - \bPárrafo\b - a whole word Párrafo.
How to split string with Regex.Split and keep all separators?
You need a pattern with a lookahead only:
\s+(?=delim1|delim2)
The \s+ will match 1 or more whitespaces (since your string contains whitespaces). In case there can be no whitespaces, use \s* (but then you will need to remove empty entries from the result). If these delimiters must be whole words, use \b word boundaries: \s+(?=\b(?:delim1|delim2)\b).
In C#:
addrArr = Regex.Split(inputText, string.Format(@"\s+(?={0})", string.Join("|", delimeters)));
If the delimiters can contain special regex metacharacters, you will need to run Regex.Escape on your delimiters list.
A C# demo:
var inputText = "substring1 delim1 substring2 delim2 substr3";
var delimeters = new List<string> { "delim1", "delim2" };
var addrArr = Regex.Split(inputText,
string.Format(@"\s+(?={0})", string.Join("|", delimeters.Select(Regex.Escape))));
Console.WriteLine(string.Join("\n", addrArr));
How can I split a string by multiple delimiters and keep the delimiters?
Use Regex.Split with a pattern enclosed in a capturing group.
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array.
See the C# demo:
var s = "abc({";
var results = Regex.Split(s, @"(\()")
.Where(m=>!string.IsNullOrEmpty(m))
.ToList();
Console.WriteLine(string.Join(", ", results));
// => abc, (, {
The (\() regex matches and captures the ( symbol into Capturing group 1, and thus the captured part is also output in the resulting string list.
How to keep the delimiters of Regex.Split?
Just put the pattern into a capture-group, and the matches will also be included in the result.
string[] result = Regex.Split("123.456.789", @"(\.)");
Result:
{ "123", ".", "456", ".", "789" }
This also works for many other languages:
- JavaScript:
"123.456.789".split(/(\.)/g)
- Python:
re.split(r"(\.)", "123.456.789")
- Perl:
split(/(\.)/g, "123.456.789")
(Not Java though: String.split does not include captured groups in its result.)