How to keep the delimiters of Regex.Split?
Just put the pattern into a capture-group, and the matches will also be included in the result.
string[] result = Regex.Split("123.456.789", @"(\.)");
Result:
{ "123", ".", "456", ".", "789" }
This also works for many other languages:
- JavaScript:
"123.456.789".split(/(\.)/g)
- Python:
re.split(r"(\.)", "123.456.789")
- Perl:
split(/(\.)/g, "123.456.789")
(Not Java though)
In Python, how do I split a string and keep the separators?
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
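A capturing group puts the separators at the odd indices of the result, so they are easy to reattach to the tokens around them. The following is a minimal sketch of that idea; the variable names are illustrative only and not part of the original answer.

```python
import re

text = "foo/bar spam\neggs"

# A capturing group makes re.split return the separators as well,
# interleaved with the tokens (separators sit at the odd indices).
parts = re.split(r"(\W)", text)

# Glue each separator onto the token that precedes it.
tokens = parts[0::2]
separators = parts[1::2] + [""]  # pad so the last token has a partner
joined = [tok + sep for tok, sep in zip(tokens, separators)]

print(parts)
print(joined)
```

Because every piece of the input ends up in `joined`, concatenating the list reproduces the original string.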
Dart - Split String with Regex and keep the delimiter
You're close. Try:
var re = RegExp(r'(?=<link=".*?">)|(?<=</link>)');
It has two differences from your RegExp:
- It swaps the (?= and (?<= because you want a split before a <link...>, so you want a lookahead for that, and after a </link>, so a lookbehind for that.
- I added the ? to ".*?", because otherwise it could potentially match until a later " on the same line, instead of the first one. Your example didn't have that, but better safe than sorry.
With that, you get the strings:
"This is first text "
"<link=\"www.stackoverflow.com\">First Hello</link>"
"\nThis is the second text "
"<link=\"www.stackoverflow.com\">Second</link>"
"\n"
If you don't want the newlines to be included, you should probably remove them first.
If you want to combine the \n with the </link>, you can change the RegExp to
var re = RegExp(r'(?=<link=".*?">)|(?<=</link>\n*(?<=\n))');
That gives you:
"This is first text "
"<link=\"www.stackoverflow.com\">First Hello</link>\n"
"This is the second text "
"<link=\"www.stackoverflow.com\">Second</link>\n"
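For comparison, the same lookahead/lookbehind split can be sketched in Python (3.7 or later, where re.split accepts zero-width matches). The input string below is an assumption, reconstructed from the outputs listed above. Only the first, simpler pattern is shown, because Python's re module requires fixed-width lookbehinds, so the \n* variant would not compile there.

```python
import re

# Assumed input, reconstructed from the outputs shown in the answer.
text = (
    'This is first text <link="www.stackoverflow.com">First Hello</link>\n'
    'This is the second text <link="www.stackoverflow.com">Second</link>\n'
)

# Split before each opening <link="..."> tag (lookahead)
# and after each closing </link> tag (lookbehind).
link_boundary = re.compile(r'(?=<link=".*?">)|(?<=</link>)')
pieces = link_boundary.split(text)

for piece in pieces:
    print(repr(piece))
```

Both alternatives are zero-width, so nothing is consumed and every character of the input appears in exactly one piece.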
C# split large string with Regex.Split. Must keep delimiters
You can use
var text = "Artículo 1. This is a test that includes : 1) Sample text 2) Sample text";
var result = Regex.Split(text, @"(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)", RegexOptions.None);
Console.WriteLine(string.Join("\n", result));
// => Artículo 1. This is a test that includes :
// => 1) Sample text
// => 2) Sample text
See the C# demo and the regex demo.
The regex is
(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)
It matches
- (?!^) - a location other than start of string
- \s+ - 1+ whitespaces (if you use \s*, you will need to add .Where(x => !string.IsNullOrEmpty(x)) after the Regex.Split call)
- (?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b) - a location that is immediately followed with
  - \bArtículo\s+[0-9]+\.| - whole word Artículo, 1+ whitespaces, 1+ ASCII digits, and a ., or
  - [a-z]\)| - a lowercase ASCII letter and ), or
  - [1-9]\d?\)| - a non-zero digit, then an optional digit and a ), or
  - \bPárrafo\b - a whole word Párrafo.
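The breakdown above carries over to other regex flavors largely unchanged. As a rough illustration, here is the same pattern in Python, applied to the sample text from the answer; the name section_start is made up for this sketch.

```python
import re

text = "Artículo 1. This is a test that includes : 1) Sample text 2) Sample text"

# Consume the whitespace run, but only where a new item starts right after it.
# The lookahead keeps the item marker (the delimiter) in the next chunk.
section_start = re.compile(
    r"(?!^)\s+(?=\bArtículo\s+[0-9]+\.|[a-z]\)|[1-9]\d?\)|\bPárrafo\b)"
)

parts = section_start.split(text)
for part in parts:
    print(part)
```

The whitespace before each marker is consumed by \s+, which is why the chunks come out without trailing spaces.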
How to split string with Regex.Split and keep all separators?
You need a pattern with a lookahead only:
\s+(?=delim1|delim2)
The \s+ will match 1 or more whitespaces (since your string contains whitespaces). In case there can be no whitespaces, use \s* (but then you will need to remove empty entries from the result). See the regex demo. If these delimiters must be whole words, use \b word boundaries: \s+(?=\b(?:delim1|delim2)\b).
In C#:
addrArr = Regex.Split(inputText, string.Format(@"\s+(?={0})", string.Join("|", delimeters)));
If the delimiters can contain special regex metacharacters, you will need to run Regex.Escape on your delimiters list.
A C# demo:
var inputText = "substring1 delim1 substring2 delim2 substr3";
var delimeters = new List<string> { "delim1", "delim2" };
var addrArr = Regex.Split(inputText,
string.Format(@"\s+(?={0})", string.Join("|", delimeters.Select(Regex.Escape))));
Console.WriteLine(string.Join("\n", addrArr));
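The same approach, building the lookahead from an escaped delimiter list, might look like this in Python. The helper name split_before is hypothetical; re.escape plays the role of Regex.Escape.

```python
import re

def split_before(text, delimiters):
    """Split on whitespace that is immediately followed by one of the delimiters."""
    # Escape each delimiter so regex metacharacters are treated literally.
    alternation = "|".join(re.escape(d) for d in delimiters)
    return re.split(rf"\s+(?={alternation})", text)

input_text = "substring1 delim1 substring2 delim2 substr3"
addr_arr = split_before(input_text, ["delim1", "delim2"])
print("\n".join(addr_arr))

# Delimiters containing metacharacters are handled thanks to re.escape.
escaped_demo = split_before("a (x) b +y c", ["(x)", "+y"])
print(escaped_demo)
```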
How to split a string, but also keep the delimiters?
You can use lookahead and lookbehind, which are features of regular expressions.
System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));
And you will get:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
The last one is what you want.
((?<=;)|(?=;)) matches the empty string (a zero-width position) immediately before or after a ;.
EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to create a variable whose name describes what the regular expression does. You can even put placeholders (e.g. %1$s) in it and use Java's String.format to replace the placeholders with the actual string you need; for example:
static public final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public void someMethod() {
final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
...
}
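For readers comparing languages, the three lookaround variants behave the same way in Python 3.7 and later, where re.split accepts zero-width matches. This is only a sketch; with_delimiter is a hypothetical helper mirroring the WITH_DELIMITER constant above, and it escapes the delimiter, which the Java version does not.

```python
import re

def with_delimiter(delimiter):
    """Build a pattern matching the empty position on either side of delimiter."""
    escaped = re.escape(delimiter)
    return re.compile(f"(?<={escaped})|(?={escaped})")

text = "a;b;c;d"

after = re.split(r"(?<=;)", text)      # delimiter stays with the token before it
before = re.split(r"(?=;)", text)      # delimiter stays with the token after it
both = with_delimiter(";").split(text) # delimiter becomes its own element

print(after)
print(before)
print(both)
```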
Javascript and regex: split string and keep the separator
I had a similar but slightly different problem. Anyway, here are examples of three different scenarios for where to keep the delimiter.
"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]
Warning: The fourth will only work to split single characters. ConnorsFan presents an alternative:
// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);
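The match-based alternative translates directly to Python's re.findall. A minimal sketch using the same path as above (the second path is an extra example added here, not from the original answer):

```python
import re

path = "Animation/rawr/javascript.js"

# Each token is a run of non-slash characters plus an optional trailing slash;
# a slash with no name in front of it is returned as a token of its own.
tokens = re.findall(r"[^/]+/?|/", path)
print(tokens)

# A leading slash has no name in front of it, so it becomes its own token.
rooted = re.findall(r"[^/]+/?|/", "/usr/bin/")
print(rooted)
```

Unlike the single-character lookahead tricks, this works for tokens of any length, because it describes what a token looks like instead of where to cut.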
How to split string but keep delimiters in java?
From your input string and expected results, I infer that you want to split your string according to three rules.
- Split at the point which is preceded and followed by a colon
- Split at the point which is preceded by a space and followed by a colon
- Split at the point which is preceded by a colon and followed by a space
Hence you can use this regex, which uses alternation to cover all three cases mentioned above.
(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )
Java code,
String s = "Hello, :smile::hearth: world!";
System.out.println(Arrays.toString(s.split("(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )")));
Prints like your expected output,
[Hello, , :smile:, :hearth:, world!]
Alternatively, if you can match the tokens instead of splitting the string, the regex becomes much simpler:
:[^:]+:|\S+
Java code,
String s = "Hello, :smile::hearth: world!";
Pattern p = Pattern.compile(":[^:]+:|\\S+");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group());
}
Prints,
Hello,
:smile:
:hearth:
world!
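To compare the two approaches side by side, here is a small Python sketch (Python 3.7 or later for the zero-width split). It uses the same input and the same two regexes; only the variable names are new.

```python
import re

s = "Hello, :smile::hearth: world!"

# Approach 1: split at zero-width positions described by lookarounds.
# Nothing is consumed, so the surrounding spaces stay in the pieces.
split_parts = re.split(r"(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )", s)

# Approach 2: describe the tokens themselves and collect the matches.
# Whitespace between tokens is simply never matched.
matched = re.findall(r":[^:]+:|\S+", s)

print(split_parts)
print(matched)
```

The visible difference is whitespace: the split keeps the space after the comma and the one before world!, while the match-based version drops both.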
How to split a string with a regex of 2 delimiters, but keep the delimiters in Python?
Don't use re.split(), use re.findall() with a regexp that matches each sub-expression.
import re
s = "+2x-10+5"
result = re.findall(r'[-+]\w+', s)
print(result)  # ['+2x', '-10', '+5']
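To see why re.findall() is the better fit here, it may help to look at what a capturing re.split() returns for the same input. The sketch below also includes a variant with an optional sign; that variant is an assumption about how one might handle an expression whose first term is unsigned, not part of the original answer.

```python
import re

s = "+2x-10+5"

# Splitting on a captured sign keeps the signs, but as separate elements,
# and yields a leading empty string because the input starts with a delimiter.
with_split = re.split(r"([-+])", s)
print(with_split)

# Assumed extension: make the sign optional so an unsigned first term is kept.
terms = re.findall(r"[-+]?\w+", "2x-10+5")
print(terms)
```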