How to keep the delimiters of Regex.Split?
Just put the pattern into a capturing group, and the matched delimiters will also be included in the result.
string[] result = Regex.Split("123.456.789", @"(\.)");
Result:
{ "123", ".", "456", ".", "789" }
This also works for many other languages:
- JavaScript:
"123.456.789".split(/(\.)/g)
- Python:
re.split(r"(\.)", "123.456.789")
- Perl:
split(/(\.)/g, "123.456.789")
(Not Java though)
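As a quick check of the capturing-group behaviour described above, here is a minimal Python sketch. The variable names are illustrative only; it compares the split with and without the group and confirms that nothing is lost when the delimiters are kept.

```python
import re

text = "123.456.789"

# Without a capturing group the dots are discarded.
without_group = re.split(r"\.", text)
# With a capturing group each matched dot becomes its own list item.
with_group = re.split(r"(\.)", text)

print(without_group)  # ['123', '456', '789']
print(with_group)     # ['123', '.', '456', '.', '789']

# Because every character is kept, joining the pieces restores the input.
round_trip = "".join(with_group)
print(round_trip == text)  # True
```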
How to split string with Regex.Split and keep all separators?
You need a pattern with a lookahead only:
\s+(?=delim1|delim2)
The \s+ will match 1 or more whitespace characters (since your string contains whitespace). If there may be no whitespace, use \s* (but then you will need to remove empty entries from the result). If these delimiters must be whole words, use \b word boundaries: \s+(?=\b(?:delim1|delim2)\b).
In C#:
addrArr = Regex.Split(inputText, string.Format(@"\s+(?={0})", string.Join("|", delimeters)));
If the delimiters can contain special regex metacharacters, you will need to run Regex.Escape on each delimiter in your list.
A C# demo:
var inputText = "substring1 delim1 substring2 delim2 substr3";
var delimeters = new List<string> { "delim1", "delim2" };
var addrArr = Regex.Split(inputText,
string.Format(@"\s+(?={0})", string.Join("|", delimeters.Select(Regex.Escape))));
Console.WriteLine(string.Join("\n", addrArr));
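For comparison, the same idea can be sketched in Python. This is a rough port under the assumption that re.escape plays the role of Regex.Escape; the names input_text, delimiters and alternation are illustrative.

```python
import re

input_text = "substring1 delim1 substring2 delim2 substr3"
delimiters = ["delim1", "delim2"]

# Escape each delimiter so metacharacters such as "." or "+" are matched literally.
alternation = "|".join(re.escape(d) for d in delimiters)

# Split on whitespace that is followed by a delimiter. The lookahead consumes
# nothing, so each delimiter stays at the start of the next piece.
pattern = rf"\s+(?={alternation})"
parts = re.split(pattern, input_text)

print(parts)  # ['substring1', 'delim1 substring2', 'delim2 substr3']
```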
Python RE library String Split but keep the delimiters/separators as part of the next string
If you are using Python 3.7+, you can split on zero-length matches using re.split and a positive lookahead:
string = 'a+0b-2a+b-b'
re.split(r'(?=[+-])', string)
# ['a', '+0b', '-2a', '+b', '-b']
Demo: https://regex101.com/r/AB6UBa/1
Javascript and regex: split string and keep the separator
I had a similar but slightly different problem. Anyway, here are examples of different scenarios for where to keep the delimiter.
"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]
Warning: The fourth only works when the separator is a single character. ConnorsFan presents an alternative:
// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);
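The match-based alternative carries over to other languages. Below is a small Python sketch of the same pattern using re.findall; the second input path is an assumed example, added only to show how a leading slash is handled.

```python
import re

path = "Animation/rawr/javascript.js"

# Each token is a run of non-slash characters with an optional trailing slash,
# or a lone slash when there is nothing in front of it.
tokens = re.findall(r"[^/]+/?|/", path)
print(tokens)  # ['Animation/', 'rawr/', 'javascript.js']

# A leading slash is returned as its own token.
rooted = re.findall(r"[^/]+/?|/", "/usr/lib/")
print(rooted)  # ['/', 'usr/', 'lib/']
```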
Regex split string but keep separators
Use zero-length matching lookarounds; you want to split on
(?=\[)|(?<=\])
That is, anywhere we assert a match of a literal [ ahead, or a match of a literal ] behind.
As a C# string literal, this is
@"(?=\[)|(?<=\])"
See also
- regular-expressions.info/Lookarounds
Related questions
- Java split is eating my characters. -- has many examples
Example in Java
System.out.println(java.util.Arrays.toString(
"abc[s1]def[s2][s3]ghi".split("(?=\\[)|(?<=\\])")
));
// prints "[abc, [s1], def, [s2], [s3], ghi]"
System.out.println(java.util.Arrays.toString(
"abc;def;ghi;".split("(?<=;)")
));
// prints "[abc;, def;, ghi;]"
System.out.println(java.util.Arrays.toString(
"OhMyGod".split("(?=(?!^)[A-Z])")
));
// prints "[Oh, My, God]"
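The same three patterns can be tried in Python, assuming Python 3.7 or later, where re.split accepts patterns that match the empty string. One difference is worth noting in this sketch: Java's String.split drops trailing empty strings, while Python's re.split keeps them.

```python
import re

# Split before "[" or after "]".
brackets = re.split(r"(?=\[)|(?<=\])", "abc[s1]def[s2][s3]ghi")
print(brackets)  # ['abc', '[s1]', 'def', '[s2]', '[s3]', 'ghi']

# Split after ";". Unlike Java, Python keeps the trailing empty string.
semicolons = re.split(r"(?<=;)", "abc;def;ghi;")
print(semicolons)  # ['abc;', 'def;', 'ghi;', '']

# Split before each uppercase letter except at the start of the string.
camel = re.split(r"(?=(?!^)[A-Z])", "OhMyGod")
print(camel)  # ['Oh', 'My', 'God']
```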
Dart - Split String with Regex and keep the delimiter
You're close. Try:
var re = RegExp(r'(?=<link=".*?">)|(?<=</link>)');
It has two differences from your RegExp:
- It swaps the (?= and (?<= because you want a split before a <link...>, so you want a lookahead for that, and after a </link>, so a lookbehind for that.
- I added the ? to ".*?", because otherwise it could potentially match until a later " on the same line, instead of the first one. Your example didn't have that, but better safe than sorry.
With that, you get the strings:
"This is first text "
"<link=\"www.stackoverflow.com\">First Hello</link>"
"\nThis is the second text "
"<link=\"www.stackoverflow.com\">Second</link>"
"\n"
If you don't want the newlines to be included, you should probably remove them first.
If you want to combine the \n with the </link>, you can change the RegExp to
var re = RegExp(r'(?=<link=".*?">)|(?<=</link>\n*(?<=\n))');
That gives you:
"This is first text "
"<link=\"www.stackoverflow.com\">First Hello</link>\n"
"This is the second text "
"<link=\"www.stackoverflow.com\">Second</link>\n"
How to split string but keep delimiters in java?
From your input string and expected results, I infer that you want to split your string according to three rules:
- Split at a point that is preceded and followed by a colon
- Split at a point that is preceded by a space and followed by a colon
- Split at a point that is preceded by a colon and followed by a space
Hence you can use a regex with one alternative for each of the three cases above.
(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )
Java code,
String s = "Hello, :smile::hearth: world!";
System.out.println(Arrays.toString(s.split("(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )")));
This prints your expected output (note that the last element keeps its leading space),
[Hello, , :smile:, :hearth:,  world!]
Alternatively, if you can match the tokens rather than split, the regex is much simpler:
:[^:]+:|\S+
Java code,
String s = "Hello, :smile::hearth: world!";
Pattern p = Pattern.compile(":[^:]+:|\\S+");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group());
}
Prints,
Hello,
:smile:
:hearth:
world!
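To compare the two approaches side by side, here is a small Python sketch using the same two patterns (assuming Python 3.7+ for splitting on empty matches). It shows one practical difference: splitting keeps the surrounding spaces inside the pieces, while matching with \S+ drops them.

```python
import re

s = "Hello, :smile::hearth: world!"

# Split approach: zero-width split points between colons and around spaces.
parts = re.split(r"(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )", s)
print(parts)  # ['Hello, ', ':smile:', ':hearth:', ' world!']

# Match approach: either a :name: token or a run of non-whitespace characters.
tokens = re.findall(r":[^:]+:|\S+", s)
print(tokens)  # ['Hello,', ':smile:', ':hearth:', 'world!']
```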
How to Split string but keep delimiter at the start
Instead of lookbehind you need to use a lookahead for splitting:
(?=,)
You want to split at a position where the next character is a comma, which calls for a lookahead assertion. A lookbehind assertion, on the other hand, splits where the previous character is a comma, that is, after the comma rather than before it.
Code:
String text = "1,2,3,4,5,6";
var split = Regex.Split(text, @"(?=,)");
//=> ["1", ",2", ",3", ",4", ",5", ",6"]
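The difference between the two assertions is easy to see when both are run on the same input. A minimal Python sketch, assuming Python 3.7+ (the variable names are illustrative):

```python
import re

text = "1,2,3,4,5,6"

# Lookahead: split before each comma, so the comma starts the next piece.
split_before = re.split(r"(?=,)", text)
print(split_before)  # ['1', ',2', ',3', ',4', ',5', ',6']

# Lookbehind: split after each comma, so the comma ends the previous piece.
split_after = re.split(r"(?<=,)", text)
print(split_after)  # ['1,', '2,', '3,', '4,', '5,', '6']
```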
How to split a string with a regex of 2 delimiters, but keep the delimiters in Python?
Don't use re.split(), use re.findall() with a regexp that matches each sub-expression.
import re
s = "+2x-10+5"
result = re.findall(r'[-+]\w+', s)
print(result)  # ['+2x', '-10', '+5']
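One caveat with the pattern above is that a first term without a sign would be skipped, because every match must start with + or -. If that case matters, a possible variation (an assumption on my part, not something from the question) is to make the sign optional:

```python
import re

# The sign is optional, so an unsigned leading term is still captured.
pattern = r"[-+]?\w+"

unsigned_first = re.findall(pattern, "2x-10+5")
print(unsigned_first)  # ['2x', '-10', '+5']

signed_first = re.findall(pattern, "+2x-10+5")
print(signed_first)  # ['+2x', '-10', '+5']
```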
How to split a string, but also keep the delimiters?
You can use lookahead and lookbehind, which are features of regular expressions.
System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));
And you will get:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
The last one is what you want. ((?<=;)|(?=;)) matches the empty string immediately before or after a ;.
EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to create a variable whose name represents what the regular expression does. You can even put placeholders (e.g. %1$s) and use Java's String.format to replace the placeholders with the actual string you need to use; for example:
static public final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public void someMethod() {
final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
...
}
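The named-template idea can also be wrapped in a small helper. The sketch below is in Python and assumes Python 3.7+; split_keep is a hypothetical name, and re.escape is added so that delimiters containing regex metacharacters are treated literally, which the format-string version above does not do.

```python
import re

def split_keep(text, delimiter):
    """Split text around a literal delimiter, keeping each delimiter as its own item."""
    escaped = re.escape(delimiter)  # treat the delimiter as plain text
    # Zero-width split points: just after a delimiter, or just before one.
    pattern = f"(?<={escaped})|(?={escaped})"
    return re.split(pattern, text)

semicolon_parts = split_keep("a;b;c;d", ";")
print(semicolon_parts)  # ['a', ';', 'b', ';', 'c', ';', 'd']

# A metacharacter delimiter works because it is escaped first.
dot_parts = split_keep("1.2", ".")
print(dot_parts)  # ['1', '.', '2']
```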