How to Split a String, But Also Keep the Delimiters

How to split a string, but also keep the delimiters?

You can use lookahead and lookbehind, which are features of regular expressions.

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));

And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

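((?<=;)|(?=;)) matches the empty (zero-width) position immediately after a ; or immediately before a ;, so the split happens on both sides of each delimiter without consuming it.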

EDIT: Fabian Steeg's comment on readability is valid. Readability is always a problem with regular expressions. One thing I do to make them more readable is to store the pattern in a variable whose name describes what the regular expression does. You can even put placeholders (e.g. %1$s) in the pattern and use Java's String.format to replace them with the actual string you need; for example:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    ...
}
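One caveat with the placeholder approach: the delimiter is pasted into the pattern as regex syntax, so a delimiter such as . or + would be interpreted as a metacharacter. A minimal sketch of one way around this, assuming the delimiter should always be treated as literal text (the class and the splitKeeping helper are hypothetical names, not from the original answer):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitKeepDelimiter {
    // Same lookaround template as in the answer above.
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: quotes the delimiter so regex metacharacters
    // in it are matched literally, then splits on both sides of it.
    static String[] splitKeeping(String input, String delimiter) {
        String quoted = Pattern.quote(delimiter);
        return input.split(String.format(WITH_DELIMITER, quoted));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("a.b.c", ".")));  // [a, ., b, ., c]
        System.out.println(Arrays.toString(splitKeeping("1+2+3", "+")));  // [1, +, 2, +, 3]
    }
}
```

Without Pattern.quote, a "." delimiter would match every character, and a "+" delimiter would be rejected as an invalid pattern.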

How to split string but keep delimiters in java?

From your input string and expected results, I infer that you want to split the string according to three rules:

  • Split at a point that is preceded and followed by a colon
  • Split at a point that is preceded by a space and followed by a colon
  • Split at a point that is preceded by a colon and followed by a space

Hence you can use this regex, which uses alternation to cover all three cases:

(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )


Java code,

String s = "Hello, :smile::hearth: world!";
System.out.println(Arrays.toString(s.split("(?<=:)(?=:)|(?<= )(?=:)|(?<=:)(?= )")));

Prints like your expected output,

[Hello, , :smile:, :hearth:,  world!]

Alternatively, if you can match the pieces of text instead of splitting, the regex becomes much simpler:

:[^:]+:|\S+


Java code,

// Requires java.util.regex.Matcher and java.util.regex.Pattern.
String s = "Hello, :smile::hearth: world!";
Pattern p = Pattern.compile(":[^:]+:|\\S+");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}

Prints,

Hello,
:smile:
:hearth:
world!

How to split a String and keep the delimiter '='

You can use this pattern...

const s = "p=YSp%hZ5=YunnYDUuGxVxAeLCZuVvSfoutO8=";
const array = s.split(/(?<=\=)/);
console.log(array); // ["p=", "YSp%hZ5=", "YunnYDUuGxVxAeLCZuVvSfoutO8="]
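The snippet above is JavaScript. For comparison, the same lookbehind also works with Java's String.split; the following is a small sketch under that assumption (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class SplitAfterEquals {
    // Splits at the zero-width position after every '=', so each piece
    // keeps its trailing delimiter.
    static String[] splitAfterEquals(String s) {
        return s.split("(?<==)");
    }

    public static void main(String[] args) {
        String s = "p=YSp%hZ5=YunnYDUuGxVxAeLCZuVvSfoutO8=";
        System.out.println(Arrays.toString(splitAfterEquals(s)));
        // prints [p=, YSp%hZ5=, YunnYDUuGxVxAeLCZuVvSfoutO8=]
    }
}
```

The position after the final = also matches, but String.split drops trailing empty strings by default, so no empty element appears at the end.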

Splitting on multiple delimiters but keep the delimiters on the same string

Use the more powerful Matcher functionality instead of String.split. The below code should work, but has not been optimized:

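// Requires java.util.ArrayList, java.util.List,
// java.util.regex.Matcher and java.util.regex.Pattern.
Pattern pattern = Pattern.compile("\\d*(\\$|£)");

String input = "1£23$456$£$";
Matcher matcher = pattern.matcher(input);
List<String> output = new ArrayList<>();
while (matcher.find()) {
    // Each match is an optional run of digits plus the delimiter that ends it.
    output.add(matcher.group());
}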

Printing out output.toString() generates:

[1£, 23$, 456$, £, $]


Updated requirements:

  1. Also include delimiter characters: +, -, *, and /
  2. Non-delimiter characters are only digits with optional spaces before the delimiters.
  3. Any such spaces are part of the value, not delimiters themselves.

Use this regular expression (shown as it appears inside a Java string literal): \\d*\\s*[-\\+\\*/\\$£]

That pattern, with this given input:

1£23$456$£$7+89-1011*121314/1 £23 $456 $ £ $7 +89 -1011 * 121314 /

Will generate this output:

[1£, 23$, 456$, £, $, 7+, 89-, 1011*, 121314/, 1 £, 23 $, 456 $,  £,  $, 7 +, 89 -, 1011 *, 121314 /]

Dart - Split String with Regex and keep the delimiter

You're close. Try:

var re = RegExp(r'(?=<link=".*?">)|(?<=</link>)');

It has two differences from your RegExp:

  • It swaps the (?= and (?<= because you want a split before a <link...>, so you want a lookahead for that, and after a </link>, so a lookbehind for that.
  • I added the ? to ".*?", because otherwise it could potentially match until a later " on the same line, instead of the first one. Your example didn't have that, but better safe than sorry.

With that, you get the strings:

  1. "This is first text "
  2. "<link=\"www.stackoverflow.com\">First Hello</link>"
  3. "\nThis is the second text "
  4. "<link=\"www.stackoverflow.com\">Second</link>"
  5. "\n"

If you don't want the newlines to be included, you should probably remove them first.

If you want to keep the \n together with the preceding </link>, you can change the RegExp to

var re = RegExp(r'(?=<link=".*?">)|(?<=</link>\n*(?<=\n))');

That gives you:

  1. "This is first text "
  2. "<link=\"www.stackoverflow.com\">First Hello</link>\n"
  3. "This is the second text "
  4. "<link=\"www.stackoverflow.com\">Second</link>\n"

In Red language, how to split a string using split, but also keep the delimiters as necessary

^(60) is a so-called codepoint form that gets loaded as a ` character.

>> "^(60)"
== "`"

If you want to avoid that, you should either escape it manually:

>> {1 + 3 `to-string #"^^(60)"` c}
== {1 + 3 `to-string #"^^(60)"` c}

Or use raw strings:

>> %{1 + 3 `to-string #"^(60)"` c}%
== {1 + 3 `to-string #"^^(60)"` c}

Splitting it afterwards is trivial:

>> split %{1 + 3 `to-string #"^(60)"` c}% #"`"
== ["1 + 3 " {to-string #"^^(60)"} " c"]

If you want to keep the ` character inside the string, then split won't cut it. You need something like Parse:

>> string: {1 + 3 `to-string #"`"` c}
== {1 + 3 `to-string #"`"` c}
>> parse string [collect [keep to " `" " `" keep to "` " "` " keep copy match to end]]
== ["1 + 3" {to-string #"`"} "c"]
>> parse string [collect some [keep copy _ to copy match [" `" | "` " | end] match]]
== ["1 + 3" {to-string #"`"} "c"]

Split a string with delimiters but keep the delimiters in the result in C#

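If you want each delimiter to be its own element in the result, you can use Regex.Split with the delimiter in a capturing group; text matched by a capturing group is included in the returned array. For example: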

string input = "plum-pear";
string pattern = "(-)";

string[] substrings = Regex.Split(input, pattern); // Split on hyphens
foreach (string match in substrings)
{
    Console.WriteLine("'{0}'", match);
}
// The method writes the following to the console:
// 'plum'
// '-'
// 'pear'

So if you want to split a mathematical formula, you can use the following regex:

@"([*()\^\/]|(?<!E)[\+\-])"

The negative lookbehind (?<!E) ensures that constants such as 1E-02 stay intact instead of being split into 1E, - and 02.

So:

Regex.Split("10E-02*x+sin(x)^2", @"([*()\^\/]|(?<!E)[\+\-])")

Yields (omitting the empty string that Regex.Split returns between the adjacent delimiters ) and ^):

  • 10E-02
  • *
  • x
  • +
  • sin
  • (
  • x
  • )
  • ^
  • 2
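Java's String.split does not return captured delimiters the way Regex.Split does, so the capturing-group trick above does not transfer directly. A rough Java sketch that gets the same tokens by matching instead of splitting follows. This is a different technique from the C# answer, and the token grammar (numbers with an optional E exponent, alphabetic names, single-character operators) as well as the class and method names are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaTokenizer {
    // Assumed grammar: number with optional exponent | name | operator or parenthesis.
    // The exponent sign is consumed as part of the number, so 10E-02 stays whole.
    private static final Pattern TOKEN = Pattern.compile(
        "\\d+(?:\\.\\d+)?(?:E[+-]?\\d+)?|[A-Za-z]+|[-+*/^()]");

    static List<String> tokenize(String formula) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(formula);
        // Characters that fit none of the alternatives (e.g. spaces) are skipped.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("10E-02*x+sin(x)^2"));
        // prints [10E-02, *, x, +, sin, (, x, ), ^, 2]
    }
}
```

Because tokens are matched rather than split, no empty strings appear between adjacent delimiters.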

Split string by repeatable delimiter in Java/Kotlin

Using the + quantifier inside a lookbehind is not a documented, supported feature of Java's regex patterns. It does not throw an exception only because + is internally translated into {1,0x7FFFFFFF}, and Java's regex engine supports constrained-width lookbehind patterns. In any case, the quantifier makes no difference in either the lookbehind or the lookahead: both are non-consuming patterns, and the regex engine still checks every position in the string for a match.

You can use

(?<=x)(?=o)|(?<=o)(?=x)

See a Kotlin demo:

val str = "xxoooooooxxoxoxooooo"
val reg = Regex("(?<=x)(?=o)|(?<=o)(?=x)")
var list = str.split(reg)
println(list)
// => [xx, ooooooo, xx, o, x, o, x, ooooo]
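The pattern above names x and o explicitly. If the goal is to split wherever the character changes, whatever the characters are, one possible generalization in Java uses a backreference. This is a sketch that goes beyond the original answer, and the class and method names are hypothetical:

```java
import java.util.Arrays;

public class SplitRuns {
    // (?<=(.)) captures the character just before the current position;
    // (?!\1) requires that the next character is not that same character.
    static String[] splitRuns(String s) {
        return s.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("xxoooooooxxoxoxooooo")));
        // prints [xx, ooooooo, xx, o, x, o, x, ooooo]
        System.out.println(Arrays.toString(splitRuns("aabccc")));
        // prints [aa, b, ccc]
    }
}
```

The lookbehind here has a fixed width of one character, so it stays within what Java's regex engine documents as supported.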

