Java Split Is Eating My Characters

Use zero-width matching assertions:

    String str = "la$le\\$li$lo";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<!\\\\)\\$")
    )); // prints "[la, le\$li, lo]"

The regex is essentially

(?<!\\)\$

It uses negative lookbehind to assert that there is not a preceding \.
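
To see why the lookbehind matters, here is a minimal sketch that contrasts it with a consuming character class. The pattern [^\\]\$ is only an assumed stand-in for the kind of regex that "eats" characters; it is not quoted from the question, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

class EscapedDollarSplit {
    // Consuming version (assumed example): [^\\] matches the character
    // before each $, so that character becomes part of the delimiter and is lost.
    static String[] splitConsuming(String s) {
        return s.split("[^\\\\]\\$");
    }

    // Zero-width version: the lookbehind only checks that the previous
    // character is not a backslash; it does not consume it.
    static String[] splitLookbehind(String s) {
        return s.split("(?<!\\\\)\\$");
    }

    public static void main(String[] args) {
        String str = "la$le\\$li$lo";
        System.out.println(Arrays.toString(splitConsuming(str)));  // prints "[l, le\$l, lo]"
        System.out.println(Arrays.toString(splitLookbehind(str))); // prints "[la, le\$li, lo]"
    }
}
```

With the consuming pattern the "a" and the second "i" disappear, because they are matched as part of the delimiter.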

See also

  • regular-expressions.info/Lookarounds

More examples of splitting on assertions

Simple sentence splitting, keeping punctuation marks:

    String str = "Really?Wow!This.Is.Awesome!";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=[.!?])")
    )); // prints "[Really?, Wow!, This., Is., Awesome!]"

Splitting a long string into fixed-length parts, using \G:

    String str = "012345678901234567890";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=\\G.{4})")
    )); // prints "[0123, 4567, 8901, 2345, 6789, 0]"

Using a lookbehind/lookahead combo:

    String str = "HelloThereHowAreYou";
    System.out.println(java.util.Arrays.toString(
        str.split("(?<=[a-z])(?=[A-Z])")
    )); // prints "[Hello, There, How, Are, You]"

Related questions

  • Can you use zero-width matching regex in String split?
  • Backreferences in lookbehind
  • How do I convert CamelCase into human-readable names in Java?

How to split a string, but also keep the delimiters?

You can use lookahead and lookbehind, which are features of regular expressions.

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));

And you will get:

[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]

The last one is what you want.

((?<=;)|(?=;)) matches the empty position immediately after a ; or immediately before a ;, so the ; itself is never consumed.

EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make them more readable is to store the regular expression in a variable whose name says what it does. You can even put placeholders in it (e.g. %1$s) and use Java's String.format to replace them with the actual string you need; for example:

public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    ...
}
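
A runnable sketch of the same idea follows. The helper splitKeepingDelimiter is hypothetical, and wrapping the delimiter in Pattern.quote is my own addition (an assumption that the delimiter is meant literally), so that metacharacters such as "." do not change the meaning of the template.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DelimiterKeepingSplit {
    // Template from the answer above; %1$s is replaced by the delimiter regex.
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: quotes the delimiter so that it is matched literally,
    // then splits before and after every occurrence of it.
    static String[] splitKeepingDelimiter(String input, String literalDelimiter) {
        String regex = String.format(WITH_DELIMITER, Pattern.quote(literalDelimiter));
        return input.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingDelimiter("a;b;c;d", ";"))); // prints "[a, ;, b, ;, c, ;, d]"
        System.out.println(Arrays.toString(splitKeepingDelimiter("a.b.c", ".")));   // prints "[a, ., b, ., c]"
    }
}
```

Without the quoting, a delimiter of "." would be read as "any character" and the split would happen at every position.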

Regex to split a string (in Java) so that spaces are preserved?

Thanks guys, that gave me the lead I needed ... I'm using (?<=[\\s]) and it works exactly the way I want!
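
For anyone who wants to see what that lookbehind does, here is a small sketch. The input strings and the class name are made up for illustration; [\\s] is written as the equivalent \\s.

```java
import java.util.Arrays;

class WhitespaceKeepingSplit {
    // Splits at the empty position after each whitespace character,
    // so every space stays attached to the end of the preceding piece.
    static String[] splitAfterWhitespace(String s) {
        return s.split("(?<=\\s)");
    }

    public static void main(String[] args) {
        // Each element except the last ends with the space that followed it.
        System.out.println(Arrays.toString(splitAfterWhitespace("to be or not"))); // prints "[to , be , or , not]"
    }
}
```

Consecutive spaces each end their own piece, so "a  b" (two spaces) yields "a ", " " and "b".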

problem with java split()

Don't use split for this!

split gives you the pieces BETWEEN the separators.

What you want here is to remove the unwanted '-' characters.

The solution is simple:

out = in.replaceAll("-", "");
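
As a minimal sketch (the input value and the names are hypothetical), the same removal can also be written with String.replace, which treats "-" as a literal and needs no regex at all:

```java
class DashRemoval {
    // String.replace(CharSequence, CharSequence) replaces every literal
    // occurrence of "-" with the empty string; no regex is involved.
    static String stripDashes(String in) {
        return in.replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDashes("12-34-56")); // prints "123456"
    }
}
```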

Need to split a string into two parts in java

Use lookarounds: str.split("(?<=\\d)(?=\\D)")

String[] parts = "123XYZ".split("(?<=\\d)(?=\\D)");
System.out.println(parts[0] + "-" + parts[1]);
// prints "123-XYZ"

\d is the character class for digits; \D is its negation. So this zero-width assertion matches the position where the preceding character is a digit, (?<=\d), and the following character is a non-digit, (?=\D).

References

  • regular-expressions.info/Lookarounds and Character Class

Related questions

  • Java split is eating my characters.
  • Is there a way to split strings with String.split() and include the delimiters?

Alternate solution using limited split

The following also works:

    String[] parts = "123XYZ".split("(?=\\D)", 2);
    System.out.println(parts[0] + "-" + parts[1]);

This splits just before the first non-digit. This is much closer to your original solution, except that since it doesn't actually match the non-digit character, it doesn't "eat it up". It also uses a limit of 2, which is really what you want here.

API links

  • String.split(String regex, int limit)
    • If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
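
The effect of a positive limit can be sketched as follows; the input string and the class name are made up for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper around String.split(String regex, int limit).
    static String[] splitWithLimit(String s, String regex, int limit) {
        return s.split(regex, limit);
    }

    public static void main(String[] args) {
        // limit 2: the pattern is applied once, the rest stays in the last entry
        System.out.println(Arrays.toString(splitWithLimit("a,b,c,d", ",", 2))); // prints "[a, b,c,d]"
        // limit 3: the pattern is applied twice
        System.out.println(Arrays.toString(splitWithLimit("a,b,c,d", ",", 3))); // prints "[a, b, c,d]"
    }
}
```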

How to split the string in java by \?

While you could escape the backslash in the regular expression passed to String.split, which gives the somewhat surprising

String str = "a\\b\\c";
str.split("\\\\");

it is also possible to compile a Pattern with Pattern.LITERAL and then use Pattern.split(CharSequence), like so:

String str = "a\\b\\c";
Pattern p = Pattern.compile("\\", Pattern.LITERAL);
String[] arr = p.split(str);
System.out.println(Arrays.toString(arr));

Which outputs

[a, b, c]
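
Another option, not mentioned in the answer above and sketched here only as an alternative, is to let Pattern.quote build the escaped regex so that String.split can still be used directly. The helper name splitOnLiteral is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BackslashSplit {
    // Pattern.quote returns a regex that matches the given string literally,
    // so the backslash does not have to be escaped by hand for the regex engine.
    static String[] splitOnLiteral(String s, String literal) {
        return s.split(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        String str = "a\\b\\c"; // the three letters separated by single backslashes
        System.out.println(Arrays.toString(splitOnLiteral(str, "\\"))); // prints "[a, b, c]"
    }
}
```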

Java split String at |

Escape the character, because an unescaped | is the alternation operator in a regex:

String[] parts = match.split("\\|");
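
A small sketch of the difference, with a made-up input string and hypothetical method names. The unescaped pattern is an alternation between two empty alternatives, so it matches the empty string at every position rather than the pipe character.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplit {
    // Escaped by hand: matches a literal pipe.
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    // Escaped via Pattern.quote: also matches a literal pipe.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("|"));
    }

    // Unescaped: "|" is read as "empty OR empty", which matches between characters.
    static String[] splitUnescaped(String s) {
        return s.split("|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("a|b|c")));   // prints "[a, b, c]"
        System.out.println(Arrays.toString(splitQuoted("a|b|c")));    // prints "[a, b, c]"
        // Splits between individual characters instead of at the pipes.
        System.out.println(Arrays.toString(splitUnescaped("a|b|c")));
    }
}
```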

Java Split String Consecutive Delimiters

String.split leaves an empty string ("") where it encounters consecutive delimiters, as long as you use the right regex. If you want to replace it with "empty", you'd have to do so yourself:

String[] split = barcodeFields.split("\\^");
for (int i = 0; i < split.length; ++i) {
    if (split[i].length() == 0) {
        split[i] = "empty";
    }
}

Java String's split method ignores empty substrings

Use String.split(String regex, int limit) with a negative limit (e.g. -1).

"aa,bb,cc,dd,,,,".split(",", -1)

When String.split(String regex) is called, it is called with limit = 0, which will remove all trailing empty strings in the array (in most cases, see below).
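
The difference between the two limits can be checked with a short sketch (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Number of array entries produced for a given limit.
    static int countParts(String s, String regex, int limit) {
        return s.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String input = "aa,bb,cc,dd,,,,";
        // limit 0 (what split(regex) uses): trailing empty strings are dropped
        System.out.println(Arrays.toString(input.split(",")));     // prints "[aa, bb, cc, dd]"
        System.out.println(countParts(input, ",", 0));              // prints "4"
        // negative limit: trailing empty strings are kept
        System.out.println(countParts(input, ",", -1));             // prints "8"
    }
}
```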

The actual behavior of String.split(String regex) is quite confusing:

  • Splitting an empty string always results in an array of length 1 that contains the empty string.
  • Splitting ";" or ";;;" with the regex ";" results in an empty array, because for a non-empty input all trailing empty strings are removed.

The behavior above can be observed from at least Java 5 to Java 8.
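
These edge cases can be reproduced with a few lines; the class and method names are hypothetical.

```java
class SplitEdgeCases {
    // Length of the array returned by String.split(String regex).
    static int partCount(String s, String regex) {
        return s.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(partCount("", ";"));          // prints "1" (array holding the empty string)
        System.out.println(partCount(";", ";"));         // prints "0"
        System.out.println(partCount(";;;", ";"));       // prints "0"
        // With a negative limit the empty strings are kept.
        System.out.println(";;;".split(";", -1).length); // prints "4"
    }
}
```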

There was an attempt in JDK-6559590 to change the behavior so that splitting an empty string returns an empty array. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.


