Java String Split Removed Empty Values

By default, split(regex) removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded split(regex, limit) with a negative limit, like:

String[] split = data.split("\\|", -1);

A little more detail:

split(regex) internally returns the result of split(regex, 0), and the documentation of that method says (the key part is the last paragraph):

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
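To see the three limit cases side by side, here is a minimal sketch. The class name and the sample input "a,b,," are made up for illustration; only String.split(String, int) is the real API.

```java
import java.util.Arrays;

// Sketch: compare positive, zero and negative limits on one hypothetical input.
class SplitLimitDemo {
    static final String INPUT = "a,b,,"; // made-up sample with two trailing empty fields

    public static void main(String[] args) {
        // limit > 0: pattern applied at most limit - 1 times; last entry keeps the rest
        System.out.println(Arrays.toString(INPUT.split(",", 2)));  // [a, b,,]
        // limit == 0 (what split(regex) uses): trailing empty strings are discarded
        System.out.println(Arrays.toString(INPUT.split(",", 0)));  // [a, b]
        // limit < 0: applied as often as possible, trailing empty strings are kept
        System.out.println(Arrays.toString(INPUT.split(",", -1))); // [a, b, , ]
    }
}
```

The only difference between the zero and negative cases is whether the trailing empty strings survive.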

Exception:

It is worth mentioning that trailing empty strings are removed only when they were created by the splitting itself. So for "".split(anything), since "" can't be split any further, the result is the array [""].

This happens because no split took place here, so "", despite being empty and trailing, represents the original string, not an empty string created by the splitting process.
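A small sketch of this exception, contrasting an empty input with an input that consists only of the delimiter (the class name is hypothetical):

```java
import java.util.Arrays;

// Sketch: "" comes back unchanged because nothing was split, while "|" is split
// into two empty pieces that are then dropped as trailing empty strings.
class EmptyInputDemo {
    public static void main(String[] args) {
        String[] fromEmpty = "".split("\\|");
        String[] fromDelimiterOnly = "|".split("\\|");
        String[] keptWithNegativeLimit = "|".split("\\|", -1);

        System.out.println(fromEmpty.length);                       // 1 (the original "")
        System.out.println(fromDelimiterOnly.length);               // 0 (both pieces were trailing and empty)
        System.out.println(Arrays.toString(keptWithNegativeLimit)); // [, ]
    }
}
```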

How to remove empty results after splitting with regex in Java?

Don't use split. Instead, call Matcher.find() in a loop and read each matching substring with group(). You can do it like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern reg = Pattern.compile("\\d+");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
    System.out.println(m.group());

which will print

0085
223
9349

Based on your regex, it seems your goal is also to remove leading zeroes, as in the case of 0085. If so, you can use a regex like 0*(\\d+), take the part matched by group 1 (the one in parentheses), and let the leading zeroes be matched outside that group.

Pattern reg = Pattern.compile("0*(\\d+)");
Matcher m = reg.matcher("asd0085 sa223 9349x");
while (m.find())
    System.out.println(m.group(1)); // group 1 excludes the leading zeroes

Output:

85
223
9349
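If you need the matches as a list rather than printed, the find() loop can be wrapped in a small helper. This is a sketch; findAll is a hypothetical name, not a JDK method. (On Java 9+, Matcher.results() offers a stream of matches as an alternative.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: collect every match (or one capturing group of it) into a list.
// findAll is a hypothetical helper, not part of java.util.regex.
class FindAllDemo {
    static List<String> findAll(String regex, int group, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(group)); // group 0 is the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "asd0085 sa223 9349x";
        System.out.println(findAll("\\d+", 0, input));     // [0085, 223, 9349]
        System.out.println(findAll("0*(\\d+)", 1, input)); // [85, 223, 9349]
    }
}
```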

But if you really want to use split, change "\\D0*" to "\\D+0*" so that you split on one or more non-digits (\\D+) rather than exactly one non-digit (\\D). With this solution you may need to ignore the first, empty element of the result array, depending on whether the string starts with a part that matches the delimiter.
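A sketch of the split-based variant on the same sample input. Here the input starts with "asd00", which the delimiter regex matches, so the result begins with an empty element; the hypothetical numbers helper drops it.

```java
import java.util.Arrays;

// Sketch: split on runs of non-digits plus any leading zeroes of the next number.
class SplitNonDigitsDemo {
    static String[] numbers(String input) {
        String[] parts = input.split("\\D+0*");
        // A leading empty element appears when the input starts with a delimiter
        // match; skip it if present.
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "asd0085 sa223 9349x";
        System.out.println(Arrays.toString(input.split("\\D+0*"))); // [, 85, 223, 9349]
        System.out.println(Arrays.toString(numbers(input)));        // [85, 223, 9349]
    }
}
```

The trailing "x" also matches the delimiter, but the empty string it leaves behind is trailing, so split(regex) already discards it.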

Remove empty Strings after splitting a StringBuilder into Array Java

split takes a regex. So:

String[] ar = sb.toString().split("\\s+");

The regex \s (written "\\s" inside a Java string literal) means 'any whitespace character', and + means 'one or more of the preceding'. If you want to split on spaces only (and not on newlines, tabs, etc.), try String[] ar = sb.toString().split(" +");, which literally means "split on one or more spaces".
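A quick sketch of the difference, using a made-up StringBuilder that mixes spaces, a tab and a newline:

```java
// Sketch: " +" treats only runs of spaces as separators; "\\s+" treats any run
// of whitespace (spaces, tabs, newlines, ...) as a single separator.
class WhitespaceSplitDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("one  two\tthree\nfour");
        // Only the double space separates: "one" and "two\tthree\nfour"
        System.out.println(sb.toString().split(" +").length);  // 2
        // Every whitespace run separates: "one", "two", "three", "four"
        System.out.println(sb.toString().split("\\s+").length); // 4
    }
}
```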

This trick works for just about any separator. To split on commas, for example, try .split("\\s*,\\s*"), which means: zero or more whitespace characters, a comma, then zero or more whitespace characters (quantifiers are greedy, so they take as much as they can).

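Note that this trick does NOT get rid of leading and trailing whitespace: with "\\s+", leading whitespace produces an empty first element, and with "\\s*,\\s*" the first and last elements keep their outer whitespace. To handle that, call trim() before splitting. Putting it all together: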

String[] ar = sb.toString().trim().split("\\s+");

and for commas:

String[] ar = sb.toString().trim().split("\\s*,\\s*");
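A sketch showing what trim() changes, using made-up inputs with leading and trailing whitespace:

```java
import java.util.Arrays;

// Sketch: without trim(), leading whitespace yields an empty first element
// (trailing empty strings are already dropped by split's limit of 0), and
// comma-separated fields keep their outer spaces.
class TrimThenSplitDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("  alpha   beta\tgamma \n");
        System.out.println(Arrays.toString(sb.toString().split("\\s+")));        // [, alpha, beta, gamma]
        System.out.println(Arrays.toString(sb.toString().trim().split("\\s+"))); // [alpha, beta, gamma]

        String csv = " a , b,c ";
        System.out.println(Arrays.toString(csv.split("\\s*,\\s*")));        // [ a, b, c ]
        System.out.println(Arrays.toString(csv.trim().split("\\s*,\\s*"))); // [a, b, c]
    }
}
```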

String split() dropping trailing empty entries

See the Javadoc:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

So it is behaving as defined. If you're not happy with that, you can use a negative limit argument, as described for split(String, int):

String[] parts = "a.b.c...d...".split("\\.", -1);
for (int i = 0; i < parts.length; i++)
    System.out.println(i + ": '" + parts[i] + "'");

0: 'a'
1: 'b'
2: 'c'
3: ''
4: ''
5: 'd'
6: ''
7: ''
8: ''

Java String's split method ignores empty substrings

Use String.split(String regex, int limit) with a negative limit (e.g. -1).

"aa,bb,cc,dd,,,,".split(",", -1)

String.split(String regex) behaves like split(regex, 0), which removes all trailing empty strings from the array (in most cases; see below).

The actual behavior of String.split(String regex) is quite confusing:

  • Splitting an empty string always results in an array of length 1 containing the empty string.
  • Splitting ";" or ";;;" on the regex ";" results in an empty array: for a non-empty input, all trailing empty strings are removed, even if that leaves nothing.
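These cases can be checked with a short sketch (the class name is hypothetical; the inputs are the ones used in this answer):

```java
// Sketch: split(regex) (limit 0) versus split(regex, -1) on the edge cases above.
class SplitEdgeCasesDemo {
    public static void main(String[] args) {
        System.out.println("".split(";").length);    // 1: the original empty string
        System.out.println(";".split(";").length);   // 0: every piece was a trailing empty string
        System.out.println(";;;".split(";").length); // 0

        String data = "aa,bb,cc,dd,,,,";
        System.out.println(data.split(",").length);     // 4: trailing empty fields dropped
        System.out.println(data.split(",", -1).length); // 8: all empty fields kept
    }
}
```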

The behavior above can be observed from at least Java 5 to Java 8.

There was an attempt, in JDK-6559590, to change the behavior so that splitting an empty string returns an empty array. However, it was soon reverted in JDK-8028321 because it caused regressions in various places, so the change never made it into the initial Java 8 release.

Prevent Empty values on string Split

I would replace each run of separating characters with a single space. Then you can use trim() to remove the extra space at either end before you split. I'm assuming you don't want any empty strings in the results, so I've changed the expression slightly.

ab.replaceAll("\\W+", " ").trim().split(" ")
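A minimal sketch of this approach; the sample inputs and the words helper are made up for illustration. It also shows one remaining edge case: an input made only of separators ends up as "", and "".split(" ") returns one empty element.

```java
// Sketch of replace-then-trim-then-split on hypothetical inputs.
class ReplaceTrimSplitDemo {
    static String[] words(String ab) {
        // Collapse every run of non-word characters into one space, strip the
        // spaces at both ends, then split on the single spaces that remain.
        return ab.replaceAll("\\W+", " ").trim().split(" ");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("--foo,,bar;baz!!"))); // foo|bar|baz
        // Input made only of separators becomes "" after trim(), and
        // "".split(" ") returns [""], i.e. a single empty element.
        System.out.println(words("!?!").length); // 1
    }
}
```

Note that, by default, \W also treats non-ASCII letters such as accented characters as separators; Pattern.UNICODE_CHARACTER_CLASS changes that.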

Java: String split(): I want it to include the empty strings at the end

Use str.split("\n", -1) (with a negative limit argument). When split is given zero or no limit argument, it discards trailing empty fields; when it's given a positive limit argument, it limits the number of fields to that number; and a negative limit means to allow any number of fields without discarding trailing empty fields. This is documented in the Javadoc for String.split(String, int), and the behavior is taken from Perl.
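A short sketch of the three behaviors on a made-up text ending in two newlines:

```java
// Sketch: zero, positive and negative limits when splitting text into lines.
class KeepTrailingLinesDemo {
    public static void main(String[] args) {
        String str = "line1\nline2\n\n";
        System.out.println(str.split("\n").length);     // 2: trailing empty lines dropped
        System.out.println(str.split("\n", 2).length);  // 2: "line1" and "line2\n\n"
        System.out.println(str.split("\n", -1).length); // 4: "line1", "line2", "", ""
    }
}
```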

String.split() ignoring empty values in between delimiters if on the final part of a string

By default, split removes trailing empty strings from the result array. To turn this mechanism off, use split(regex, limit) with a negative limit, like:

split("\\|", -1)

Why does split on an empty string return a non-empty array?

For the same reason that

",test" split ','

and

",test," split ','

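will both return an array of size 2. Everything before the first match is returned as the first element, even when it is empty. When the pattern does not match at all (as with an empty input), nothing is split and the whole input is returned as the only element.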
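The same cases written as Java String.split calls, as a small sketch (the class name is hypothetical):

```java
import java.util.Arrays;

// Sketch: the text before the first match becomes element 0, even when it is empty.
class LeadingEmptyDemo {
    public static void main(String[] args) {
        System.out.println(Arrays.toString(",test".split(",")));  // [, test]
        System.out.println(Arrays.toString(",test,".split(","))); // [, test] (trailing "" dropped)
        // No match at all: the whole (empty) input is the single element.
        System.out.println("".split(",").length);                  // 1
    }
}
```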


