How to Prevent java.lang.String.split() from Creating a Leading Empty String

How to prevent java.lang.String.split() from creating a leading empty string?

Your best bet is probably just to strip out any leading delimiter:

String input = "/Test/Stuff";
String[] test = input.replaceFirst("^/", "").split("/");

You can make it more generic by putting it in a method:

public String[] mySplit(final String input, final String delim)
{
    // delim is interpreted as a regular expression, both here and in split()
    return input.replaceFirst("^" + delim, "").split(delim);
}

String[] test = mySplit("/Test/Stuff", "/");
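Because the method above passes delim straight into a regular expression, a delimiter such as "." or "|" would be interpreted as a metacharacter. A minimal sketch of a literal-delimiter variant, assuming that is what you want; the class name LeadingDelimiterSplit and the method name splitLiteral are made up for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LeadingDelimiterSplit {
    // Hypothetical helper: treats delim as literal text rather than as a regex.
    static String[] splitLiteral(String input, String delim) {
        String quoted = Pattern.quote(delim);
        // Strip one leading delimiter, then split on the remaining ones.
        return input.replaceFirst("^" + quoted, "").split(quoted);
    }

    public static void main(String[] args) {
        // Plain split keeps the empty string in front of the first "/".
        System.out.println(Arrays.toString("/Test/Stuff".split("/")));         // [, Test, Stuff]
        System.out.println(Arrays.toString(splitLiteral("/Test/Stuff", "/"))); // [Test, Stuff]
        // "." is a regex metacharacter, so quoting matters here.
        System.out.println(Arrays.toString(splitLiteral(".a.b", ".")));        // [a, b]
    }
}
```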

How to split String without leaving behind empty strings?

Add the "one or more times" greediness quantifier to your character class:

String[] inputTokens = input.split("[(),\\s]+");

If the input begins with a delimiter, this still produces one leading empty String, which is unavoidable with split() when a match sits at the very start of the String. Otherwise the result contains no empty Strings.
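To see both cases side by side, here is a small sketch; the class name, the method name tokens and the sample inputs are invented for the example:

```java
import java.util.Arrays;

class CharClassSplitDemo {
    // Splits on runs of parentheses, commas and whitespace.
    static String[] tokens(String input) {
        return input.split("[(),\\s]+");
    }

    public static void main(String[] args) {
        // No delimiter at the start: no empty strings at all.
        System.out.println(Arrays.toString(tokens("add(1, 2)"))); // [add, 1, 2]
        // Delimiter at the start: exactly one leading empty string.
        System.out.println(Arrays.toString(tokens("(1, 2)")));    // [, 1, 2]
    }
}
```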

Prevent Empty values on string Split

I would replace the separating characters with spaces. Then you can use trim() to remove any extra spaces at either end before you split. I'm assuming you don't want any empty strings in the results, so I've changed the expression slightly.

ab.replaceAll("\\W+", " ").trim().split(" ")
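A quick sketch of that one-liner in context; the class name, the method name words and the sample input are assumptions. Note one edge case: if the input contains no word characters at all, trim() leaves an empty string and split() returns an array holding that single empty string.

```java
import java.util.Arrays;

class NormalizeThenSplit {
    // Collapse every run of non-word characters to one space, trim, then split.
    static String[] words(String ab) {
        return ab.replaceAll("\\W+", " ").trim().split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("--foo, bar;baz!!"))); // [foo, bar, baz]
        // Only separators: the result is a single empty string, not an empty array.
        System.out.println(words("!!!").length);                        // 1
    }
}
```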

Empty string added to string array when splitting for numbers

Matcher is more applicable for this purpose than split:

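// needs java.util.regex.Matcher and java.util.regex.Pattern
int sum = 0;
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(str);
// each find() advances to the next run of digits in str
while (m.find()) {
    sum += Integer.parseInt(m.group());
}
return sum;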

Java String split removed empty values

By default, split(delimiter) removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded version split(delimiter, limit) with limit set to a negative value, like

String[] split = data.split("\\|", -1);

A few more details:

split(regex) internally returns the result of split(regex, 0), and in the documentation of this method you can find:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Exception:

It is worth mentioning that removing trailing empty strings makes sense only if those empty strings were created by the split mechanism. So for "".split(anything), since we can't split "" any further, we get the array [""] as a result.

This happens because no split took place, so "", despite being empty and trailing, represents the original string, not an empty string created by the splitting process.
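The effect of the limit argument and the empty-input exception can be checked with a short sketch; the class name, the helper split and the sample string "a|b||" are made up for the example:

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Thin wrapper so the three limit cases are easy to compare.
    static String[] split(String data, int limit) {
        return data.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "a|b||";
        // limit 0 (same as split(regex)): trailing empty strings are discarded.
        System.out.println(Arrays.toString(split(data, 0)));  // [a, b]
        // negative limit: trailing empty strings are kept.
        System.out.println(split(data, -1).length);           // 4
        // positive limit n: at most n entries, the last one holds the rest of the input.
        System.out.println(Arrays.toString(split(data, 2)));  // [a, b||]
        // empty input: nothing was split, so the original string is returned.
        System.out.println(split("", 0).length);              // 1
    }
}
```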

String.split() - matching leading empty String prior to first delimiter?

If by "better" you mean higher performance then you might want to try creating a regular expression that matches what you want to match and using Matcher.find in a loop and pulling out the matches as you find them. This saves modifying the string first. But measure it for yourself to see which is faster for your data.
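As a rough sketch of the Matcher.find approach, assuming "/" is the separator and that empty tokens should be skipped entirely; the class name TokenFinder and the pattern are illustrative, not taken from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // Matches the tokens themselves (runs of non-separator characters)
    // instead of matching the separators.
    private static final Pattern TOKEN = Pattern.compile("[^/]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading, doubled and trailing separators produce no empty entries.
        System.out.println(tokens("/Test//Stuff/")); // [Test, Stuff]
    }
}
```

Whether this is actually faster than stripping the prefix and calling split depends on the data, so it is worth measuring as the answer says.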

If by "better" you mean simpler, then no, I don't think there is a simpler way than the one you suggested: removing the leading separators before applying the split.

Java's split method has leading blank records that I can't suppress


One simple solution would be to remove the first + from the string. This way, it won't split before the first keyword:

projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();

Edit: Personally, I'd go for a more robust solution using regular expressions. This finds all keywords preceded by +. It also requires that the + is either preceded by whitespace or at the start of the line, so that expressions like 3+4 aren't matched.

String inputLine = "+foo 3+4 foofoo foo foo @bar.com +foofoofoo +foo1 +foo2 +foo3";
Pattern re = Pattern.compile("(\\s|^)\\+(\\w+)");
Matcher m = re.matcher(inputLine);
while (m.find()) {
    System.out.println(m.group(2));
}

Java split String that start with space

You can call string.trim() and then string.split(" "). The trim() method removes spaces before the first non-space character and after the last non-space character.
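A small sketch of the idea; the class and method names are invented. It also shows a variation that is an assumption on my part rather than part of the answer: splitting on "\\s+" instead of " ", so that repeated spaces inside the string do not produce empty entries either.

```java
import java.util.Arrays;

class TrimThenSplit {
    // As suggested: trim, then split on a single space.
    static String[] bySingleSpace(String input) {
        return input.trim().split(" ");
    }

    // Variation: treat any run of whitespace as one separator.
    static String[] byWhitespaceRun(String input) {
        return input.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String s = "  hello   world ";
        // Inner runs of spaces still yield empty strings with split(" ").
        System.out.println(bySingleSpace(s).length);                 // 4
        System.out.println(Arrays.toString(byWhitespaceRun(s)));     // [hello, world]
    }
}
```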

Why in Java 8 split sometimes removes empty strings at start of result array?

The behavior of String.split (which calls Pattern.split) changes between Java 7 and Java 8.

Documentation

Comparing the documentation of Pattern.split in Java 7 and Java 8, we observe that the following clause was added:

When there is a positive-width match at the beginning of the input sequence then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.

The same clause is also added to String.split in Java 8, compared to Java 7.
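The clause can be illustrated with a short sketch, assuming it runs on Java 8 or later; the class name and the helper startsWithEmpty are hypothetical:

```java
class LeadingMatchDemo {
    // True when the split result begins with an empty string.
    static boolean startsWithEmpty(String input, String regex) {
        String[] parts = input.split(regex);
        return parts.length > 0 && parts[0].isEmpty();
    }

    public static void main(String[] args) {
        // Positive-width match at the beginning: leading empty substring is included.
        System.out.println(startsWithEmpty(",a,b", ","));           // true
        // Zero-width match at the beginning: no leading empty substring on Java 8+.
        System.out.println(startsWithEmpty("abc", ""));             // false
        System.out.println(startsWithEmpty("HelloWorld", "(?=[A-Z])")); // false
    }
}
```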

Reference implementation

Let us compare the code of Pattern.split in the reference implementation of Java 7 and Java 8. The code is retrieved from grepcode, for versions 7u40-b43 and 8-b132.

Java 7

public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}

Java 8

public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}

The addition of the following code in Java 8 excludes the zero-length match at the beginning of the input string, which explains the behavior above.

if (index == 0 && index == m.start() && m.start() == m.end()) {
    // no empty leading substring included for zero-width match
    // at the beginning of the input char sequence.
    continue;
}

Maintaining compatibility

Following behavior in Java 8 and above

To make split behave consistently across versions, in a way compatible with the behavior in Java 8:

  1. If your regex can match a zero-length string, just add (?!\A) at the end of the regex and wrap the original regex in a non-capturing group (?:...) (if necessary).
  2. If your regex can't match a zero-length string, you don't need to do anything.
  3. If you don't know whether the regex can match a zero-length string or not, do both of the actions in step 1.

(?!\A) asserts that the match does not end at the beginning of the string. A match that ends at the beginning of the string can only be an empty match at the beginning of the string, so exactly that match is rejected.
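Step 1 can be sketched as a small helper; the class name and the method name java8Compatible are made up, and the outputs in the comments are what Java 8 and later produce. The non-capturing group matters when the original regex contains a top-level alternation, since otherwise (?!\A) would attach to the last alternative only.

```java
import java.util.Arrays;

class SplitCompat {
    // Hypothetical helper: wraps the regex and forbids a match that ends
    // at the very start of the input, i.e. an empty match at position 0.
    static String java8Compatible(String regex) {
        return "(?:" + regex + ")(?!\\A)";
    }

    public static void main(String[] args) {
        // Zero-width regex: no leading empty string.
        String zeroWidth = java8Compatible("(?=[A-Z])");
        System.out.println(Arrays.toString("HelloWorld".split(zeroWidth))); // [Hello, World]

        // Positive-width regex: unaffected, the leading empty string stays.
        String comma = java8Compatible(",");
        System.out.println(Arrays.toString(",a,b".split(comma)));           // [, a, b]
    }
}
```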

Following behavior in Java 7 and prior

There is no general solution to make split backward-compatible with Java 7 and prior, short of replacing all instances of split to point to your own custom implementation.
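What such a custom implementation might look like is sketched below. It is a simplified adaptation of the Java 7 loop quoted earlier, restricted to the limit == 0 case; the class name LegacySplit and the method name splitLikeJava7 are hypothetical, and it has not been checked against a real Java 7 runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LegacySplit {
    // Hypothetical helper: Java 7 style split with limit == 0, so a zero-width
    // match at the start still contributes a leading empty string.
    static String[] splitLikeJava7(Pattern pattern, CharSequence input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int index = 0;
        // Add the segment before each match, without the Java 8 "continue" check.
        while (m.find()) {
            parts.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }
        // Same "no match found" test as the quoted code.
        if (index == 0) {
            return new String[] {input.toString()};
        }
        parts.add(input.subSequence(index, input.length()).toString());
        // limit == 0: discard trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?=[A-Z])");
        System.out.println(Arrays.toString(splitLikeJava7(p, "HelloWorld"))); // [, Hello, World]
    }
}
```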


