How to prevent java.lang.String.split() from creating a leading empty string?
Your best bet is probably just to strip out any leading delimiter:
String input = "/Test/Stuff";
String[] test = input.replaceFirst("^/", "").split("/");
You can make it more generic by putting it in a method:
// Note: delim is interpreted as a regular expression by both replaceFirst and split.
public String[] mySplit(final String input, final String delim)
{
    return input.replaceFirst("^" + delim, "").split(delim);
}
String[] test = mySplit("/Test/Stuff", "/");
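Because the helper above passes delim straight into a regex, a delimiter such as "." or "|" would need escaping. A minimal sketch of a literal-delimiter variant follows, assuming the caller passes plain text rather than a regex. The class and method names are hypothetical and used only for illustration.

```java
import java.util.regex.Pattern;

class LiteralSplit {
    // Hypothetical helper: treats delim as literal text rather than as a regex.
    static String[] splitWithoutLeadingEmpty(String input, String delim) {
        // Drop a single leading delimiter, if present, before splitting.
        String rest = input.startsWith(delim) ? input.substring(delim.length()) : input;
        // Pattern.quote escapes regex metacharacters such as "." or "|".
        return rest.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", splitWithoutLeadingEmpty("/Test/Stuff", "/"))); // Test,Stuff
        System.out.println(String.join(",", splitWithoutLeadingEmpty(".a.b", ".")));        // a,b
    }
}
```

Only one leading delimiter is removed here, which mirrors the replaceFirst approach above.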
How to split String without leaving behind empty strings?
Add the "one or more times" quantifier (+) to your character class:
String[] inputTokens = input.split("[(),\\s]+");
This still produces one leading empty String when the input starts with a delimiter, which is unavoidable with the split() method, but no other empty Strings.
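If the leading empty String is a problem, one option is to filter it out after splitting. The following is a minimal sketch, assuming Java 8 or later for the stream API. The class name, method name, and sample input are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TokenFilter {
    // Hypothetical helper: splits on runs of "(", ")", "," and whitespace,
    // then drops any empty token (in practice only a leading one can occur).
    static List<String> tokens(String input) {
        return Arrays.stream(input.split("[(),\\s]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The raw split of "(a, b) c" yields ["", "a", "b", "c"].
        System.out.println(tokens("(a, b) c")); // [a, b, c]
    }
}
```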
Prevent Empty values on string Split
I would replace the separating characters with spaces. Then you can use trim() to remove any extra spaces at either end before you split. I'm assuming you don't want any empty strings in the results, so I've changed the expression slightly.
ab.replaceAll("\\W+", " ").trim().split(" ")
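A small sketch of that expression in use follows. The guard for input that contains no word characters is an addition of this sketch, since "".split(" ") returns an array holding one empty string. The class and method names are hypothetical.

```java
class WordSplit {
    // Hypothetical helper wrapping the expression above.
    static String[] words(String ab) {
        // Collapse every run of non-word characters into a single space, then trim the ends.
        String normalized = ab.replaceAll("\\W+", " ").trim();
        // "".split(" ") returns [""], so handle input without word characters explicitly.
        return normalized.isEmpty() ? new String[0] : normalized.split(" ");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("--hello, world!!"))); // hello|world
        System.out.println(words("!!!").length);                          // 0
    }
}
```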
Empty string added to string array when splitting for numbers
Matcher is more applicable for this purpose than split:
int sum = 0;
Pattern p = Pattern.compile("\\d+"); // one or more consecutive digits
Matcher m = p.matcher(str);
while (m.find()) {
    sum += Integer.parseInt(m.group());
}
return sum;
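To see why split is awkward here, the sketch below contrasts the two approaches on a sample input that starts with a non-digit. The class name, method name, and sample string are assumptions for illustration. Integer.parseInt would throw for digit runs that do not fit in an int, which this sketch does not handle.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberSum {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Sums every run of digits found in str.
    static int sumNumbers(String str) {
        int sum = 0;
        Matcher m = NUMBER.matcher(str);
        while (m.find()) {
            sum += Integer.parseInt(m.group());
        }
        return sum;
    }

    public static void main(String[] args) {
        String str = "a12b3";
        // Splitting on non-digits leaves a leading empty string that parseInt would reject.
        System.out.println(Arrays.toString(str.split("\\D+"))); // [, 12, 3]
        System.out.println(sumNumbers(str));                    // 15
    }
}
```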
Java String split removed empty values
split(delimiter) by default removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded version split(delimiter, limit) with limit set to a negative value, like
String[] split = data.split("\\|", -1);
A few more details: split(regex) internally returns the result of split(regex, 0), and in the documentation of this method you can find (emphasis mine):
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
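The three cases from the documentation can be compared side by side. This is a minimal sketch with a made-up sample string and hypothetical class and method names.

```java
import java.util.Arrays;

class LimitDemo {
    // Hypothetical helper: splits on "|" and keeps trailing empty strings.
    static String[] splitKeepingTrailing(String data) {
        return data.split("\\|", -1);
    }

    public static void main(String[] args) {
        String data = "a|b||";
        // limit = 0 (the default): trailing empty strings are discarded.
        System.out.println(Arrays.toString(data.split("\\|")));         // [a, b]
        // negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(splitKeepingTrailing(data))); // [a, b, , ]
        // limit = 2: the pattern is applied once, the rest stays in the last entry.
        System.out.println(Arrays.toString(data.split("\\|", 2)));      // [a, b||]
    }
}
```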
Exception:
It is worth mentioning that removing trailing empty strings makes sense only if such empty strings were created by the split mechanism. So for "".split(anything), since we can't split "" any further, we will get the array [""] as the result.
This happens because no split took place here, so "", despite being empty and trailing, represents the original string, not an empty string created by the splitting process.
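The difference is easy to check by comparing an empty input with an input that consists only of the delimiter. The sketch below uses "," as an arbitrary delimiter, and the class and method names are hypothetical.

```java
class EmptyInputDemo {
    // Hypothetical helper: number of entries produced by splitting on ",".
    static int count(String s) {
        return s.split(",").length;
    }

    public static void main(String[] args) {
        // Nothing was split, so the original (empty) string is returned as the only entry.
        System.out.println(count(""));  // 1
        // Here both empty strings were created by splitting, so both are discarded.
        System.out.println(count(",")); // 0
    }
}
```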
String.split() - matching leading empty String prior to first delimiter?
If by "better" you mean higher performance, then you might want to try creating a regular expression that matches what you want to match, using Matcher.find in a loop, and pulling out the matches as you find them. This saves modifying the string first. But measure it for yourself to see which is faster for your data.
If by "better" you mean simpler, then no I don't think there is a simpler way than the way you suggested: removing the leading separators before applying the split.
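The Matcher.find approach could look like the following minimal sketch. It assumes the separator is "/" and that a token is any run of non-separator characters. The class and method names are hypothetical, and no performance claim is made for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherTokens {
    // Matches the tokens themselves instead of the separators between them.
    private static final Pattern TOKEN = Pattern.compile("[^/]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading, trailing, and repeated separators never produce empty tokens.
        System.out.println(tokens("/Test/Stuff")); // [Test, Stuff]
    }
}
```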
Java's split method has leading blank records that I can't suppress
One simple solution would be to remove the first + from the string. This way, it won't split before the first keyword:
projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();
Edit: Personally, I'd go for a more robust solution using regular expressions. This finds all keywords preceded by +. It also requires that the + is either preceded by whitespace or at the start of the line, so that expressions like 3+4 aren't matched.
String inputLine = "+foo 3+4 foofoo foo foo @bar.com +foofoofoo +foo1 +foo2 +foo3";
Pattern re = Pattern.compile("(\\s|^)\\+(\\w+)");
Matcher m = re.matcher(inputLine);
while (m.find()) {
    System.out.println(m.group(2)); // group 2 is the keyword without the leading +
}
Java split String that start with space
You can call string.trim() and then string.split(" "). The trim() method removes whitespace before the first non-whitespace character and after the last non-whitespace character.
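One caveat: split(" ") still yields empty strings when words are separated by more than one space. A hedged sketch that splits on runs of whitespace instead is shown below. The class name, method name, and sample input are assumptions, and an all-whitespace input would still return a single empty string.

```java
import java.util.Arrays;

class TrimSplit {
    // Hypothetical helper: trims the ends, then splits on runs of whitespace.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String s = "  hello   world ";
        // A single-space delimiter keeps the empty strings between consecutive spaces.
        System.out.println(Arrays.toString(s.trim().split(" "))); // [hello, , , world]
        System.out.println(Arrays.toString(words(s)));            // [hello, world]
    }
}
```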
Why in Java 8 split sometimes removes empty strings at start of result array?
The behavior of String.split (which calls Pattern.split) changes between Java 7 and Java 8.
Documentation
Comparing the documentation of Pattern.split in Java 7 and Java 8, we observe the following clause being added:
When there is a positive-width match at the beginning of the input sequence then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.
The same clause is also added to String.split in Java 8, compared to Java 7.
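The clause can be illustrated with one positive-width and one zero-width match at the start of the input. The sketch below assumes it runs on Java 8 or later, and the class and method names are hypothetical. On Java 7 the second call would return a leading empty string as well.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // Hypothetical helper: splits between every character using the empty regex.
    static String[] chars(String s) {
        return s.split("");
    }

    public static void main(String[] args) {
        // Positive-width match ("/") at index 0: a leading empty string is included.
        System.out.println(Arrays.toString("/a/b".split("/")));
        // Zero-width match at index 0: no leading empty string on Java 8 and later.
        System.out.println(Arrays.toString(chars("abc")));
    }
}
```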
Reference implementation
Let us compare the code of Pattern.split in the reference implementation of Java 7 and Java 8. The code is retrieved from grepcode, for versions 7u40-b43 and 8-b132.
Java 7
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);
    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }
    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};
    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());
    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
Java 8
public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);
    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }
    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};
    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());
    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
The addition of the following code in Java 8 excludes the zero-length match at the beginning of the input string, which explains the behavior above.
if (index == 0 && index == m.start() && m.start() == m.end()) {
    // no empty leading substring included for zero-width match
    // at the beginning of the input char sequence.
    continue;
}
Maintaining compatibility
Following behavior in Java 8 and above
To make split behave consistently across versions and compatible with the behavior in Java 8:
- If your regex can match a zero-length string, add (?!\A) at the end of the regex and wrap the original regex in a non-capturing group (?:...) (if necessary).
- If your regex can't match a zero-length string, you don't need to do anything.
- If you don't know whether the regex can match a zero-length string, do both actions in step 1.
(?!\A) checks that the match does not end at the beginning of the string. A match that ends there would be an empty match at the beginning of the string.
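As an illustration of the first step, the sketch below applies the recipe to a zero-width lookahead that splits before each uppercase letter. The regex, the sample input, and the class and method names are assumptions chosen for this example.

```java
import java.util.Arrays;

class PortableSplit {
    // Original regex: (?=[A-Z]), a zero-width match before every uppercase letter.
    // Wrapped in (?:...) and followed by (?!\A) so it can never match at index 0.
    static String[] splitBeforeUppercase(String input) {
        return input.split("(?:(?=[A-Z]))(?!\\A)");
    }

    public static void main(String[] args) {
        // No match is possible at the start, so no leading empty string can appear.
        System.out.println(Arrays.toString(splitBeforeUppercase("HelloWorld"))); // [Hello, World]
    }
}
```

Since the pattern cannot match at the beginning of the input, the result does not depend on how a given Java version treats a zero-width match there.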
Following behavior in Java 7 and prior
There is no general solution to make split backward-compatible with Java 7 and prior, short of replacing all instances of split to point to your own custom implementation.