How to Avoid Trailing Empty Items Being Removed When Splitting Strings

Java String split removed empty values

By default, split(delimiter) removes trailing empty strings from the result array. To turn this behavior off, use the overloaded version split(delimiter, limit) with limit set to a negative value, like this:

String[] split = data.split("\\|", -1);

A few more details:

split(regex) internally returns the result of split(regex, 0), and the documentation of that method says:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Exception:

It is worth mentioning that removing trailing empty strings only makes sense if those empty strings were created by the split mechanism. So for "".split(anything), since "" cannot be split any further, the result is the array [""].

This happens because no split took place here, so "", despite being empty and trailing, represents the original string and not an empty string created by the splitting process.

How to prevent split from removing empty elements

In Perl, split with a negative limit will preserve trailing empty fields.

@fields = split(/,/, "a,,", -1);

How to split string with trailing empty strings in result?

As Peter mentioned in his answer, "string".split(), in both Java and Scala, does not return trailing empty strings by default.

You can, however, make it return trailing empty strings by passing a second parameter, like this:

String s = "elem1,elem2,,";
String[] tokens = s.split(",", -1);

And that will get you the expected result.

The behavior is described in the Javadoc for String.split(String regex, int limit).

How to remove falsy values when splitting a string with a non-whitespace separator

If you want to be obtuse, you could use filter(None, x) to remove falsey items:

>>> list(filter(None, '1,2,,3,'.split(',')))
['1', '2', '3']

Probably less Pythonic. It might be clearer to iterate over the items specifically:

for w in '1,2,,3,'.split(','):
    if w:
        ...  # handle the non-empty item w here

This makes it clear that you're skipping the empty items and not relying on the fact that str.split sometimes skips empty items.
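The explicit loop above can also be wrapped in a small helper. This is only a minimal sketch: the name non_empty_fields is hypothetical and chosen for illustration, it is not part of the standard library.

```python
def non_empty_fields(text, sep):
    """Split text on sep and keep only the non-empty pieces."""
    # The explicit `if piece` test makes the filtering visible to the reader.
    return [piece for piece in text.split(sep) if piece]


print(non_empty_fields('1,2,,3,', ','))  # ['1', '2', '3']
```

Compared with filter(None, ...), the comprehension states the skipping rule directly, which is the point the answer is making.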

I'd just as soon use a regex, either to skip consecutive runs of the separator (but watch out for the end):

>>> import re
>>> re.split(r',+', '1,2,,3,')
['1', '2', '3', '']

or to find everything that's not a separator:

>>> re.findall(r'[^,]+', '1,2,,3,')
['1', '2', '3']

If you want to go way back in Python's history, there were two separate functions, split and splitfields. I think the name explains the purpose. The first splits on any whitespace, useful for arbitrary text input, and the second behaves predictably on some delimited input. They were implemented in pure Python before v1.6.

Split vs Strip in Python to remove redundant white space

According to the documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

This means that the logic of strip() is already built into split() when no separator is given, so I think your teacher is wrong. (Note that this changes if you use a non-default separator.)
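A quick way to check the quoted rule is to compare the three calls side by side. The sample string below is made up for illustration.

```python
text = "  a  b   c  "

# No separator: runs of whitespace collapse and no empty strings
# appear at the start or end of the result.
default_split = text.split()

# Calling strip() first changes nothing in this case.
stripped_split = text.strip().split()

# Explicit separator: every single space delimits a field,
# so empty strings show up, including at both ends.
explicit_split = text.split(" ")

print(default_split)   # ['a', 'b', 'c']
print(default_split == stripped_split)  # True
print(explicit_split)
```

With an explicit separator, strip() does matter again, because it removes the leading and trailing spaces that would otherwise produce empty fields.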

How to remove empty trailing values and Carriage Return in JS Array?

I don't think there's any magic bullet here, just a loop checking for the values you want to remove, directly or with a regular expression. For instance, to remove blank strings and "\r":

while (array.length) {                      // Loop while there are still entries
    const last = array[array.length - 1];   // Get the last entry without removing it
    if (last !== "" && last !== "\r") {     // Is this one to remove?
        break;                              // No, stop
    }
    --array.length;                         // Yes, remove and keep looping
}

Live Example:

const array = ['', 'Apple', '', 'Banana', '', 'Guava', '', '', '', '\r'];
while (array.length) {                      // Loop while there are still entries
    const last = array[array.length - 1];   // Get the last entry without removing it
    if (last !== "" && last !== "\r") {     // Is this one to remove?
        break;                              // No, stop
    }
    --array.length;                         // Yes, remove and keep looping
}

console.log(array);

Empty strings at the beginning and end of split

After reading AWK's specification, following the pointer from mu is too short, I came to feel that the original intention of split in AWK was to extract substrings that correspond to fields, each of which is terminated by a punctuation mark such as "," or ".", and that the separator was thought of as something like an "end of field" character. The intention was not to split a string symmetrically into the left and right side of each separator position, but to terminate a substring on the left side of each separator position. Under this view, it makes sense to always have some string (even an empty one) to the left of a separator, but not necessarily to its right. This behavior may have been inherited by Ruby via Perl.
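The "end of field" reading can be sketched in a few lines. This is an illustration of the idea only, not how any of these languages implements split, and the function name split_terminated is hypothetical.

```python
def split_terminated(text, sep):
    """Treat sep as an end-of-field marker rather than a symmetric divider."""
    fields = text.split(sep)
    # A field to the left of a separator always counts, even if empty,
    # so leading and inner empty strings are kept. Whatever trails the
    # last separators is dropped when it is empty.
    while fields and fields[-1] == "":
        fields.pop()
    return fields


print(split_terminated(",a,,b,,", ","))  # ['', 'a', '', 'b']
```

Under this model the leading empty string survives while the trailing ones disappear, which is the asymmetry the question asks about.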

Why are empty strings returned in split() results?

str.split complements str.join, so

"/".join(['', 'segment', 'segment', ''])

gets you back the original string.

If the empty strings were not there, the first and last '/' would be missing after the join().
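The round trip is easy to verify. The path string below is just the example from above written out in full.

```python
path = "/segment/segment/"

# Splitting keeps an empty string for the text before the first
# separator and after the last one.
parts = path.split("/")
print(parts)  # ['', 'segment', 'segment', '']

# Because the empty strings are kept, join() restores the original.
rebuilt = "/".join(parts)
print(rebuilt == path)  # True

# Dropping the empty strings loses the leading and trailing slash.
lossy = "/".join(p for p in parts if p)
print(lossy)  # segment/segment
```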


