How to split a list into sublists based on a separator, similar to str.split()?
A simple generator will work for all of the cases in your question:
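def split(sequence, sep):
    chunk = []
    for val in sequence:
        if val == sep:
            # separator found: emit the chunk collected so far and start a new one
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    # emit whatever follows the last separator (possibly an empty list)
    yield chunk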
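As a quick check of how closely this mirrors str.split(), here is a small usage sketch. The sample data is made up for illustration; the generator is repeated so the snippet runs on its own.

```python
def split(sequence, sep):
    """Yield the chunks of `sequence` that lie between occurrences of `sep`."""
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk


# Ordinary case: the separators are dropped, the runs between them are kept.
print(list(split([1, 2, 0, 3, 0, 4], 0)))      # [[1, 2], [3], [4]]

# Leading, trailing and repeated separators give empty chunks,
# just as str.split(",") gives empty strings.
print(list(split(list(",a,,b,"), ",")))        # [[], ['a'], [], ['b'], []]
print(",a,,b,".split(","))                     # ['', 'a', '', 'b', '']

# An empty input still yields one (empty) chunk, like "".split(",") == [''].
print(list(split([], 0)))                      # [[]]
```

If empty chunks are not wanted, filtering the result with `[c for c in split(seq, sep) if c]` is one option.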
Splitting List into sublists along elements
The only solution I have come up with so far is to implement your own custom collector.
Before reading the solution, a few notes. I took this question more as a programming exercise, and I'm not sure whether it can be done with a parallel stream.
The collector below keeps its current sublist in shared mutable state, so with an ordinary combiner it would silently produce wrong results if the pipeline were run in parallel.
That is not desirable behavior, which is why the combiner throws an exception instead of merging with (l1, l2) -> {l1.addAll(l2); return l1;}. The combiner is what a parallel run uses to merge two partial lists, so you get an exception instead of a wrong result.
This is also not very efficient because of the list copying (although the copy uses a native method on the underlying array).
So here's the collector implementation:
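import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collector;

private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
    // Shared mutable state: only safe for a single, sequential use of this collector
    final List<String> current = new ArrayList<>();
    return Collector.of(() -> new ArrayList<List<String>>(),
        (l, elem) -> {
            if (sep.test(elem)) {
                // separator: store a copy of the current sublist and start over
                l.add(new ArrayList<>(current));
                current.clear();
            }
            else {
                current.add(elem);
            }
        },
        (l1, l2) -> {
            throw new RuntimeException("Should not run this in parallel");
        },
        l -> {
            // add the elements after the last separator, if any
            if (current.size() != 0) {
                l.add(current);
            }
            return l;
        }
    );
}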
and how to use it:
List<List<String>> ll = list.stream().collect(splitBySeparator(Objects::isNull));
Output:
[[a, b], [c], [d, e]]
Now that Joop Eggen's answer is out, it appears that this can be done in parallel after all (give him credit for that!). With that idea, the custom collector implementation reduces to:
private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
    return Collector.of(
        // start with one empty sublist so there is always a last sublist to append to
        () -> new ArrayList<List<String>>(Arrays.asList(new ArrayList<>())),
        (l, elem) -> { if (sep.test(elem)) { l.add(new ArrayList<>()); } else { l.get(l.size() - 1).add(elem); } },
        // join the last sublist of l1 with the first sublist of l2, then append the rest of l2
        (l1, l2) -> { l1.get(l1.size() - 1).addAll(l2.remove(0)); l1.addAll(l2); return l1; });
}
This makes the paragraph about parallelism somewhat obsolete, but I have left it in because it can be a good reminder.
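To see why that combiner is safe, the merge rule can be checked independently of Java. The following is only a Python sketch of the same idea, not the collector itself: `split` stands in for the accumulator, and `merge` is a hypothetical helper that mimics the combiner by joining the last sublist of the left part with the first sublist of the right part.

```python
def split(sequence, sep):
    """Yield the chunks of `sequence` that lie between occurrences of `sep`."""
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk


def merge(left, right):
    """Combine two partial results (hypothetical stand-in for the combiner)."""
    # The last chunk of `left` and the first chunk of `right` were cut apart
    # by the partitioning, not by a separator, so they form a single sublist.
    return left[:-1] + [left[-1] + right[0]] + right[1:]


data = ["a", "b", None, "c", None, "d", "e"]
whole = list(split(data, None))

# Wherever the input is cut in two, merging the partial results
# gives the same answer as splitting the whole input at once.
for cut in range(len(data) + 1):
    left = list(split(data[:cut], None))
    right = list(split(data[cut:], None))
    assert merge(left, right) == whole

print(whole)
```

The rule depends on every partial result containing at least one sublist, even an empty one, which is why the supplier above starts with a list that already holds an empty list.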
Note that the Stream API is not always the right substitute for a plain loop. Some tasks are easier and better suited to streams, and others are not. In your case, you could also create a utility method for that:
private static <T> List<List<T>> splitBySeparator(List<T> list, Predicate<? super T> predicate) {
    final List<List<T>> finalList = new ArrayList<>();
    int fromIndex = 0;
    int toIndex = 0;
    for(T elem : list) {
        if(predicate.test(elem)) {
            // separator at toIndex: keep the range before it and skip the separator
            finalList.add(list.subList(fromIndex, toIndex));
            fromIndex = toIndex + 1;
        }
        toIndex++;
    }
    // add the elements after the last separator, if any
    if(fromIndex != toIndex) {
        finalList.add(list.subList(fromIndex, toIndex));
    }
    return finalList;
}
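and call it like this:
List<List<String>> list = splitBySeparator(originalList, Objects::isNull);
It could be improved to handle edge cases. For example, the returned sublists are views backed by the original list, and a leading separator yields an empty sublist while a trailing one does not.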