Is There a Common Java Utility to Break a List into Batches

Is there a common Java utility to break a list into batches?

Check out Lists.partition(java.util.List, int) from Google Guava:

Returns consecutive sublists of a list, each of the same size (the final list may be smaller). For example, partitioning a list containing [a, b, c, d, e] with a partition size of 3 yields [[a, b, c], [d, e]] -- an outer list containing two inner lists of three and two elements, all in the original order.
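For readers who cannot add Guava as a dependency, the behaviour described above can be approximated with the plain JDK. The following is a minimal sketch, not Guava's actual implementation; the class name PartitionSketch and the helper partition are made up for illustration, and the sublists it returns are subList views of the original list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {
    // Hypothetical helper (not Guava's code): returns consecutive subList
    // views of at most `size` elements each; the last one may be shorter.
    static <T> List<List<T>> partition(List<T> list, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            parts.add(list.subList(from, Math.min(list.size(), from + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(partition(letters, 3)); // prints "[[a, b, c], [d, e]]"
    }
}
```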

Java: how can I split an ArrayList in multiple small ArrayLists?

You can use subList(int fromIndex, int toIndex) to get a view of a portion of the original list.

From the API:

Returns a view of the portion of this list between the specified fromIndex, inclusive, and toIndex, exclusive. (If fromIndex and toIndex are equal, the returned list is empty.) The returned list is backed by this list, so non-structural changes in the returned list are reflected in this list, and vice-versa. The returned list supports all of the optional list operations supported by this list.

Example:

List<Integer> numbers = new ArrayList<Integer>(
    Arrays.asList(5,3,1,2,9,5,0,7)
);

List<Integer> head = numbers.subList(0, 4);
List<Integer> tail = numbers.subList(4, 8);
System.out.println(head); // prints "[5, 3, 1, 2]"
System.out.println(tail); // prints "[9, 5, 0, 7]"

Collections.sort(head);
System.out.println(numbers); // prints "[1, 2, 3, 5, 9, 5, 0, 7]"

tail.add(-1);
System.out.println(numbers); // prints "[1, 2, 3, 5, 9, 5, 0, 7, -1]"

If you need these chopped lists to be NOT a view, then simply create a new List from the subList. Here's an example of putting a few of these things together:

// chops a list into non-view sublists of length L
static <T> List<List<T>> chopped(List<T> list, final int L) {
    List<List<T>> parts = new ArrayList<List<T>>();
    final int N = list.size();
    for (int i = 0; i < N; i += L) {
        parts.add(new ArrayList<T>(
            list.subList(i, Math.min(N, i + L)))
        );
    }
    return parts;
}

List<Integer> numbers = Collections.unmodifiableList(
    Arrays.asList(5,3,1,2,9,5,0,7)
);
List<List<Integer>> parts = chopped(numbers, 3);
System.out.println(parts); // prints "[[5, 3, 1], [2, 9, 5], [0, 7]]"
parts.get(0).add(-1);
System.out.println(parts); // prints "[[5, 3, 1, -1], [2, 9, 5], [0, 7]]"
System.out.println(numbers); // prints "[5, 3, 1, 2, 9, 5, 0, 7]" (unmodified!)

Split a list into sublists in Java, using flatMap if possible

Given that the "sublists" are all of equal size and that you can divide the list into exact sublists of the same size, you could calculate the desired size and then map an IntStream to the starting indexes of each sublist and use that to extract them:

List<Integer> mylist = Arrays.asList(1,2,3,4,5,6,7,8,9,10,11,12);
int size = mylist.size();
int parts = 6;
int partSize = size / parts;
List<List<Integer>> result =
    IntStream.range(0, parts)
             .mapToObj(i -> mylist.subList(i * partSize, (i + 1) * partSize))
             .collect(Collectors.toList());

EDIT:

IdeOne demo graciously provided by @Turing85
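The snippet above assumes the list divides evenly into the requested parts. As a minimal sketch of one way to relax that assumption, the variant below takes the part size instead and uses ceiling division so that the last sublist may be shorter. The names StreamPartitionSketch and chunks are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamPartitionSketch {
    // Hypothetical helper: splits `list` into subList views of `partSize`
    // elements; the final view holds whatever is left over.
    static <T> List<List<T>> chunks(List<T> list, int partSize) {
        int size = list.size();
        int parts = (size + partSize - 1) / partSize; // ceiling division
        return IntStream.range(0, parts)
                .mapToObj(i -> list.subList(i * partSize, Math.min(size, (i + 1) * partSize)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> mylist = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(chunks(mylist, 3)); // prints "[[1, 2, 3], [4, 5, 6], [7]]"
    }
}
```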

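How to split a list into a given number of sub-lists?

Guava has a function, Lists.partition, which will do this for you. Note that its second argument is the size of each sublist, not the number of sublists, so the call below yields exactly three sublists only when the list size is divisible by 3.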

Usage:

Lists.partition(mylist, mylist.size()/3);
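Because the second argument is a sublist size, a ten-element list partitioned with size 10/3 = 3 ends up as four sublists rather than three. If exactly n sublists are required, one option is to compute the boundaries by hand. The following is a sketch under that assumption, using only the JDK; SplitIntoNSketch and splitInto are hypothetical names, and the choice to give the extra elements to the first sublists is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitIntoNSketch {
    // Hypothetical helper: returns exactly n subList views whose sizes
    // differ by at most one element.
    static <T> List<List<T>> splitInto(List<T> list, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<T>> parts = new ArrayList<>(n);
        int base = list.size() / n;  // minimum size of every part
        int extra = list.size() % n; // the first `extra` parts get one more element
        int from = 0;
        for (int i = 0; i < n; i++) {
            int to = from + base + (i < extra ? 1 : 0);
            parts.add(list.subList(from, to));
            from = to;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> mylist = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(splitInto(mylist, 3)); // prints "[[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]"
    }
}
```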

Java 8 Stream with batch processing

Note! This solution reads the whole file before running the forEach.

You could do it with jOOλ, a library that extends Java 8 streams for single-threaded, sequential stream use-cases:

Seq.seq(lazyFileStream)                 // Seq<String>
   .zipWithIndex()                      // Seq<Tuple2<String, Long>>
   .groupBy(tuple -> tuple.v2 / 500)    // Map<Long, List<String>>
   .forEach((index, batch) -> {
       process(batch);
   });

Behind the scenes, zipWithIndex() is just:

static <T> Seq<Tuple2<T, Long>> zipWithIndex(Stream<T> stream) {
    final Iterator<T> it = stream.iterator();

    class ZipWithIndex implements Iterator<Tuple2<T, Long>> {
        long index;

        @Override
        public boolean hasNext() {
            return it.hasNext();
        }

        @Override
        public Tuple2<T, Long> next() {
            return tuple(it.next(), index++);
        }
    }

    return seq(new ZipWithIndex());
}

... whereas groupBy() is API convenience for:

default <K> Map<K, List<T>> groupBy(Function<? super T, ? extends K> classifier) {
    return collect(Collectors.groupingBy(classifier));
}

(Disclaimer: I work for the company behind jOOλ)
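If reading the whole input before processing is a concern, batches can also be handed off as they fill up by walking the stream's iterator directly. This is a minimal JDK-only sketch of that idea and is not how jOOλ works; LazyBatchSketch and forEachBatch are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

class LazyBatchSketch {
    // Hypothetical helper: passes lists of up to `batchSize` elements to
    // `handler` as soon as each one is full, so only one batch is held in
    // memory at a time. Intended for sequential streams only.
    static <T> void forEachBatch(Stream<T> stream, int batchSize, Consumer<List<T>> handler) {
        Iterator<T> it = stream.iterator();
        List<T> batch = new ArrayList<>(batchSize);
        while (it.hasNext()) {
            batch.add(it.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // final, possibly shorter, batch
        }
    }
}
```

With the names from the example above, a call might look like forEachBatch(lazyFileStream, 500, batch -> process(batch)).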

Efficient way to divide a list into lists of n size

You'll want to do something that makes use of List.subList(int, int) views rather than copying each sublist. To do this really easily, use Guava's Lists.partition(List, int) method:

List<Foo> foos = ...
for (List<Foo> partition : Lists.partition(foos, n)) {
    // do something with partition
}

Note that this, like many things, isn't very efficient with a List that isn't RandomAccess (such as a LinkedList).

Loop arraylist in batches

Divide the list size by the batch size to get the number of full batches, then use the remainder to handle the final partial batch.

int batchSize = 10;
int start = 0;
int end = batchSize;

int count = list.size() / batchSize;     // number of full batches
int remainder = list.size() % batchSize; // size of the final partial batch
for (int i = 0; i < count; i++)
{
    System.out.println("batch " + i);
    for (int j = start; j < end; j++)
    {
        // access the element as list.get(j)
    }
    start = start + batchSize;
    end = end + batchSize;
}

if (remainder != 0)
{
    end = end - batchSize + remainder;
    for (int j = start; j < end; j++)
    {
        // access the element as list.get(j)
    }
}

Splitting List into sublists along elements

The only solution I have come up with so far is to implement your own custom collector.

Before reading the solution, I want to add a few notes about this. I took this question more as a programming exercise, and I'm not sure whether it can be done with a parallel stream.

So you have to be aware that it'll silently break if the pipeline is run in parallel.

This is not a desirable behavior and should be avoided. This is why I throw an exception in the combiner part (instead of (l1, l2) -> {l1.addAll(l2); return l1;}), as it's used in parallel when combining the two lists, so that you have an exception instead of a wrong result.

Also this is not very efficient due to list copying (although it uses a native method to copy the underlying array).

So here's the collector implementation:

private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
    final List<String> current = new ArrayList<>();
    return Collector.of(() -> new ArrayList<List<String>>(),
        (l, elem) -> {
            if (sep.test(elem)) {
                l.add(new ArrayList<>(current));
                current.clear();
            }
            else {
                current.add(elem);
            }
        },
        (l1, l2) -> {
            throw new RuntimeException("Should not run this in parallel");
        },
        l -> {
            if (current.size() != 0) {
                l.add(current);
            }
            return l; // always return the result, even when nothing is pending
        }
    );
}

and how to use it:

List<List<String>> ll = list.stream().collect(splitBySeparator(Objects::isNull));

Output:

[[a, b], [c], [d, e]]



Now that Joop Eggen's answer has been posted, it appears that this can be done in parallel (give him credit for that!). With that, the custom collector implementation reduces to:

private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
    return Collector.of(() -> new ArrayList<List<String>>(Arrays.asList(new ArrayList<>())),
                        (l, elem) -> {if(sep.test(elem)){l.add(new ArrayList<>());} else l.get(l.size()-1).add(elem);},
                        (l1, l2) -> {l1.get(l1.size() - 1).addAll(l2.remove(0)); l1.addAll(l2); return l1;});
}

which makes the paragraph about parallelism somewhat obsolete; however, I have left it in as it can be a good reminder.
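To check the combiner-based collector against both sequential and parallel pipelines, it can be wrapped in a small runnable class. This is a sketch: the input list is an assumption chosen to match the output shown earlier, the class name SplitCollectorDemo is made up, and the supplier is written out in a longer form than in the one-liner above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collector;

class SplitCollectorDemo {
    static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
        return Collector.of(
                () -> {
                    // every partial result starts with one empty group
                    List<List<String>> groups = new ArrayList<>();
                    groups.add(new ArrayList<>());
                    return groups;
                },
                (l, elem) -> {
                    if (sep.test(elem)) {
                        l.add(new ArrayList<>());      // separator: open a new group
                    } else {
                        l.get(l.size() - 1).add(elem); // otherwise extend the last group
                    }
                },
                (l1, l2) -> {
                    // the first group of l2 continues the last group of l1
                    l1.get(l1.size() - 1).addAll(l2.remove(0));
                    l1.addAll(l2);
                    return l1;
                });
    }

    public static void main(String[] args) {
        // Assumed input, with null acting as the separator
        List<String> list = Arrays.asList("a", "b", null, "c", null, "d", "e");
        System.out.println(list.stream().collect(splitBySeparator(Objects::isNull)));         // prints "[[a, b], [c], [d, e]]"
        System.out.println(list.parallelStream().collect(splitBySeparator(Objects::isNull))); // prints "[[a, b], [c], [d, e]]"
    }
}
```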


Note that the Stream API is not always a substitute. There are tasks that are easier and more suitable using the streams and there are tasks that are not. In your case, you could also create a utility method for that:

private static <T> List<List<T>> splitBySeparator(List<T> list, Predicate<? super T> predicate) {
    final List<List<T>> finalList = new ArrayList<>();
    int fromIndex = 0;
    int toIndex = 0;
    for(T elem : list) {
        if(predicate.test(elem)) {
            finalList.add(list.subList(fromIndex, toIndex));
            fromIndex = toIndex + 1;
        }
        toIndex++;
    }
    if(fromIndex != toIndex) {
        finalList.add(list.subList(fromIndex, toIndex));
    }
    return finalList;
}

and call it like List<List<String>> list = splitBySeparator(originalList, Objects::isNull);.

It could be improved to handle edge cases.
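One such edge case, assuming it is among those the author had in mind, is that the utility method above adds an empty sublist whenever two separators are adjacent or the list starts with a separator. The following is a sketch of a variant that skips those empty groups; SplitEdgeCaseSketch and splitSkippingEmpty are hypothetical names, and the index-based loop assumes a list with cheap random access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

class SplitEdgeCaseSketch {
    // Hypothetical variant: like the utility method above, but it never
    // adds an empty sublist for leading, trailing or adjacent separators.
    static <T> List<List<T>> splitSkippingEmpty(List<T> list, Predicate<? super T> predicate) {
        List<List<T>> result = new ArrayList<>();
        int fromIndex = 0;
        for (int i = 0; i < list.size(); i++) {
            if (predicate.test(list.get(i))) {
                if (fromIndex < i) { // only keep non-empty groups
                    result.add(list.subList(fromIndex, i));
                }
                fromIndex = i + 1;
            }
        }
        if (fromIndex < list.size()) {
            result.add(list.subList(fromIndex, list.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(null, "a", null, null, "b", null);
        System.out.println(splitSkippingEmpty(input, Objects::isNull)); // prints "[[a], [b]]"
    }
}
```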


