Collect Successive Pairs from a Stream

My StreamEx library, which extends the standard streams, provides a pairMap method for all stream types. For primitive streams it does not change the stream type, but it can be used for calculations over adjacent elements. The most common use is computing differences:

int[] pairwiseDiffs = IntStreamEx.of(input).pairMap((a, b) -> (b-a)).toArray();
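
For comparison, here is a minimal sketch of the same pairwise-difference calculation using only the JDK. It assumes the input is already an int[] (random access), which pairMap does not require. The class and method names are made up for illustration, and this is not how StreamEx implements pairMap.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class PairwiseDiffs {
    /** Returns input[i + 1] - input[i] for every adjacent pair of elements. */
    static int[] pairwiseDiffs(int[] input) {
        // An array of length n has n - 1 adjacent pairs (none for n < 2).
        return IntStream.range(0, Math.max(0, input.length - 1))
                .map(i -> input[i + 1] - input[i])
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pairwiseDiffs(new int[] {1, 4, 9, 16}))); // [3, 5, 7]
    }
}
```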

For an object stream, pairMap can produce any other object type. StreamEx does not provide any new user-visible data structures like Pair (that's part of the library's concept). However, if you have your own Pair class and want to use it, you can do the following:

Stream<Pair> pairs = IntStreamEx.of(input).boxed().pairMap(Pair::new);

Or if you already have some Stream:

Stream<Pair> pairs = StreamEx.of(stream).pairMap(Pair::new);

This functionality is implemented using a custom spliterator. It has quite low overhead and parallelizes nicely. It works with any stream source, not just a random-access list or array like many other solutions. In many tests it performs really well. Here's a JMH benchmark where we find all input values preceding a larger value using different approaches (see this question).

Collect pairs from a stream

If you don't want to collect the elements

The title of the question says collect pairs from a stream, so I'd assume that you want to actually collect these, but you commented:

Your solution works, the problem is that it loads the data from file to PairList and then I may use stream from this collection to process pairs. I can't do it because the data might be too big to store in the memory.

so here's a way to do this without collecting the elements.

It's relatively straightforward to transform an Iterator<T> into an Iterator<List<T>>, and from that to transform a stream into a stream of pairs.

/**
 * Returns an iterator over pairs of elements returned by the iterator.
 *
 * @param iterator the base iterator
 * @return the paired iterator
 */
public static <T> Iterator<List<T>> paired(Iterator<T> iterator) {
    return new Iterator<List<T>>() {
        @Override
        public boolean hasNext() {
            return iterator.hasNext();
        }

        @Override
        public List<T> next() {
            T first = iterator.next();
            if (iterator.hasNext()) {
                return Arrays.asList(first, iterator.next());
            } else {
                // odd number of elements: the last "pair" holds a single element
                return Arrays.asList(first);
            }
        }
    };
}

/**
 * Returns a stream of pairs of elements from a stream.
 *
 * @param stream the base stream
 * @return the pair stream
 */
public static <T> Stream<List<T>> paired(Stream<T> stream) {
    return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(paired(stream.iterator()), Spliterator.ORDERED),
            false);
}

@Test
public void iteratorAndStreamsExample() {
    List<String> strings = Arrays.asList("a", "b", "c", "d", "e", "f");
    Iterator<List<String>> pairs = paired(strings.iterator());
    while (pairs.hasNext()) {
        System.out.println(pairs.next());
        // [a, b]
        // [c, d]
        // [e, f]
    }

    paired(Stream.of(1, 2, 3, 4, 5, 6, 7, 8)).forEach(System.out::println);
    // [1, 2]
    // [3, 4]
    // [5, 6]
    // [7, 8]
}
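
The iterator above yields non-overlapping pairs. If successive (overlapping) pairs are wanted instead, the same lazy approach can remember the previous element. This is only a sketch: the class name SlidingPairs and the method name sliding are hypothetical, and null elements are not treated specially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class SlidingPairs {
    /** Hypothetical helper: lazily yields [e0, e1], [e1, e2], [e2, e3], ... */
    static <T> Iterator<List<T>> sliding(Iterator<T> source) {
        return new Iterator<List<T>>() {
            private T previous;     // first element of the next pair
            private boolean primed; // true once 'previous' has been read

            @Override
            public boolean hasNext() {
                if (!primed && source.hasNext()) {
                    previous = source.next();
                    primed = true;
                }
                // A pair needs one remembered element plus one more from the source.
                return primed && source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T current = source.next();
                List<T> pair = Arrays.asList(previous, current);
                previous = current;
                return pair;
            }
        };
    }

    public static void main(String[] args) {
        List<List<Integer>> out = new ArrayList<>();
        sliding(Arrays.asList(1, 2, 3, 4).iterator()).forEachRemaining(out::add);
        System.out.println(out); // [[1, 2], [2, 3], [3, 4]]
    }
}
```

Only one element is held at a time, so this also works for sources that are too large to store in memory.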

If you want to collect the elements...

I'd do this by collecting into a list, and using an AbstractList to provide a view of the elements as pairs.

First, the PairList. This is a simple AbstractList wrapper around any list that has an even number of elements. (This could easily be adapted to handle odd length lists, once the desired behavior is specified.)

/**
 * A view on a list of its elements as pairs.
 *
 * @param <T> the element type
 */
static class PairList<T> extends AbstractList<List<T>> {
    private final List<T> elements;

    /**
     * Creates a new pair list.
     *
     * @param elements the elements
     *
     * @throws NullPointerException if elements is null
     * @throws IllegalArgumentException if the length of elements is not even
     */
    public PairList(List<T> elements) {
        Objects.requireNonNull(elements, "elements must not be null");
        this.elements = new ArrayList<>(elements);
        if (this.elements.size() % 2 != 0) {
            throw new IllegalArgumentException("number of elements must have even size");
        }
    }

    @Override
    public List<T> get(int index) {
        // pair i covers the elements at positions 2i and 2i + 1
        return Arrays.asList(elements.get(2 * index), elements.get(2 * index + 1));
    }

    @Override
    public int size() {
        return elements.size() / 2;
    }
}

Then we can define the collector that we need. This is essentially shorthand for collectingAndThen(toList(), PairList::new):

/**
 * Returns a collector that collects to a pair list.
 *
 * @return the collector
 */
public static <E> Collector<E, ?, PairList<E>> toPairList() {
    return Collectors.collectingAndThen(Collectors.toList(), PairList::new);
}

Note that it could be worthwhile defining a PairList constructor that doesn't defensively copy the list, for the use case that we know the backing list is freshly generated (as in this case). That's not really essential right now, though. But once we did that, this method would be collectingAndThen(toCollection(ArrayList::new), PairList::newNonDefensivelyCopiedPairList).

And now we can use it:

/**
 * Creates a pair list with collectingAndThen, toList(), and PairList::new
 */
@Test
public void example() {
    List<List<Integer>> intPairs = Stream.of(1, 2, 3, 4, 5, 6)
            .collect(toPairList());
    System.out.println(intPairs); // [[1, 2], [3, 4], [5, 6]]

    List<List<String>> stringPairs = Stream.of("a", "b", "c", "d")
            .collect(toPairList());
    System.out.println(stringPairs); // [[a, b], [c, d]]
}
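
The non-copying variant mentioned above could look roughly like the following. This is a sketch under assumptions: the class name NonCopyingPairList and the factory name wrapping are hypothetical, and pair i is taken to mean the elements at positions 2i and 2i + 1.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NonCopyingPairList<T> extends AbstractList<List<T>> {
    private final List<T> elements;

    private NonCopyingPairList(List<T> elements) {
        if (elements.size() % 2 != 0) {
            throw new IllegalArgumentException("number of elements must be even");
        }
        this.elements = elements; // no defensive copy
    }

    /** Hypothetical factory: the caller must not modify the list afterwards. */
    static <T> NonCopyingPairList<T> wrapping(List<T> elements) {
        return new NonCopyingPairList<>(elements);
    }

    @Override
    public List<T> get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return Arrays.asList(elements.get(2 * index), elements.get(2 * index + 1));
    }

    @Override
    public int size() {
        return elements.size() / 2;
    }

    /** Collects into a fresh ArrayList that nobody else references, then wraps it. */
    static <E> Collector<E, ?, NonCopyingPairList<E>> toPairList() {
        return Collectors.collectingAndThen(
                Collectors.toCollection(ArrayList::new), NonCopyingPairList::wrapping);
    }

    public static void main(String[] args) {
        System.out.println(Stream.of(1, 2, 3, 4, 5, 6).collect(toPairList())); // [[1, 2], [3, 4], [5, 6]]
    }
}
```

Keeping the constructor private means only the factory can create the unshared view, so the skipped copy stays safe.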

Here's a complete source file with a runnable example (as a JUnit test):

package ex;

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.junit.Test;

public class PairCollectors {

    /**
     * A view on a list of its elements as pairs.
     *
     * @param <T> the element type
     */
    static class PairList<T> extends AbstractList<List<T>> {
        private final List<T> elements;

        /**
         * Creates a new pair list.
         *
         * @param elements the elements
         *
         * @throws NullPointerException if elements is null
         * @throws IllegalArgumentException if the length of elements is not even
         */
        public PairList(List<T> elements) {
            Objects.requireNonNull(elements, "elements must not be null");
            this.elements = new ArrayList<>(elements);
            if (this.elements.size() % 2 != 0) {
                throw new IllegalArgumentException("number of elements must have even size");
            }
        }

        @Override
        public List<T> get(int index) {
            // pair i covers the elements at positions 2i and 2i + 1
            return Arrays.asList(elements.get(2 * index), elements.get(2 * index + 1));
        }

        @Override
        public int size() {
            return elements.size() / 2;
        }
    }

    /**
     * Returns a collector that collects to a pair list.
     *
     * @return the collector
     */
    public static <E> Collector<E, ?, PairList<E>> toPairList() {
        return Collectors.collectingAndThen(Collectors.toList(), PairList::new);
    }

    /**
     * Creates a pair list with collectingAndThen, toList(), and PairList::new
     */
    @Test
    public void example() {
        List<List<Integer>> intPairs = Stream.of(1, 2, 3, 4, 5, 6)
                .collect(toPairList());
        System.out.println(intPairs); // [[1, 2], [3, 4], [5, 6]]

        List<List<String>> stringPairs = Stream.of("a", "b", "c", "d")
                .collect(toPairList());
        System.out.println(stringPairs); // [[a, b], [c, d]]
    }
}

Collect Pairs from Streams into another Object

With a stream you can do it as follows:

@Test
void testStream() {

    class DoubleDateObject {
        LocalDate d1;
        LocalDate d2;

        public DoubleDateObject(LocalDate d1, LocalDate d2) {
            this.d1 = d1;
            this.d2 = d2;
        }

        @Override
        public String toString() {
            return "DoubleDateObject{" +
                    "d1=" + d1 +
                    ", d2=" + d2 +
                    '}';
        }
    }

    LocalDate[] dates = {LocalDate.now(), LocalDate.now(), LocalDate.now(), LocalDate.now()};

    List<DoubleDateObject> ret = IntStream.range(0, dates.length)
            .filter(i -> i % 2 == 1)
            .mapToObj(i -> new DoubleDateObject(dates[i - 1], dates[i]))
            .collect(Collectors.toList());

    System.out.println(ret);

}

The output will be:

[DoubleDateObject{d1=2021-09-03, d2=2021-09-03}, DoubleDateObject{d1=2021-09-03, d2=2021-09-03}]

With Reactor's Flux you can use the following, which produces the same result:

List<DoubleDateObject> ret = Flux.range(0, dates.length)
        .filter(i -> i % 2 == 1)
        .map(i -> new DoubleDateObject(dates[i - 1], dates[i]))
        .collectList()
        .block();

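If you do not have the array in the first place, you can do the following on a sequential stream, which again produces the same output (replace Arrays.stream(dates) with your stream):

// Holds the first date of a pair until the second one arrives.
// Relies on sequential, in-order processing and on non-null dates.
final LocalDate[] tmp = {null};
List<DoubleDateObject> ret = Arrays.stream(dates)
        .map(date -> {
            if (tmp[0] == null) {
                tmp[0] = date; // first half of the pair: remember it, emit nothing yet
                return null;
            }
            DoubleDateObject pair = new DoubleDateObject(tmp[0], date);
            tmp[0] = null;
            return pair;
        })
        .filter(Objects::nonNull)
        .collect(Collectors.toList());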

Two primitive type streams to one stream of pairs

Your solution relies on both collections having the same size, so we should focus on this. Let's put the Java Stream API aside for a while, since it offers no way to traverse two collections simultaneously other than an IntStream range over the indices. Simply put, the Java Stream API is not suitable for this use case.

You need to ensure that no IndexOutOfBoundsException is thrown upon calling List::get. I prefer two ways:

  1. Two iterators with combined conditions:

    // namesIterator = names.iterator(), countsIterator = counts.iterator()
    List<Foo> fooList = new ArrayList<>();
    while (namesIterator.hasNext() && countsIterator.hasNext()) {
        Foo foo = new Foo(namesIterator.next(), countsIterator.next());
        fooList.add(foo);
    }

  2. Using an indexed for loop up to the smaller of the two list sizes:

    int bound = Math.min(names.size(), counts.size());
    List<Foo> fooList = new ArrayList<>();
    for (int i = 0; i < bound; i++) {
        Foo foo = new Foo(names.get(i), counts.get(i));
        fooList.add(foo);
    }

    .. which is similar to the Java Stream API way (note the exclusive upper bound):

    List<Foo> fooList = IntStream.range(0, Math.min(names.size(), counts.size()))
        .mapToObj(i -> new Foo(names.get(i), counts.get(i)))
        .collect(Collectors.toList());

There are also external libraries with dedicated methods for zipping, such as Streams::zip from Guava or Seq::zip from jOOλ. The zipping mechanism is pretty much the same across libraries.

The current design of Java Stream API is not suitable for it.
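
If you'd rather not pull in a library, the two-iterator approach generalizes into a small helper. The following is only a sketch: ZipSketch and zip are hypothetical names, and it returns a list rather than a lazy stream to keep things short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

class ZipSketch {
    /** Hypothetical helper: combines elements pairwise and stops at the shorter input. */
    static <A, B, R> List<R> zip(Iterable<A> left, Iterable<B> right,
                                 BiFunction<? super A, ? super B, ? extends R> combiner) {
        Iterator<A> l = left.iterator();
        Iterator<B> r = right.iterator();
        List<R> result = new ArrayList<>();
        while (l.hasNext() && r.hasNext()) {
            result.add(combiner.apply(l.next(), r.next()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        List<Integer> counts = Arrays.asList(1, 2);
        // The unmatched "c" is dropped because counts is shorter.
        System.out.println(zip(names, counts, (n, c) -> n + "=" + c)); // [a=1, b=2]
    }
}
```

Because it only uses iterators, it never calls List::get and so cannot throw an IndexOutOfBoundsException.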

Collect a stream after an objects attribute exceeds a certain threshold

If I understand your question correctly then you can use Stream#dropWhile(Predicate):

Returns, if this stream is ordered, a stream consisting of the remaining elements of this stream after dropping the longest prefix of elements that match the given predicate. Otherwise returns, if this stream is unordered, a stream consisting of the remaining elements of this stream after dropping a subset of elements that match the given predicate.

Example:

List<MaDate> originalList = ...;
List<MaDate> newList = originalList.stream()
        .dropWhile(m -> m.getTemp() < 10)
        .collect(Collectors.toList());

Note that dropWhile was added in Java 9. This Q&A shows a workaround if you're using Java 8: Limit a stream by a predicate.
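
To make the behaviour concrete, here is a small runnable sketch (Java 9+). The MaDate class below is a hypothetical stand-in with just a temperature field, and the threshold of 10 is taken from the example above. Note that once the first element reaches the threshold, later elements are kept even if they fall below it again.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DropWhileDemo {
    // Hypothetical stand-in for the question's MaDate class.
    static final class MaDate {
        private final int temp;

        MaDate(int temp) {
            this.temp = temp;
        }

        int getTemp() {
            return temp;
        }
    }

    /** Returns the temperatures starting at the first one that reaches the threshold. */
    static List<Integer> tempsFromFirstAtLeast(List<MaDate> dates, int threshold) {
        return dates.stream()
                .dropWhile(m -> m.getTemp() < threshold)
                .map(MaDate::getTemp)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MaDate> dates = Arrays.asList(
                new MaDate(5), new MaDate(8), new MaDate(12), new MaDate(7), new MaDate(15));
        System.out.println(tempsFromFirstAtLeast(dates, 10)); // [12, 7, 15]
    }
}
```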

How to collect data from a stream in different lists based on a condition?

I'd rather change the order and also collect the data into a Map<String, List<String>> where the key would be the entity name.

Assuming splittedLine is the array of lines, I'd probably do something like this:

Set<String> L100_ENTITY_NAMES = Set.of("L100", ...);
String delimiter = String.valueOf(DELIMITER);

Map<String, List<String>> result =
    Arrays.stream(splittedLine)
        .map(line -> {
            String[] values = line.split(delimiter);
            if (values.length < 3) {
                return null;
            }

            // key: entity name (third column), value: the whole line
            return new AbstractMap.SimpleEntry<>(values[2], line);
        })
        .filter(Objects::nonNull)
        .filter(entry -> L100_ENTITY_NAMES.contains(entry.getKey()))
        .collect(Collectors.groupingBy(Map.Entry::getKey,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

Note that this isn't necessarily shorter but has a couple of other advantages:

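  • It's not O(n*m): each line needs a single set lookup instead of a scan over all m entity names, which is roughly O(n) with a hash-based set (O(n * log(m)) with a sorted set), so it should be faster for non-trivial stream sizes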
  • You get an entity name for each list rather than having to rely on the indices in both lists
  • It's easier to understand because you use distinct steps:
    • split and map the line
    • filter null values, i.e. lines that aren't valid in the first place
    • filter lines that don't have any of the L100 entity names
    • collect the filtered lines by entity name so you can easily access the sub lists
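
Here is a self-contained sketch of those steps that you can run. The sample lines, the ";" delimiter and the entity names are made up for illustration. It also quotes the delimiter with Pattern.quote, since String.split treats its argument as a regular expression, and copies the result into a TreeMap only to get a predictable print order.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class GroupByEntityName {
    /** Groups the lines whose third column is one of the wanted entity names. */
    static Map<String, List<String>> group(String[] lines, String delimiter, Set<String> wanted) {
        String regex = Pattern.quote(delimiter); // treat the delimiter literally
        Map<String, List<String>> grouped = Arrays.stream(lines)
                .map(line -> {
                    String[] values = line.split(regex);
                    if (values.length < 3) {
                        return null; // malformed line, filtered out below
                    }
                    return new AbstractMap.SimpleEntry<String, String>(values[2], line);
                })
                .filter(Objects::nonNull)
                .filter(entry -> wanted.contains(entry.getKey()))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        return new TreeMap<>(grouped); // sorted by entity name
    }

    public static void main(String[] args) {
        String[] lines = {"1;x;L100;foo", "2;y;L200;bar", "3;z;L100;baz", "bad;line", "4;w;L101;qux"};
        Set<String> wanted = new HashSet<>(Arrays.asList("L100", "L101"));
        System.out.println(group(lines, ";", wanted));
        // {L100=[1;x;L100;foo, 3;z;L100;baz], L101=[4;w;L101;qux]}
    }
}
```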

Java Streams - Map two string lines each to one object

You can make your own Collector which temporarily stores the previous element/string. When the current element starts with a $, the name of the product is stored in prev. Now you can convert the price to a double and create the object.

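// static, so that it can declare the static factory method and be created via ProductCollector::new
private static class ProductCollector {

    private final List<Product> list = new ArrayList<>();

    // the previously seen line, i.e. the candidate product name
    private String prev;

    public void accept(String str) {
        if (prev != null && str.startsWith("$")) {
            double price = Double.parseDouble(str.substring(1));
            list.add(new Product(prev, price));
        }
        prev = str;
    }

    public List<Product> finish() {
        return list;
    }

    public static Collector<String, ?, List<Product>> collector() {
        return Collector.of(ProductCollector::new, ProductCollector::accept,
                // the combiner is only called for parallel streams, which are not supported
                (a, b) -> { throw new UnsupportedOperationException("parallel streams are not supported"); },
                ProductCollector::finish);
    }
}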

Since you need to rely on the sequence (line with price follows line with name), the stream cannot be processed in parallel. Here is how you can use your custom collector:

String[] lines = new String[]{
        "Ice Cream", "$3.99",
        "Chocolate", "$5.00",
        "Nice Shoes", "$84.95"
};

List<Product> products = Stream.of(lines)
        .sequential()
        .collect(ProductCollector.collector());

Note that your prices are not integers which is why I used a double to represent them properly.
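
If the lines are already in an array (or a random-access list), an index-based alternative avoids the stateful collector altogether. This is a sketch under assumptions: the Product class is a hypothetical stand-in, and the input must strictly alternate between a name line and a "$"-prefixed price line.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexedProducts {
    // Hypothetical stand-in for the question's Product class.
    static final class Product {
        final String name;
        final double price;

        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }

        @Override
        public String toString() {
            return name + " @ " + price;
        }
    }

    /** Pairs lines[2i] (name) with lines[2i + 1] (price with a leading "$"). */
    static List<Product> parse(String[] lines) {
        return IntStream.range(0, lines.length / 2)
                .mapToObj(i -> new Product(
                        lines[2 * i],
                        Double.parseDouble(lines[2 * i + 1].substring(1))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] lines = {"Ice Cream", "$3.99", "Chocolate", "$5.00", "Nice Shoes", "$84.95"};
        System.out.println(parse(lines)); // [Ice Cream @ 3.99, Chocolate @ 5.0, Nice Shoes @ 84.95]
    }
}
```

Each pair is computed from its index alone, so this version has no shared state between elements.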

Java - Concatenate Consecutive Elements in a Stream

You were almost there. You already had the pairs, all you have to do now is smash them with a "-" in the middle into a String.

Give this a try:

// counter is the one from your code, e.g. an AtomicInteger starting at 0
List<String> joined = Arrays.stream(exampleArray)
        .filter(s -> s.length() > 0) // gets rid of blanks
        .filter(s -> !s.contains("junk"))
        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / 2))
        .values()
        .stream() // stream the pairs
        .map(l -> String.join("-", l)) // and put a "-" between them & into a string
        .collect(Collectors.toList()); // collect all your joined Strings
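
For reference, a self-contained version of the same idea. It assumes counter is an AtomicInteger and the stream is sequential. Since groupingBy makes no guarantees about the type of map it returns, this sketch passes TreeMap::new so the pairs come out in key order. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class JoinConsecutive {
    /** Drops blanks and "junk" entries, then joins each consecutive pair with "-". */
    static List<String> joinPairs(String[] input) {
        AtomicInteger counter = new AtomicInteger();
        return Arrays.stream(input)
                .filter(s -> !s.isEmpty())
                .filter(s -> !s.contains("junk"))
                // elements 0 and 1 get key 0, elements 2 and 3 get key 1, and so on
                .collect(Collectors.groupingBy(
                        s -> counter.getAndIncrement() / 2, TreeMap::new, Collectors.toList()))
                .values()
                .stream()
                .map(pair -> String.join("-", pair))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] example = {"a", "", "b", "junk1", "c", "d", "e"};
        System.out.println(joinPairs(example)); // [a-b, c-d, e]
    }
}
```

A trailing element without a partner ends up alone in its group and is returned unjoined.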

