Collection to Stream to a New Collection

There’s a reason why most examples avoid storing the result in a Collection: it’s not the recommended way of programming. You already have a Collection, the one providing the source data, and a collection is of no use on its own. You want to perform certain operations on it, so the ideal case is to perform those operations using the stream and skip storing the data in an intermediate Collection. This is what most examples are trying to suggest.

Of course, there are a lot of existing APIs working with Collections and there always will be. So the Stream API offers different ways to handle the demand for a Collection.

  • Get an unmodifiable List implementation containing all elements (JDK 16):

    List<T> results = l.stream().filter(…).toList();
  • Get an arbitrary List implementation holding the result:

    List<T> results = l.stream().filter(…).collect(Collectors.toList());
  • Get an unmodifiable List forbidding null like List.of(…) (JDK 10):

    List<T> results = l.stream().filter(…).collect(Collectors.toUnmodifiableList());
  • Get an arbitrary Set implementation holding the result:

    Set<T> results = l.stream().filter(…).collect(Collectors.toSet());
  • Get a specific Collection:

    ArrayList<T> results =
    l.stream().filter(…).collect(Collectors.toCollection(ArrayList::new));
  • Add to an existing Collection:

    l.stream().filter(…).forEach(existing::add);
  • Create an array:

    String[] array = l.stream().filter(…).toArray(String[]::new);
  • Use the array to create a list with a specific behavior (mutable, but fixed size):

    List<String> al = Arrays.asList(l.stream().filter(…).toArray(String[]::new));
  • Allow a parallel capable stream to add to temporary local lists and join them afterward:

    List<T> results
    = l.stream().filter(…).collect(ArrayList::new, List::add, List::addAll);

    (Note: this is closely related to how Collectors.toList() is currently implemented, but that’s an implementation detail; there is no guarantee that future implementations of the toList() collector will still return an ArrayList.)
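As a rough illustration of how some of these options differ in mutability, here is a small sketch (JDK 16+ because of Stream.toList()). The class and method names are made up for the example, and the startsWith("a") predicate only stands in for the elided filter(…):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectionOptionsDemo {

    // JDK 16+: Stream.toList() returns an unmodifiable List.
    static List<String> viaToList(List<String> src) {
        return src.stream().filter(s -> s.startsWith("a")).toList();
    }

    // toCollection(ArrayList::new) guarantees a mutable ArrayList.
    static ArrayList<String> viaToCollection(List<String> src) {
        return src.stream()
                .filter(s -> s.startsWith("a"))
                .collect(Collectors.toCollection(ArrayList::new));
    }

    // Arrays.asList wraps the array: elements can be replaced, but the size is fixed.
    static List<String> viaArray(List<String> src) {
        return Arrays.asList(src.stream().filter(s -> s.startsWith("a")).toArray(String[]::new));
    }

    // Tries to add an element; returns true if the list refuses structural changes.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> src = List.of("apple", "banana", "avocado");

        List<String> fixedSize = viaArray(src);
        fixedSize.set(0, "apricot"); // allowed: replaces an element in place

        boolean toListRejects = rejectsAdd(viaToList(src));
        boolean arrayListRejects = rejectsAdd(viaToCollection(src));
        boolean asListRejects = rejectsAdd(fixedSize);

        System.out.println("toList() rejects add: " + toListRejects);                      // true
        System.out.println("toCollection(ArrayList::new) rejects add: " + arrayListRejects); // false
        System.out.println("Arrays.asList rejects add: " + asListRejects);                  // true
        System.out.println(fixedSize);                                                      // [apricot, avocado]
    }
}
```

Only toCollection(ArrayList::new) makes a mutability promise in its contract; the other results should be treated as read-only (or fixed-size) by callers.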

Combine stream of Collections into one Collection - Java 8

This functionality can be achieved with a call to the flatMap method on the stream, which takes a Function that maps each stream element to another Stream; flatMap concatenates those streams into a single stream, which you can then collect.

Here, the flatMap method converts the Stream<Collection<Long>> to a Stream<Long>, and collect collects them into a Collection<Long>.

Collection<Long> longs = streamOfCollections
.flatMap(coll -> coll.stream())
.collect(Collectors.toList());
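A self-contained version of the snippet above might look like this. FlattenDemo and flatten are hypothetical names, and Collection::stream is used as an equivalent shorthand for the lambda coll -> coll.stream():

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenDemo {

    // flatMap turns Stream<Collection<Long>> into Stream<Long>; collect gathers the elements.
    static List<Long> flatten(Stream<? extends Collection<Long>> streamOfCollections) {
        return streamOfCollections
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<List<Long>> source = Stream.of(List.of(1L, 2L), List.of(), List.of(3L, 4L));
        System.out.println(flatten(source)); // [1, 2, 3, 4]
    }
}
```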

java 8 stream creating collection of type B from collection of type A

For mapping an Employee to a Person, you can use Collectors.mapping or Stream.map as others have already shown, so I will skip that part.

Note that the mapping approach is not actually faster than map followed by collect. Streams are lazy, so map(...).collect(...) also processes every element in a single pass, and O(2N) would be the same complexity class as O(N) anyway; map(...).collect(...) is also more readable than collect(mapping(...)). What matters more is that mapping refers to a public transform(Employee) method reference instead of a Function<Employee,Person>, so the conversion can be reused as an ordinary method to transform an Employee into a Person. The two transform methods then have the same semantics: they are both adapter methods.

public List<Person> transform(List<Employee> employees) {
return employees.stream()
.filter(Objects::nonNull)
.collect(Collectors.mapping(this::transform, Collectors.toList()));
}

public Person transform(Employee it) {
return new Person(it.firstName, it.lastName, it.email);
}
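A runnable sketch comparing both styles, assuming hypothetical Employee and Person records (Java 16+) in place of the question's classes. The adapter is named toPerson here rather than transform only so that the method reference is not overloaded; the example checks that both variants produce equal results:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class EmployeeMapping {

    // Hypothetical stand-ins for the question's Employee and Person classes.
    record Employee(String firstName, String lastName, String email) { }
    record Person(String firstName, String lastName, String email) { }

    // Reusable adapter method, usable as a method reference.
    static Person toPerson(Employee e) {
        return new Person(e.firstName(), e.lastName(), e.email());
    }

    // Variant 1: map, then collect.
    static List<Person> viaMap(List<Employee> employees) {
        return employees.stream()
                .filter(Objects::nonNull)
                .map(EmployeeMapping::toPerson)
                .collect(Collectors.toList());
    }

    // Variant 2: let the collector do the mapping.
    static List<Person> viaMapping(List<Employee> employees) {
        return employees.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.mapping(EmployeeMapping::toPerson, Collectors.toList()));
    }

    public static void main(String[] args) {
        // Arrays.asList (unlike List.of) permits the null entry that filter removes.
        List<Employee> staff = Arrays.asList(
                new Employee("Ada", "Lovelace", "ada@example.com"),
                null,
                new Employee("Alan", "Turing", "alan@example.com"));
        System.out.println(viaMap(staff).equals(viaMapping(staff))); // true
    }
}
```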

java streams - How to filter a collection to two new collections

Well, you can make your existing code compile with a trivial modification:

public static void main(String[] args) {
List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 5, 8, 13, 21));
List<Integer> odds = new ArrayList<>();
List<Integer> evens = new ArrayList<>();
numbers.stream().forEach(x -> (x % 2 == 0 ? evens : odds).add(x));
}

The conditional ?: operator is an expression, which isn't a valid statement on its own. My modified code changes the use of the conditional operator to just select which list to add to, and then calls add on it - and that method invocation expression is a valid statement.

An alternative would be to collect using Collectors.partitioningBy, although in this particular case that would probably be more confusing code than what you've got.
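For comparison, a partitioningBy version might look like the following sketch (class and method names are illustrative). It produces a Map<Boolean, List<Integer>> instead of filling two existing lists, which is part of why it can read less directly:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {

    // partitioningBy always yields a map with both keys, true and false.
    static Map<Boolean, List<Integer>> splitByParity(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.partitioningBy(x -> x % 2 == 0));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> parts = splitByParity(List.of(1, 2, 3, 5, 8, 13, 21));
        List<Integer> evens = parts.get(true);  // [2, 8]
        List<Integer> odds = parts.get(false);  // [1, 3, 5, 13, 21]
        System.out.println("evens=" + evens + " odds=" + odds);
    }
}
```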

Java streams: Collect a nested collection

The problem is that List#add returns a boolean, not the List. Instead, you need to return the list itself after adding to it:

List<ArrayList<Integer>> result = nested.stream()
.map(list -> {
list.add(100);
return list;
})
.collect(Collectors.toList());

Or you can skip map and use forEach instead, since ArrayList is mutable:

nested.forEach(list -> list.add(100));
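A runnable sketch of the fixed map version follows. Like the forEach version, it modifies the inner lists in place; the appendToCopies method is an additional alternative (not from the answer) for when the original lists must stay untouched. All names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class NestedAppendDemo {

    // The answer's fix: add to each inner list, then return that same list from map.
    // This mutates the inner lists of 'nested' as a side effect.
    static List<ArrayList<Integer>> appendInPlace(List<ArrayList<Integer>> nested) {
        return nested.stream()
                .map(list -> {
                    list.add(100);
                    return list;
                })
                .collect(Collectors.toList());
    }

    // Alternative sketch: copy each inner list first, leaving the input untouched.
    static List<List<Integer>> appendToCopies(List<? extends List<Integer>> nested) {
        return nested.stream()
                .map(list -> {
                    List<Integer> copy = new ArrayList<>(list);
                    copy.add(100);
                    return copy;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ArrayList<Integer>> nested = List.of(
                new ArrayList<>(List.of(1, 2)), new ArrayList<>(List.of(3)));

        List<List<Integer>> copies = appendToCopies(nested);
        System.out.println(copies + " " + nested); // [[1, 2, 100], [3, 100]] [[1, 2], [3]]

        appendInPlace(nested);
        System.out.println(nested);                // [[1, 2, 100], [3, 100]]
    }
}
```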

Collecting stream back into the same collection type

It is not possible without violating the principle on which the Java streams framework has been built. It would completely violate the idea of abstracting the stream from its physical representation.

The sequence of bulk data operations forms a pipeline, as shown in the following picture:

[Figure: Pipeline: A Sequence of Bulk Data Operations]

The stream is somewhat similar to Schrödinger's cat: it is not materialized until you call the terminal operation. Stream processing is completely abstract and detached from the original stream source.

[Figure: Pipeline as a Black Box]
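The laziness described above can be observed with a small sketch (names are illustrative; Stream.toList() requires JDK 16). A peek action records each element as it flows through the pipeline, and nothing is recorded until the terminal operation runs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class LazyPipelineDemo {

    // Builds a pipeline with a recording peek, then reports how many elements had
    // passed through before and after the terminal operation.
    static int[] seenBeforeAndAfterTerminalOp(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> pipeline = source.stream()
                .peek(seen::add)          // records each element as it flows through
                .map(x -> x * 10);
        int before = seen.size();         // still 0: intermediate operations are lazy
        pipeline.toList();                // the terminal operation pulls the elements through
        return new int[] { before, seen.size() };
    }

    public static void main(String[] args) {
        int[] counts = seenBeforeAndAfterTerminalOp(List.of(1, 2, 3));
        System.out.println("before: " + counts[0] + ", after: " + counts[1]); // before: 0, after: 3
    }
}
```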

If you want to work at such a low level with your original data storage, don't be ashamed to simply avoid streams. They are just a tool, not anything sacred. The introduction of streams hasn't changed the Good Old Collections: they are still as good as they were, with the added value of internal iteration via the new Iterable.forEach() method.


Added to satisfy your curiosity :)

A possible solution follows. I don't like it myself, and I have not been able to solve all the generics issues there, but it works with limitations.

The idea is to create a collector returning the same type as the input collection. However, not all collections provide a nullary constructor (with no parameters), and without it the Class.newInstance() method does not work. There is also the awkwardness of checked exceptions within lambda expressions. (It is mentioned in this nice answer here: https://stackoverflow.com/a/22919112/2886891)

public Collection<Integer> getBiggerThan(Collection<Integer> col, int value) {
// Collection below is an example of one of the rare appropriate
// uses of raw types. getClass returns the runtime type of col, and
// at runtime all type parameters have been erased.
@SuppressWarnings("rawtypes")
final Class<? extends Collection> clazz = col.getClass();
System.out.println("Input collection type: " + clazz);
final Supplier<Collection<Integer>> supplier = () -> {
try {
return clazz.newInstance();
}
catch (InstantiationException | IllegalAccessException e) {
throw new RuntimeException(
"A checked exception caught inside lambda", e);
}
};
// After all the ugly preparatory code, enjoy the clean pipeline:
return col.stream()
.filter(v -> v > value)
.collect(supplier, Collection::add, Collection::addAll);
}

As you can see, it works in general, provided that your original collection has a nullary constructor.

public void test() {
final Collection<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

final Collection<Integer> arrayList = new ArrayList<>(numbers);
final Collection<Integer> arrayList2 = getBiggerThan(arrayList, 6);
System.out.println(arrayList2);
System.out.println(arrayList2.getClass());
System.out.println();

final Collection<Integer> set = new HashSet<>(arrayList);
final Collection<Integer> set2 = getBiggerThan(set, 6);
System.out.println(set2);
System.out.println(set2.getClass());
System.out.println();

// This does not work as Arrays.asList() is of a type
// java.util.Arrays$ArrayList which does not provide a nullary constructor
final Collection<Integer> numbers2 = getBiggerThan(numbers, 6);
}
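If reflection is not a requirement, a common alternative (not the approach of this answer) is to let the caller supply the collection factory and use Collectors.toCollection. This sketch sidesteps both the nullary-constructor and the checked-exception problems, at the cost of the caller choosing the result type explicitly; the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class FactoryCollectDemo {

    // Instead of reflecting on col.getClass(), the caller decides the result type.
    static <C extends Collection<Integer>> C getBiggerThan(
            Collection<Integer> col, int value, Supplier<C> factory) {
        return col.stream()
                .filter(v -> v > value)
                .collect(Collectors.toCollection(factory));
    }

    public static void main(String[] args) {
        Collection<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Works for Arrays.asList input too, because no reflection is involved.
        ArrayList<Integer> list = getBiggerThan(numbers, 6, ArrayList::new);
        TreeSet<Integer> set = getBiggerThan(numbers, 6, TreeSet::new);

        System.out.println(list + " " + list.getClass().getSimpleName()); // [7, 8, 9, 10] ArrayList
        System.out.println(set + " " + set.getClass().getSimpleName());   // [7, 8, 9, 10] TreeSet
    }
}
```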

Stream vs Collection as return type

In this context, the notion of "strong consistency requirement" is relative to the system or application within which the code resides. There's no specific notion of "strong consistency" that's independent of the system or application. Here's an example of "consistency" that is determined by what assertions you can make about a result. It should be clear that the semantics of these assertions are entirely application-specific.

Suppose you have some code that implements a room where people can enter and leave. You might want the relevant methods to be synchronized so that all enter and leave actions occur in some order. For example (using Java 16):

record Person(String name) { }

public class Room {
final Set<Person> occupants = Collections.newSetFromMap(new ConcurrentHashMap<>());

public synchronized void enter(Person p) { occupants.add(p); }
public synchronized void leave(Person p) { occupants.remove(p); }
public Stream<Person> occupants() { return occupants.stream(); }
}

(Note: I'm using ConcurrentHashMap here because it doesn't throw ConcurrentModificationException if it's modified during iteration.)

Next, consider some threads to execute these methods in this order:

room.enter(new Person("Brett"));
room.enter(new Person("Chris"));
room.enter(new Person("Dana"));
room.leave(new Person("Dana"));
room.enter(new Person("Ashley"));

Now, at around the same time, suppose a caller gets a list of persons in the room by doing this:

List<Person> occupants1 = room.occupants().toList();

The result might be:

[Dana, Brett, Chris, Ashley]

How is this possible? The stream is lazily evaluated, and the elements are being pulled into a List at the same time other threads are modifying the source of the stream. In particular, it's possible for the stream to have "seen" Dana, then Dana is removed and Ashley added, and then the stream advances and encounters Ashley.

What does the stream represent, then? To find out, we have to dig into what ConcurrentHashMap says about its streams in the presence of concurrent modification. The set is built from CHM's keySet view, which says "The view's iterators and spliterators are weakly consistent." The definition of weakly consistent is in turn:

Most concurrent Collection implementations (including most Queues) also differ from the usual java.util conventions in that their Iterators and Spliterators provide weakly consistent rather than fast-fail traversal:

  • they may proceed concurrently with other operations
  • they will never throw ConcurrentModificationException
  • they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
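The difference between weakly consistent and fail-fast traversal can be seen even in a single thread. In this sketch (class and method names are illustrative) a set is modified while it is being streamed. The HashSet outcome reflects OpenJDK's current behavior; its documentation describes fail-fast detection as best-effort only, so that part is not a guarantee:

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentDemo {

    // Streams over the set and, for each element below 100, adds element + 100
    // while the traversal is still in progress.
    static void addWhileStreaming(Set<Integer> set) {
        set.stream().forEach(x -> {
            if (x < 100) {
                set.add(x + 100);
            }
        });
    }

    // Returns true if the traversal failed fast with ConcurrentModificationException.
    static boolean failsFast(Set<Integer> set) {
        try {
            addWhileStreaming(set);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<Integer> hashSet = new HashSet<>(List.of(1, 2, 3));
        Set<Integer> chmSet = ConcurrentHashMap.newKeySet();
        chmSet.addAll(List.of(1, 2, 3));

        boolean hashSetFailed = failsFast(hashSet);
        boolean chmSetFailed = failsFast(chmSet);

        System.out.println("HashSet fails fast: " + hashSetFailed);    // true (best-effort)
        System.out.println("CHM key set fails fast: " + chmSetFailed); // false
        System.out.println(new TreeSet<>(chmSet));                     // [1, 2, 3, 101, 102, 103]
    }
}
```

Because 1, 2 and 3 existed when the traversal started, each is visited exactly once and adds its partner. Whether the traversal also visits 101, 102 and 103 is not specified, but here it makes no difference, since those elements add nothing.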

What does this mean for our Room application? I'd say it means that if a person appears in the stream of occupants, that person was in the room at some point. That's a pretty weak statement. Note in particular that it does not allow you to say that Dana and Ashley were in the room at the same time. It might seem that way from the contents of the List, but that would be incorrect, as a simple inspection reveals.

Now suppose we were to change the Room class to return a List instead of a Stream, and the caller were to use that instead:

// in class Room
public synchronized List<Person> occupants() { return List.copyOf(occupants); }

// in the caller
List<Person> occupants2 = room.occupants();

The result might be:

[Dana, Brett, Chris]

You can make much stronger statements about this List than about the previous one. You can say that Chris and Dana were in the room at the same time, and that at this particular point in time, Ashley was not in the room.

The List version of occupants() gives you a snapshot of the occupants of the room at a particular time. This allows you much stronger statements than the stream version, which only tells you that certain persons were in the room at some point.

Why would you ever want an API with weaker semantics? Again, it depends on the application. If you want to send a survey to people who used the room, all you care about is whether they were ever in the room. You don't care about other things, like who else was in the room at the same time.

The API with stronger semantics is potentially more expensive. It needs to make a copy of the collection, which means allocating space and spending time copying. It needs to hold a lock while it does this, to prevent concurrent modification, and this temporarily blocks other updates from proceeding.

To summarize, the notion of "strong" or "weak" consistency is highly dependent on the context. In this case I made up an example with some associated semantics, such as "in the room at the same time" or "was in the room at some point in time." The semantics required by the application determine the strength or weakness of the consistency of the results. This in turn drives what Java mechanisms should be used, such as streams vs. collections and when to apply locks.

Is it possible to collect a stream to two different collections using one line?

You can collect the elements after filtering and then use this list to map the elements to search results:

public void search(Predicate<String> predicate, Elements elements) {
List<Element> filteredElements =
elements.stream()
.filter(element -> predicate.test(element.ownText()))
.collect(Collectors.toList());

List<SearchResult> searchResults =
filteredElements.stream()
.map(element -> new SearchResult(element.ownText(),element.baseUri(),element.tagName()))
.collect(Collectors.toList());
}

This takes no more time than the solutions in the other answers, but it has no side effects, is thread-safe, and is easy to read and understand.
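If a single pass is preferred, Collectors.teeing (JDK 12+) can feed the filtered elements to two collectors at once. This is an alternative not taken from the answer, and Element and SearchResult below are simplified hypothetical records (Java 16+), not the jsoup types used in the question:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TeeingDemo {

    // Hypothetical, simplified stand-ins for jsoup's Element and the question's SearchResult.
    record Element(String ownText, String tagName) { }
    record SearchResult(String text, String tag) { }

    // Holder for the two results produced by the single pass.
    record Outcome(List<Element> filtered, List<SearchResult> results) { }

    static Outcome search(Predicate<String> predicate, List<Element> elements) {
        return elements.stream()
                .filter(element -> predicate.test(element.ownText()))
                // teeing sends every filtered element to both downstream collectors
                .collect(Collectors.teeing(
                        Collectors.toList(),
                        Collectors.mapping(
                                (Element e) -> new SearchResult(e.ownText(), e.tagName()),
                                Collectors.toList()),
                        Outcome::new));
    }

    public static void main(String[] args) {
        List<Element> page = List.of(
                new Element("Java streams", "h1"),
                new Element("Unrelated", "p"),
                new Element("Streams and collections", "h2"));
        Outcome out = search(text -> text.toLowerCase().contains("stream"), page);
        System.out.println(out.filtered().size()); // 2
        System.out.println(out.results());
    }
}
```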

Java 8 : Order a Collection by stream representation date in ascending way using collections streams

Collections.max()

You don’t need to sort all of your images to find the newest. Collections.max() can do that. Simplified code to illustrate my point:

    List<Image> unsortedImages = // …;
if (! unsortedImages.isEmpty()) {
Id latestId = Collections.max(unsortedImages,
Comparator.comparingLong(img -> Long.parseLong(img.created())))
.id();
// Proceed with deleting as in your code
}

I understand that Image.created() returns a String holding a Unix timestamp (number of seconds since the Unix epoch in 1970).
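A self-contained version of that idea, assuming a hypothetical Image record whose created() returns Unix seconds as a String and whose id() is a plain String (the question's Id type is not shown). Returning Optional covers the empty-list case that the isEmpty() check guards against, since Collections.max() throws NoSuchElementException on an empty collection:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestImageDemo {

    // Hypothetical stand-in for the question's Image class.
    record Image(String id, String created) { }

    // One O(n) pass: no need to sort the whole list just to find the newest entry.
    static Optional<String> latestId(List<Image> images) {
        if (images.isEmpty()) {
            return Optional.empty();
        }
        Image newest = Collections.max(images,
                Comparator.comparingLong((Image img) -> Long.parseLong(img.created())));
        return Optional.of(newest.id());
    }

    public static void main(String[] args) {
        List<Image> images = List.of(
                new Image("a", "1600000000"),
                new Image("b", "1700000000"),
                new Image("c", "1650000000"));
        System.out.println(latestId(images).orElse("none")); // b
    }
}
```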


