Why Doesn't java.util.Collection Implement the New Stream Interface

Why doesn't Collection<T> implement Stream<T>?

From Maurice Naftalin's Lambda FAQ:

Why are Stream operations not defined directly on Collection?


Early drafts of the API exposed methods like filter, map, and reduce on Collection or Iterable. However, user experience with this design led to a more formal separation of the “stream” methods into their own abstraction. Reasons included:

  • Methods on Collection such as removeAll make in-place modifications, in contrast to the new methods which are more functional in nature. Mixing two different kinds of methods on the same abstraction forces the user to keep track of which are which. For example, given the declaration

    Collection<String> strings;

    the two very similar-looking method calls

    strings.removeIf(s -> s.length() == 0); // the predicate-taking method is named removeIf in the released API
    strings.filter(s -> s.length() == 0); // not supported in the current API

    would have surprisingly different results; the first would remove all empty String objects from the collection, whereas the second would return a stream containing only the elements that match the predicate (here, the empty Strings), while having no effect on the collection.

    Instead, the current design ensures that only an explicitly-obtained stream can be filtered:

    strings.stream().filter(s -> s.length() == 0)...;

    where the ellipsis represents further stream operations, ending with a terminating operation. This gives the reader a much clearer intuition about the action of filter;

  • With lazy methods added to Collection, users were confused by a perceived—but erroneous—need to reason about whether the collection was in “lazy mode” or “eager mode”. Rather than burdening Collection with new and different functionality, it is cleaner to provide a Stream view with the new functionality;

  • The more methods added to Collection, the greater the chance of name collisions with existing third-party implementations. By only adding a few methods (stream, parallel) the chance for conflict is greatly reduced;

  • A view transformation is still needed to access a parallel view; the asymmetry between the sequential and the parallel stream views was unnatural. Compare, for example

    coll.filter(...).map(...).reduce(...);

    with

    coll.parallel().filter(...).map(...).reduce(...);

    This asymmetry would be particularly obvious in the API documentation, where Collection would have many new methods to produce sequential streams, but only one to produce parallel streams, which would then have all the same methods as Collection. Factoring these into a separate interface, StreamOps say, would not help; that would still, counterintuitively, need to be implemented by both Stream and Collection;

  • A uniform treatment of views also leaves room for other additional views in the future.
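The first bullet's contrast between in-place and functional methods can be sketched with the released API, where the predicate-taking in-place method is Collection.removeIf. This is only an illustration; the class and method names (InPlaceVsStream, removeEmptyInPlace, nonEmptyCopy) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the JDK.
class InPlaceVsStream {
    // Mutates the collection that is passed in and returns that same collection.
    static Collection<String> removeEmptyInPlace(Collection<String> strings) {
        strings.removeIf(s -> s.length() == 0);
        return strings;
    }

    // Leaves the source untouched; returns a new list holding the non-empty elements.
    static List<String> nonEmptyCopy(Collection<String> strings) {
        return strings.stream()
                      .filter(s -> s.length() != 0)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Collection<String> strings = new ArrayList<>(Arrays.asList("a", "", "b"));
        System.out.println(nonEmptyCopy(strings)); // filtered copy
        System.out.println(strings.size());        // source still has all its elements
        removeEmptyInPlace(strings);
        System.out.println(strings.size());        // source has now shrunk
    }
}
```

The explicit stream() call is what tells the reader that the second method cannot touch the collection.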

Why isn't there an Interface for something that provides a Stream<E>?

This should probably augment any future answers.

I don't know why you think that returning an Iterable<MyCoolObject> rather than a Collection<MyCoolObject> is better. It might indeed hide details, but it will create more problems as well.

A Collection has a known size, which plays a big role when splitting for parallel processing. This is reported as Spliterator.SIZED | Spliterator.SUBSIZED. So Collection.stream() will handle parallel streams much better than an Iterable, which will use:

public static <T> Spliterator<T> spliteratorUnknownSize(Iterator<? extends T> iterator, int characteristics)

that is documented as:

... and implements trySplit to permit limited parallelism.

That is to be expected, since the size is not known at all. Under the current implementation the batch size is 1024, so for anything under 1024 elements you would not get any parallelisation at all.
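To see the difference for yourself, you can compare the characteristics reported by the two kinds of spliterator. The following is a small sketch; SizedCheck and isSized are names invented for the example, and ArrayList is just one convenient Collection to test with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class, not part of the JDK.
class SizedCheck {
    // True if the source's spliterator reports a known size.
    static boolean isSized(Iterable<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        // A bare Iterable that only supplies iterator(); it inherits the
        // default Iterable.spliterator(), which has no size information.
        Iterable<Integer> plain = list::iterator;

        System.out.println(isSized(list));  // true
        System.out.println(isSized(plain)); // false
    }
}
```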

Now as far as your question goes, there used to be such a thing in the early builds of jdk-8. It was called java.util.stream.Streamable. From what I know it was removed because there are methods that return a Stream, but not via the stream() method.

String::codePoints
Files::lines
Pattern::splitAsStream
... many others

So the only place where this would be implemented would be the Collections. As far as I can tell, that would make it a really isolated interface.
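For illustration, here are two of those stream-bearing methods in use; neither is called stream(), and one of them returns an IntStream rather than a Stream. The wrapper class and method names (StreamSources, countCodePoints, splitOnComma) are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the JDK.
class StreamSources {
    // String.codePoints() yields an IntStream, not a Stream<Integer>.
    static long countCodePoints(String s) {
        return s.codePoints().count();
    }

    // Pattern.splitAsStream() yields a Stream<String> of the pieces.
    static List<String> splitOnComma(String s) {
        return Pattern.compile(",").splitAsStream(s).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countCodePoints("abc"));
        System.out.println(splitOnComma("a,b,c"));
    }
}
```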

Aha moment

Here is the explanation from the people in charge of this.

As suggested here are the reasons for the removal:

I am considering dropping the Streamable interface. Currently the only
implementor is Collection, and all of the other stream-bearing methods
are serving up specialized streams (chars(), codePoints(), lines(), etc)
with a method name that is more suitable than "stream". So I think we
should drop Streamable and leave the stream() / parallel() methods on
Collection (or possibly move them up Iterable).

Out of the java.util.stream.Stream interface's two collect methods, is one of them poorly constructed?

In Java 9, the documentation of the Stream.collect(Supplier, BiConsumer, BiConsumer) method has been updated and now it explicitly mentions that you should fold elements from the second result container into the first one:

combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.

(Emphasis mine).
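As a rough sketch of what that contract looks like in practice, the combiner below adds everything from the second container to the first. The lambdas are spelled out instead of using method references only to make the direction of the merge visible; ThreeArgCollect and collectToList are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the JDK.
class ThreeArgCollect {
    static List<Integer> collectToList(Stream<Integer> stream) {
        return stream.collect(
                () -> new ArrayList<Integer>(),                 // supplier: fresh container
                (container, element) -> container.add(element), // accumulator
                (first, second) -> first.addAll(second));       // combiner: fold second into first
    }

    public static void main(String[] args) {
        // The combiner is only invoked when partial results need merging,
        // which in practice means a parallel stream.
        List<Integer> result = collectToList(IntStream.range(0, 1000).boxed().parallel());
        System.out.println(result.size());
    }
}
```

Merging in the opposite direction (second.addAll(first)) would violate the documented contract, because the caller continues with the first container.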

Why does Iterable<T> not provide stream() and parallelStream() methods?

This was not an omission; there was detailed discussion on the EG list in June of 2013.

The definitive discussion of the Expert Group is rooted at this thread.

While it seemed "obvious" (even to the Expert Group, initially) that stream() made sense on Iterable, the fact that Iterable was so general became a problem, because the obvious signature:

Stream<T> stream()

was not always what you were going to want. Some things that were Iterable<Integer> would rather have their stream method return an IntStream, for example. But putting the stream() method this high up in the hierarchy would make that impossible. So instead, we made it really easy to make a Stream from an Iterable, by providing a spliterator() method. The implementation of stream() in Collection is just:

default Stream<E> stream() {
    return StreamSupport.stream(spliterator(), false);
}

Any client can get the stream they want from an Iterable with:

Stream<T> s = StreamSupport.stream(iter.spliterator(), false);

In the end we concluded that adding stream() to Iterable would be a mistake.
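If you need this in more than one place, a small helper is enough. The sketch below also shows the Iterable<Integer> case mentioned above, where the caller decides that an IntStream is the more useful result. IterableStreams, streamOf and intStreamOf are hypothetical names, not JDK API.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of the JDK.
class IterableStreams {
    // Sequential Stream over any Iterable, built from its spliterator.
    static <T> Stream<T> streamOf(Iterable<T> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }

    // For an Iterable<Integer>, the caller may prefer a primitive IntStream.
    static IntStream intStreamOf(Iterable<Integer> iterable) {
        return streamOf(iterable).mapToInt(Integer::intValue);
    }

    public static void main(String[] args) {
        Iterable<Integer> numbers = Arrays.asList(1, 2, 3);
        System.out.println(intStreamOf(numbers).sum());
    }
}
```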

Why can't I use Stream#toList to collect a list of a class' interface in Java 16?

.collect(Collectors.toList()) works because the signature of collect is:

<R, A> R collect(Collector<? super T, A, R> collector);

the important part being ? super T

which means the toList() collector can be interpreted as Collector<Dodo,?,List<Dodo>> (when you assign the result of .collect() to a List<Dodo>) even though the type of your stream is Stream<FancyDodo>.

On the other hand, the signature of Stream's toList() is:

List<T> toList()

so if you execute it for a Stream<FancyDodo>, you'll get a List<FancyDodo>, which can't be assigned to a List<Dodo> variable.

I suggest you simply use stream.collect(Collectors.toList()) instead of stream.toList() in this case.
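Here is a minimal reproduction of the two cases. Dodo and FancyDodo stand in for your own types (their bodies are invented for the example), and the toList() call is left commented out because it is the line that fails to compile.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo types standing in for the asker's interface and class.
class DodoDemo {
    interface Dodo {
        String name();
    }

    static class FancyDodo implements Dodo {
        public String name() {
            return "fancy";
        }
    }

    static List<Dodo> asDodos(List<FancyDodo> fancy) {
        // Compiles: Collectors.toList() is inferred as Collector<Dodo, ?, List<Dodo>>,
        // which satisfies Collector<? super FancyDodo, A, R>.
        return fancy.stream().collect(Collectors.toList());

        // Does not compile (Java 16+): toList() returns List<FancyDodo>,
        // and List<FancyDodo> is not a List<Dodo>.
        // return fancy.stream().toList();
    }

    public static void main(String[] args) {
        List<Dodo> dodos = asDodos(Arrays.asList(new FancyDodo(), new FancyDodo()));
        System.out.println(dodos.size());
    }
}
```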

Why does Stream<T> not implement Iterable<T>?

People have already asked the same on the mailing list ☺. The main reason is that Iterable also has re-iterable semantics, while Stream does not.

I think the main reason is that Iterable implies reusability, whereas Stream is something that can only be used once — more like an Iterator.

If Stream extended Iterable then existing code might be surprised when it receives an Iterable that throws an Exception the second time they do for (element : iterable).
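You can simulate what that would feel like by adapting a stream to Iterable with a method reference. In this sketch (SingleUse and secondPassFails are made-up names) the first loop works and the second one fails, because stream.iterator() is a terminal operation that can only be invoked once.

```java
import java.util.stream.Stream;

// Hypothetical demo class, not part of the JDK.
class SingleUse {
    // Returns true if iterating the adapted stream a second time throws.
    static boolean secondPassFails(Stream<String> stream) {
        // Each for-each loop calls stream.iterator() through this adapter.
        Iterable<String> once = stream::iterator;

        for (String s : once) {
            // first pass consumes the stream
        }
        try {
            for (String s : once) {
                // never reached
            }
            return false;
        } catch (IllegalStateException e) {
            // the stream has already been operated upon
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondPassFails(Stream.of("a", "b")));
    }
}
```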

Avoiding .stream() and .collect() when using lambdas with the old collection classes in Java 8

forEach is defined directly on List. But for most operations, the expected use is something like:

convertedList = myList.stream().filter(...).map(...).collect(Collectors.toList());

so the conversion to and from a stream is pretty fluid.
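A complete version of that round trip might look like the following. The filter and map steps are arbitrary choices for the example, and RoundTrip and upperCaseNonEmpty are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the JDK.
class RoundTrip {
    // List in, List out; the stream only exists inside the expression.
    static List<String> upperCaseNonEmpty(List<String> myList) {
        return myList.stream()
                     .filter(s -> !s.isEmpty())
                     .map(String::toUpperCase)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> convertedList = upperCaseNonEmpty(Arrays.asList("a", "", "b"));
        // forEach needs no stream at all.
        convertedList.forEach(System.out::println);
    }
}
```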


