Java 8: Performance of Streams VS Collections

Do Java 8 streams produce slower code than plain imperative loops?

QUESTION: Should I use old loop imperative approach if I'm looking for the best code performance?

Right now, probably yes. Various benchmarks seem to suggest that streams are slower than loops for most tests. Though not catastrophically slower.

Counter examples:

In some cases, parallel streams can give a useful speed up.
Lazy streams can provide performance benefits for some problems; see http://java.amitph.com/2014/01/java-8-streams-api-laziness.html

It is possible to do equivalent things with loops, you can't do it with just loops.

But the bottom line is that performance is complicated and streams are not (yet) a magic bullet for speeding up your code.

Is FP paradigm is just to make code "more concise and readable" but not about performance?

Not exactly. It is certainly true that the FP paradigm is more concise and (to someone who is familiar with it) more readable.

However, by expressing the using the FP paradigm, you are also expressing it in a way that potentially could be optimized in ways that are much harder to achieve with code expressed using loops and assignment. FP code is also more amenable to formal methods; i.e. formal proof of correctness.

(In the context of this discussion of streams, "could be optimized" means in some future Java release.)

In Java, what are the advantages of streams over loops?

Interesting that the interview question asks about the advantages, without asking about disadvantages, for there are are both.

Streams are a more declarative style. Or a more expressive style. It may be considered better to declare your intent in code, than to describe how it's done:

 return people
     .filter( p -> p.age() < 19)
     .collect(toList());

... says quite clearly that you're filtering matching elements from a list, whereas:

 List<Person> filtered = new ArrayList<>();
 for(Person p : people) {
     if(p.age() < 19) {
         filtered.add(p);
     }
 }
 return filtered;

Says "I'm doing a loop". The purpose of the loop is buried deeper in the logic.

Streams are often terser. The same example shows this. Terser isn't always better, but if you can be terse and expressive at the same time, so much the better.

Streams have a strong affinity with functions. Java 8 introduces lambdas and functional interfaces, which opens a whole toybox of powerful techniques. Streams provide the most convenient and natural way to apply functions to sequences of objects.

Streams encourage less mutability. This is sort of related to the functional programming aspect -- the kind of programs you write using streams tend to be the kind of programs where you don't modify objects.

Streams encourage looser coupling. Your stream-handling code doesn't need to know the source of the stream, or its eventual terminating method.

Streams can succinctly express quite sophisticated behaviour. For example:

 stream.filter(myfilter).findFirst();

Might look at first glance as if it filters the whole stream, then returns the first element. But in fact findFirst() drives the whole operation, so it efficiently stops after finding one item.

Streams provide scope for future efficiency gains. Some people have benchmarked and found that single-threaded streams from in-memory Lists or arrays can be slower than the equivalent loop. This is plausible because there are more objects and overheads in play.

But streams scale. As well as Java's built-in support for parallel stream operations, there are a few libraries for distributed map-reduce using Streams as the API, because the model fits.

Disadvantages?

Performance: A for loop through an array is extremely lightweight both in terms of heap and CPU usage. If raw speed and memory thriftiness is a priority, using a stream is worse.

Familiarity.The world is full of experienced procedural programmers, from many language backgrounds, for whom loops are familiar and streams are novel. In some environments, you want to write code that's familiar to that kind of person.

Cognitive overhead. Because of its declarative nature, and increased abstraction from what's happening underneath, you may need to build a new mental model of how code relates to execution. Actually you only need to do this when things go wrong, or if you need to deeply analyse performance or subtle bugs. When it "just works", it just works.

Debuggers are improving, but even now, when you're stepping through stream code in a debugger, it can be harder work than the equivalent loop, because a simple loop is very close to the variables and code locations that a traditional debugger works with.

Are there any direct or indirect performance benefits of java 8 sequential streams?

First of all, letting special cases, like omitting a redundant sorted operation or returning the known size on count(), aside, the time complexity of an operation usually doesn’t change, so all differences in execution timing are usually about a constant offset or a (rather small) factor, not fundamental changes.

You can always write a manual loop doing basically the same as the Stream implementation does internally. So, internal optimizations, as mentioned by this answer could always get dismissed with “but I could do the same in my loop”.

But… when we compare “the Stream” with “a loop”, is it really reasonable to assume that all manual loops are written in the most efficient manner for the particular use case? A particular Stream implementation will apply its optimizations to all use cases where applicable, regardless of the experience level of the calling code’s author. I’ve already seen loops missing the opportunity to short-circuit or performing redundant operations not needed for a particular use case.

Another aspect is the information needed to perform certain optimizations. The Stream API is built around the Spliterator interface which can provide characteristics of the source data, e.g. it allows to find out whether the data has a meaningful order needed to be retained for certain operations or whether it is already pre-sorted, to the natural order or with a particular comparator. It may also provide the expected number of elements, as an estimate or exact, when predictable.

A method receiving an arbitrary Collection, to implement an algorithm with an ordinary loop, would have a hard time to find out, whether there are such characteristics. A List implies a meaningful order, whereas a Set usually does not, unless it’s a SortedSet or a LinkedHashSet, whereas the latter is a particular implementation class, rather than an interface. So testing against all known constellations may still miss 3rd party implementations with special contracts not expressible by a predefined interface.

Of course, since Java 8, you could acquire a Spliterator yourself, to examine these characteristics, but that would change your loop solution to a non-trivial thing and also imply repeating the work already done with the Stream API.

There’s also another interesting difference between Spliterator based Stream solutions and conventional loops, using an Iterator when iterating over something other than an array. The pattern is to invoke hasNext on the iterator, followed by next, unless hasNext returned false. But the contract of Iterator does not mandate this pattern. A caller may invoke next without hasNext, even multiple times, when it is known to succeed (e.g. you do already know the collection’s size). Also, a caller may invoke hasNext multiple times without next in case the caller did not remember the result of the previous call.

As a consequence, Iterator implementations have to perform redundant operations, e.g. the loop condition is effectively checked twice, once in hasNext, to return a boolean, and once in next, to throw a NoSuchElementException when not fulfilled. Often, the hasNext has to perform the actual traversal operation and store the result into the Iterator instance, to ensure that the result stays valid until the subsequent next call. The next operation in turn, has to check whether such a traversal did already happen or whether it has to perform the operation itself. In practice, the hot spot optimizer may or may not eliminate the overhead imposed by the Iterator design.

In contrast, the Spliterator has a single traversal method, boolean tryAdvance(Consumer<? super T> action), which performs the actual operation and returns whether there was an element. This simplifies the loop logic significantly. There’s even the void forEachRemaining(Consumer<? super T> action) for non-short-circuiting operations, which allows the actual implementation to provide the entire looping logic. E.g., in case of ArrayList the operation will end up at a simple counting loop over the indices, performing a plain array access.

You may compare such design with, e.g. readLine() of BufferedReader, which performs the operation and returns null after the last element, or find() of a regex Matcher, which performs the search, updates the matcher’s state and returns the success state.

But the impact of such design differences is hard to predict in an environment with an optimizer designed specifically to identify and eliminate redundant operations. The takeaway is that there is some potential for Stream based solutions to turn out to be even faster, while it depends on a lot of factors whether it will ever materialize in a particular scenario. As said at the beginning, it’s usually not changing the overall time complexity, which would be more important to worry about.

Java 8 Stream vs Collection Storage

The statement about streams and storage means that a stream doesn't have any storage of its own. If the stream's source is a collection, then obviously that collection has storage to hold the elements.

Let's take one of examples from that article:

int sum = shapes.stream()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();

Assume that shapes is a Collection that has millions of elements. One might imagine that the filter operation would iterate over the elements from the source and create a temporary collection of results, which might also have millions of elements. The mapToInt operation might then iterate over that temporary collection and generate its results to be summed.

That's not how it works. There is no temporary, intermediate collection. The stream operations are pipelined, so elements emerging from filter are passed through mapToInt and thence to sum without being stored into and read from a collection.

If the stream source weren't a collection -- say, elements were being read from a network collection -- there needn't be any storage at all. A pipeline like the following:

int sum = streamShapesFromNetwork()
                .filter(s -> s.getColor() == BLUE)
                .mapToInt(s -> s.getWeight())
                .sum();

might process millions of elements, but it wouldn't need to store millions of elements anywhere.

What is more efficient: sorted stream or sorting a list?

It is safe to say that two forms of sort will have the same complexity ... even without looking at the code. (If they didn't then one form would be severely broken!)

Looking at Java 8 source code for streams (specifically the internal class java.util.stream.SortedOps), the sorted() method adds a component to a stream pipeline that captures all of the stream elements into either an array or an ArrayList.

An array is used if and only if the pipeline assembly code can deduce the number of elements in the stream ahead of time.
Otherwise, an ArrayList is used to gather the elements to be sorted.

If an ArrayList is used, you incur the extra overhead of building / growing the list.

Then we return to two versions of the code:

List<Item> sortedItems = new ArrayList<>(items);
Collections.sort(sortedItems, itemComparator);

In this version, the ArrayList constructor copies the elements items to an appropriately sized array, and then Collections.sort does an in-place sort of that array. (This happens under the covers).

List<Item> sortedItems = items
    .stream()
    .sorted(itemComparator)
    .collect(Collectors.toList());

In this version, as we have seen above, the code associated with sorted() either builds and sorts an array (equivalent to what happens above) or it builds the ArrayList the slow way. But on top of that, there are the overheads of stream the data from items and to the collector.

Overall (with the Java 8 implementation at least) code examination tells me that first version of the code cannot be slower than the second version, and in most (if not all) cases it will be faster. But as the list gets larger, the O(NlogN) sorting will tend to dominate the O(N) overheads of copying. That will mean that the relative difference between the two versions will get smaller.

If you really care, you should write a benchmark to test the actual difference with a specific implementation of Java, and a specific input dataset. (Or adapt @Eugene's benchmark!)