In Java, What Are the Advantages of Streams Over Loops

In Java, what are the advantages of streams over loops?

Interesting that the interview question asks about the advantages, without asking about disadvantages, for there are are both.

Streams are a more declarative style. Or a more expressive style. It may be considered better to declare your intent in code, than to describe how it's done:

 return people
     .filter( p -> p.age() < 19)
     .collect(toList());

... says quite clearly that you're filtering matching elements from a list, whereas:

 List<Person> filtered = new ArrayList<>();
 for(Person p : people) {
     if(p.age() < 19) {
         filtered.add(p);
     }
 }
 return filtered;

Says "I'm doing a loop". The purpose of the loop is buried deeper in the logic.

Streams are often terser. The same example shows this. Terser isn't always better, but if you can be terse and expressive at the same time, so much the better.

Streams have a strong affinity with functions. Java 8 introduces lambdas and functional interfaces, which opens a whole toybox of powerful techniques. Streams provide the most convenient and natural way to apply functions to sequences of objects.

Streams encourage less mutability. This is sort of related to the functional programming aspect -- the kind of programs you write using streams tend to be the kind of programs where you don't modify objects.

Streams encourage looser coupling. Your stream-handling code doesn't need to know the source of the stream, or its eventual terminating method.

Streams can succinctly express quite sophisticated behaviour. For example:

 stream.filter(myfilter).findFirst();

Might look at first glance as if it filters the whole stream, then returns the first element. But in fact findFirst() drives the whole operation, so it efficiently stops after finding one item.

Streams provide scope for future efficiency gains. Some people have benchmarked and found that single-threaded streams from in-memory Lists or arrays can be slower than the equivalent loop. This is plausible because there are more objects and overheads in play.

But streams scale. As well as Java's built-in support for parallel stream operations, there are a few libraries for distributed map-reduce using Streams as the API, because the model fits.

Disadvantages?

Performance: A for loop through an array is extremely lightweight both in terms of heap and CPU usage. If raw speed and memory thriftiness is a priority, using a stream is worse.

Familiarity.The world is full of experienced procedural programmers, from many language backgrounds, for whom loops are familiar and streams are novel. In some environments, you want to write code that's familiar to that kind of person.

Cognitive overhead. Because of its declarative nature, and increased abstraction from what's happening underneath, you may need to build a new mental model of how code relates to execution. Actually you only need to do this when things go wrong, or if you need to deeply analyse performance or subtle bugs. When it "just works", it just works.

Debuggers are improving, but even now, when you're stepping through stream code in a debugger, it can be harder work than the equivalent loop, because a simple loop is very close to the variables and code locations that a traditional debugger works with.

When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

I agree to the point that programming with streams is nice and easier for some scenarios but when we're losing out on performance, why do we need to use them?

Performance is rarely an issue. It would be usual for 10% of your streams would need to be rewritten as loops to get the performance you need.

Is there something I'm missing out on?

Using parallelStream() is much easier using streams and possibly more efficient as it's hard to write efficient concurrent code.

Which is the scenario in which streams perform equal to loops? Is it only in the case where your function defined takes a lot of time, resulting in a negligible loop performance?

Your benchmark is flawed in the sense that the code hasn't been compiled when it starts. I would do the whole test in a loop as JMH does, or I would use JMH.

In none of the scenario's I could see streams taking advantage of branch-prediction

Branch prediction is a CPU feature not a JVM or streams feature.

Are there any direct or indirect performance benefits of java 8 sequential streams?

First of all, letting special cases, like omitting a redundant sorted operation or returning the known size on count(), aside, the time complexity of an operation usually doesn’t change, so all differences in execution timing are usually about a constant offset or a (rather small) factor, not fundamental changes.

You can always write a manual loop doing basically the same as the Stream implementation does internally. So, internal optimizations, as mentioned by this answer could always get dismissed with “but I could do the same in my loop”.

But… when we compare “the Stream” with “a loop”, is it really reasonable to assume that all manual loops are written in the most efficient manner for the particular use case? A particular Stream implementation will apply its optimizations to all use cases where applicable, regardless of the experience level of the calling code’s author. I’ve already seen loops missing the opportunity to short-circuit or performing redundant operations not needed for a particular use case.

Another aspect is the information needed to perform certain optimizations. The Stream API is built around the Spliterator interface which can provide characteristics of the source data, e.g. it allows to find out whether the data has a meaningful order needed to be retained for certain operations or whether it is already pre-sorted, to the natural order or with a particular comparator. It may also provide the expected number of elements, as an estimate or exact, when predictable.

A method receiving an arbitrary Collection, to implement an algorithm with an ordinary loop, would have a hard time to find out, whether there are such characteristics. A List implies a meaningful order, whereas a Set usually does not, unless it’s a SortedSet or a LinkedHashSet, whereas the latter is a particular implementation class, rather than an interface. So testing against all known constellations may still miss 3rd party implementations with special contracts not expressible by a predefined interface.

Of course, since Java 8, you could acquire a Spliterator yourself, to examine these characteristics, but that would change your loop solution to a non-trivial thing and also imply repeating the work already done with the Stream API.

There’s also another interesting difference between Spliterator based Stream solutions and conventional loops, using an Iterator when iterating over something other than an array. The pattern is to invoke hasNext on the iterator, followed by next, unless hasNext returned false. But the contract of Iterator does not mandate this pattern. A caller may invoke next without hasNext, even multiple times, when it is known to succeed (e.g. you do already know the collection’s size). Also, a caller may invoke hasNext multiple times without next in case the caller did not remember the result of the previous call.

As a consequence, Iterator implementations have to perform redundant operations, e.g. the loop condition is effectively checked twice, once in hasNext, to return a boolean, and once in next, to throw a NoSuchElementException when not fulfilled. Often, the hasNext has to perform the actual traversal operation and store the result into the Iterator instance, to ensure that the result stays valid until the subsequent next call. The next operation in turn, has to check whether such a traversal did already happen or whether it has to perform the operation itself. In practice, the hot spot optimizer may or may not eliminate the overhead imposed by the Iterator design.

In contrast, the Spliterator has a single traversal method, boolean tryAdvance(Consumer<? super T> action), which performs the actual operation and returns whether there was an element. This simplifies the loop logic significantly. There’s even the void forEachRemaining(Consumer<? super T> action) for non-short-circuiting operations, which allows the actual implementation to provide the entire looping logic. E.g., in case of ArrayList the operation will end up at a simple counting loop over the indices, performing a plain array access.

You may compare such design with, e.g. readLine() of BufferedReader, which performs the operation and returns null after the last element, or find() of a regex Matcher, which performs the search, updates the matcher’s state and returns the success state.

But the impact of such design differences is hard to predict in an environment with an optimizer designed specifically to identify and eliminate redundant operations. The takeaway is that there is some potential for Stream based solutions to turn out to be even faster, while it depends on a lot of factors whether it will ever materialize in a particular scenario. As said at the beginning, it’s usually not changing the overall time complexity, which would be more important to worry about.

Java Stream works slower than for loop

Just like regexes where a specific homegrown parser can be faster, streams are there to provide a quick and concise means of processing data. One of the advantages of streams is that intermediate data structures can be minimized while processing the elements. The other is the parallel aspect that has already been mentioned in the comments.

But as to your example.

Relying on simply performance tests using internal clocks (even though I do it too) is not the best way to accurately assess performance. Use something like Java Microbench Harness to do testing.
As to your result, try it with the following:
- Change STOPS to 100_000_000
- Modify your stream to
  
  return metro.stream().parallel() .mapToInt(x -> x[0]-x[1]) .sum();

Here were the results on my Windows, quad core i7 laptop

Stops: 100000000
sum1: -7073
sum1 (for loop): 908 milliseconds.
sum2: -7073
sum1 (stream): 518 milliseconds.

How to use stream instead of for loop

findFirst() return an optional, so the exact solution is:

return cars.stream() 
         .filter(car -> carRef.equals(car.getName())) // keep only matching elements
         .findFirst() // when 1st element matches, it triggers return
         .orElse(null);   // If not element, return null

When should I use streams?

Your assumption is correct. Your stream implementation is slower than the for-loop.

This stream usage should be as fast as the for-loop though:

EXCLUDE_PATHS.stream()  
    .map(String::toLowerCase)
    .anyMatch(path::contains);

This iterates through the items, applying String::toLowerCase and the filter to the items one-by-one and terminating at the first item that matches.

Both collect() & anyMatch() are terminal operations. anyMatch() exits at the first found item, though, while collect() requires all items to be processed.

What is the advantage of IntStream over usual Stream?

Stream<Integer> etc. have to work with boxed values (Integer instead of primitive int) which takes significantly more memory and usually a lot of boxing/unboxing operations (depending on your code). Why only Int/Double/Long? Just because they were expected to be used most often.

Same applies to OptionalInt and friends and all the functional interfaces.

For collections (lists/maps/sets) there are many third-party libraries providing primitive specialization for the same reason. Really the problem there is even more acute because with streams you don't (usually; sorted() is a counter-example) need to store many values in memory.

Streams or for loops

Joshua Bloch, author of "Effective Java", has a good talk which touches on when to use streams. Start watching around 30:30 for his section on "Use streams judiciously".

Although this is largely opinion based, he argues that you do not want to immediately begin turning all of your procedural loops into streams, but you really want a balanced approach. He provides at least one example method where doing so creates code that is more difficult to understand. He also argues that there is no right answer in many cases whether to write it procedural or in a more functional manner, and it is dependent on the context (and I would argue what the team has decided to do corporately might play a role). He has the examples on GitHub, and all the examples below are from his GitHub repository.

Here is the example he provides of his iterative anagram method,

// Prints all large anagram groups in a dictionary iteratively (Page 204)
public class IterativeAnagrams {
    public static void main(String[] args) throws IOException {
        File dictionary = new File(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);

        Map<String, Set<String>> groups = new HashMap<>();
        try (Scanner s = new Scanner(dictionary)) {
            while (s.hasNext()) {
                String word = s.next();
                groups.computeIfAbsent(alphabetize(word),
                        (unused) -> new TreeSet<>()).add(word);
            }
        }

        for (Set<String> group : groups.values())
            if (group.size() >= minGroupSize)
                System.out.println(group.size() + ": " + group);
    }

    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

And here it is using Streams,

// Overuse of streams - don't do this! (page 205)
public class StreamAnagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);

        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(
                    groupingBy(word -> word.chars().sorted()
                            .collect(StringBuilder::new,
                                    (sb, c) -> sb.append((char) c),
                                    StringBuilder::append).toString()))
                    .values().stream()
                    .filter(group -> group.size() >= minGroupSize)
                    .map(group -> group.size() + ": " + group)
                    .forEach(System.out::println);
        }
    }
}

He argues for a balanced, third approach that uses both,

// Tasteful use of streams enhances clarity and conciseness (Page 205)
public class HybridAnagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);

        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(groupingBy(word -> alphabetize(word)))
                    .values().stream()
                    .filter(group -> group.size() >= minGroupSize)
                    .forEach(g -> System.out.println(g.size() + ": " + g));
        }
    }

    private static String alphabetize(String s) {
        char[] a = s.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

In Java, What Are the Advantages of Streams Over Loops