What Is the Difference Between Collection.stream().forEach() and Collection.forEach()

What is the difference between Collection.stream().forEach() and Collection.forEach()?

For simple cases such as the one illustrated, they are mostly the same. However, there are a number of subtle differences that might be significant.

One issue is with ordering. With Stream.forEach, the order is undefined. A sequential stream is unlikely to process elements out of order in practice, but the specification still permits Stream.forEach to execute in some arbitrary order. With parallel streams, this happens frequently. By contrast, Iterable.forEach is always executed in the iteration order of the Iterable, if one is specified.

Another issue is with side effects. The action specified in Stream.forEach is required to be non-interfering. (See the java.util.stream package doc.) Iterable.forEach potentially has fewer restrictions. For the collections in java.util, Iterable.forEach will generally use that collection's Iterator, most of which are designed to be fail-fast and which will throw ConcurrentModificationException if the collection is structurally modified during the iteration. However, modifications that aren't structural are allowed during iteration. For example, the ArrayList class documentation says "merely setting the value of an element is not a structural modification." Thus, the action for ArrayList.forEach is allowed to set values in the underlying ArrayList without problems.
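To make the structural vs. non-structural distinction concrete, here is a minimal sketch. The class and method names are made up for illustration, and the behavior described is what OpenJDK's ArrayList does; fail-fast detection is documented as best-effort, so treat the exception as a debugging aid rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not part of any library.
class NonStructuralModificationDemo {

    // Overwrites each element from inside forEach. set() does not change the
    // list's size, so it is not a structural modification.
    static List<Integer> doubleInPlace(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        int[] index = {0}; // one-element array: a mutable counter the lambda may capture
        list.forEach(value -> list.set(index[0]++, value * 2));
        return list;
    }

    // Appends from inside forEach. add() is a structural modification, so the
    // fail-fast check in ArrayList.forEach is expected to reject it.
    static boolean structuralChangeFails(List<Integer> source) {
        List<Integer> list = new ArrayList<>(source);
        try {
            list.forEach(value -> list.add(value));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(doubleInPlace(input));
        System.out.println(structuralChangeFails(input));
    }
}
```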

The concurrent collections are yet again different. Instead of fail-fast, they are designed to be weakly consistent. The full definition is in the java.util.concurrent package documentation. Briefly, though, consider ConcurrentLinkedDeque. The action passed to its forEach method is allowed to modify the underlying deque, even structurally, and ConcurrentModificationException is never thrown. However, the modification that occurs might or might not be visible in this iteration. (Hence the "weak" consistency.)
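As a rough illustration, the sketch below removes elements from a ConcurrentLinkedDeque inside its own forEach. The class and method names are invented for the example. Only the final contents are checked; whether a newly added element would be visited during the same pass is deliberately left unasserted, since that is exactly what weak consistency leaves open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical demo class, not part of any library.
class WeaklyConsistentDemo {

    // Removes every even number while iterating. No exception is expected,
    // because the deque's traversal is weakly consistent, not fail-fast.
    static List<Integer> removeEvensWhileIterating(List<Integer> source) {
        ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<>(source);
        deque.forEach(value -> {
            if (value % 2 == 0) {
                deque.remove(value); // structural modification during iteration
            }
        });
        return new ArrayList<>(deque);
    }

    public static void main(String[] args) {
        System.out.println(removeEvensWhileIterating(Arrays.asList(1, 2, 3, 4)));
    }
}
```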

Still another difference is visible if Iterable.forEach is iterating over a synchronized collection. On such a collection, Iterable.forEach takes the collection's lock once and holds it across all the calls to the action method. The Stream.forEach call uses the collection's spliterator, which does not lock, and which relies on the prevailing rule of non-interference. The collection backing the stream could be modified during iteration, and if it is, a ConcurrentModificationException or inconsistent behavior could result.
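One way to observe the locking difference is to ask, from inside the action, whether the current thread holds the wrapper's monitor. This sketch assumes the list comes from the single-argument Collections.synchronizedList, where (in OpenJDK) the wrapper itself serves as the lock; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of any library.
class SynchronizedForEachDemo {

    private static List<Integer> newSyncList() {
        return Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
    }

    // True if the wrapper's lock was held for every call to the action.
    static boolean lockHeldDuringForEach() {
        List<Integer> list = newSyncList();
        boolean[] heldEveryTime = {true};
        list.forEach(value -> {
            heldEveryTime[0] &= Thread.holdsLock(list);
        });
        return heldEveryTime[0];
    }

    // True if the wrapper's lock was held for any call to the action.
    static boolean lockHeldDuringStreamForEach() {
        List<Integer> list = newSyncList();
        boolean[] heldAtLeastOnce = {false};
        list.stream().forEach(value -> {
            heldAtLeastOnce[0] |= Thread.holdsLock(list);
        });
        return heldAtLeastOnce[0];
    }

    public static void main(String[] args) {
        System.out.println("forEach holds lock: " + lockHeldDuringForEach());
        System.out.println("stream().forEach holds lock: " + lockHeldDuringStreamForEach());
    }
}
```

If a stream over a synchronized wrapper is needed anyway, the Collections documentation asks the caller to synchronize on the wrapper manually around the whole traversal.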

What is the difference between Java8 container `for each` and Stream `for each`

The forEach(Consumer) method is declared in the Iterable interface which Collection, and therefore List, extends. The default implementation of forEach(Consumer) is:

default void forEach(Consumer<? super T> action) {
    Objects.requireNonNull(action);
    for (T t : this) {
        action.accept(t);
    }
}

As you can see the default implementation simply calls the action in a for-each loop. And a for-each loop is simply syntactic sugar for:

for (Iterator<?> iterator = iterable.iterator(); iterator.hasNext(); ) {
    Object element = iterator.next();
    // Do what you need to with element
}

Except you don't have access to the Iterator in a for-each loop.

Specific implementations of Iterable may change how it actually iterates its elements (it may or may not use an Iterator) but it will almost always come down to some for or while loop. I say "almost always" because it's possible some type of recursion or chaining may be involved.

Now, using List.stream().forEach(Consumer) creates an unnecessary Stream object when you are simply trying to iterate the List sequentially. You should only use the streaming API if you actually need to process a collection of elements in a pipeline fashion (such as mapping, filtering, mapping some more, etc...).

So, for simply iterating, using List.stream().forEach(Consumer) is going to be less performant than a simple List.forEach(Consumer) call in virtually all cases. The performance increase will most likely be negligible but it is an easy enough fix that the "optimization" is not excessive; especially if you don't make the "mistake" in the first place. Don't create objects if you don't need them.

It may be better to simply use a for-each loop instead of forEach(Consumer) though. It can be easier to read than the more functional counterpart.


Edit

As mentioned in the comments by Holger, Stream.forEach(Consumer) has a pretty major difference to Iterable.forEach(Consumer): It does not guarantee the encounter order of the elements. While the iteration order of Iterable.forEach(Consumer) is not defined for the Iterable interface either, it can be defined by extending interfaces (such as List). When using a Stream, however, the order is not guaranteed regardless of the source of the Stream.

If you want the order to be guaranteed when using a Stream you have to use Stream.forEachOrdered(Consumer).
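A small sketch of the difference on a parallel stream follows. The helper names are made up for illustration. The results go into a synchronized list so that the unordered variant is at least thread-safe; only the ordered variant is expected to reproduce the source order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of any library.
class ForEachOrderedDemo {

    // Builds the list 1..n as test input.
    static List<Integer> numbers(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    // Copies the source through a parallel stream, using either terminal operation.
    static List<Integer> copyInParallel(List<Integer> source, boolean ordered) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        if (ordered) {
            source.parallelStream().forEachOrdered(out::add); // encounter order respected
        } else {
            source.parallelStream().forEach(out::add); // any order is permitted
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> source = numbers(20);
        System.out.println("forEachOrdered: " + copyInParallel(source, true));
        System.out.println("forEach:        " + copyInParallel(source, false));
    }
}
```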

Difference between iterable.forEach() and iterable.stream().forEach()

There are a few differences:

Iterable.forEach guarantees processing in iteration order, if it's defined for the Iterable. (Iteration order is generally well-defined for Lists.) Stream.forEach does not; one must use Stream.forEachOrdered instead.

Iterable.forEach may permit side effects on the underlying data structure. Although many collections' iterators will throw ConcurrentModificationException if the collection is modified during iteration, some collections' iterators explicitly permit it. See CopyOnWriteArrayList, for example. By contrast, stream operations in general must not interfere with the stream source.
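For the CopyOnWriteArrayList case, a minimal sketch might look like this (the class and method names are invented for the example). The traversal works on a snapshot of the backing array, so elements appended by the action end up in the list but are not visited by the pass that added them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, not part of any library.
class CopyOnWriteForEachDemo {

    // Returns two lists: the elements the action saw, then the final contents.
    static List<List<String>> appendDuringForEach(List<String> source) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>(source);
        List<String> visited = new ArrayList<>();
        list.forEach(s -> {
            visited.add(s);
            list.add(s + "!"); // modifies the list being iterated; no exception expected
        });
        List<List<String>> result = new ArrayList<>();
        result.add(visited);
        result.add(new ArrayList<>(list));
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> result = appendDuringForEach(Arrays.asList("a", "b"));
        System.out.println("visited:  " + result.get(0));
        System.out.println("contents: " + result.get(1));
    }
}
```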

If the Iterable is a synchronized wrapper collection, for example, from Collections.synchronizedList(), a call to forEach on it will hold its lock during the entire iteration. This will prevent other threads from modifying the collection during the iteration, ensuring that the iteration sees a consistent view of the collection, and preventing ConcurrentModificationException. (This will also prevent other threads from reading the collection during the iteration.) This is not the case for streams. There is nothing to prevent the collection from being modified during the stream operation, and if modification does occur, the result is undefined.

Pros and Cons of usage forEach and Stream

There is no advantage to calling stream().forEach() instead of forEach() directly, unless you have a parallel stream. There is a disadvantage, namely that Stream.forEach() doesn't guarantee to respect encounter order. A more accurate (but still unnecessary) equivalent would be Stream.forEachOrdered().

Java 8 Collection and stream/forEach

foo.forEach(); // Goes through every item in foo
foo.stream().forEach(); // Does stream make a difference here

It is useless unless you need stream operations like map or filter.

foo.parallelStream().forEach();

This runs the action on several threads: the calling thread plus the workers of the common ForkJoinPool, whose default parallelism is one less than the number of available processors. Think twice before using this feature; in most cases it only pays off for long-running operations.
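To see which threads actually run the action on a given machine, one option is to record thread names from inside it. This is only a sketch with made-up names; the set it prints varies by machine and by run, so nothing about its contents should be relied on.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of any library.
class ParallelThreadsDemo {

    // Runs a trivial action over 0..n-1 in parallel and collects the names
    // of the threads that executed it.
    static Set<String> threadsUsed(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, n).parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("threads used:            " + threadsUsed(10_000));
    }
}
```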

Bottom line: Streams really shine when they can be used without side-effects, like mapping collection of type A to type B, without altering A. Loops most likely will alter data outside the stream.

forEach vs forEachOrdered in Java 8 Stream

Stream.of("AAA","BBB","CCC").parallel().forEach(s->System.out.println("Output:"+s));
Stream.of("AAA","BBB","CCC").parallel().forEachOrdered(s->System.out.println("Output:"+s));

The second line will always output

Output:AAA
Output:BBB
Output:CCC

whereas the output of the first line is not guaranteed, since the order is not kept. forEachOrdered processes the elements of the stream in the order specified by its source, regardless of whether the stream is sequential or parallel.

Quoting from forEach Javadoc:

The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism.

By contrast, the forEachOrdered Javadoc states:

Performs an action for each element of this stream, in the encounter order of the stream if the stream has a defined encounter order.

Java 8 Iterable.forEach() vs foreach loop

The better practice is to use for-each. Besides violating the Keep It Simple, Stupid principle, the new-fangled forEach() has at least the following deficiencies:

  • Can't modify local variables from the enclosing scope. A lambda can only capture local variables that are final or effectively final, so code like the following can't be turned into a forEach lambda:
Object prev = null;
for (Object curr : list)
{
    if (prev != null)
        foo(prev, curr);
    prev = curr;
}
  • Can't handle checked exceptions. Lambdas aren't actually forbidden from throwing checked exceptions, but common functional interfaces like Consumer don't declare any. Therefore, any code that throws checked exceptions must wrap them in try-catch or Guava's Throwables.propagate(). But even if you do that, it's not always clear what happens to the thrown exception. It could get swallowed somewhere in the guts of forEach().

  • Limited flow-control. A return in a lambda equals a continue in a for-each, but there is no equivalent to a break. It's also difficult to do things like return values, short circuit, or set flags (which would have alleviated things a bit, if it wasn't a violation of the no non-final variables rule). "This is not just an optimization, but critical when you consider that some sequences (like reading the lines in a file) may have side-effects, or you may have an infinite sequence."

  • Might execute in parallel, which is a horrible, horrible thing for all but the 0.1% of your code that needs to be optimized. Any parallel code has to be thought through (even if it doesn't use locks, volatiles, and other particularly nasty aspects of traditional multi-threaded execution). Any bug will be tough to find.

  • Might hurt performance, because the JIT can't optimize forEach()+lambda to the same extent as plain loops, especially now that lambdas are new. By "optimization" I do not mean the overhead of calling lambdas (which is small), but the sophisticated analysis and transformation that the modern JIT compiler performs on running code.

  • If you do need parallelism, it is probably much faster and not much more difficult to use an ExecutorService. Streams are both automagical (read: don't know much about your problem) and use a specialized (read: inefficient for the general case) parallelization strategy (fork-join recursive decomposition).

  • Makes debugging more confusing, because of the nested call hierarchy and, god forbid, parallel execution. The debugger may have issues displaying variables from the surrounding code, and things like step-through may not work as expected.

  • Streams in general are more difficult to code, read, and debug. Actually, this is true of complex "fluent" APIs in general. The combination of complex single statements, heavy use of generics, and lack of intermediate variables conspire to produce confusing error messages and frustrate debugging. Instead of "this method doesn't have an overload for type X" you get an error message closer to "somewhere you messed up the types, but we don't know where or how." Similarly, you can't step through and examine things in a debugger as easily as when the code is broken into multiple statements, and intermediate values are saved to variables. Finally, reading the code and understanding the types and behavior at each stage of execution may be non-trivial.

  • Sticks out like a sore thumb. The Java language already has the for-each statement. Why replace it with a function call? Why encourage hiding side-effects somewhere in expressions? Why encourage unwieldy one-liners? Mixing regular for-each and new forEach willy-nilly is bad style. Code should speak in idioms (patterns that are quick to comprehend due to their repetition), and the fewer idioms are used the clearer the code is and less time is spent deciding which idiom to use (a big time-drain for perfectionists like myself!).
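Two of the points above, flow control and checked exceptions, can be sketched in a few lines. All class and method names here are hypothetical, and wrapping in UncheckedIOException is just one possible convention, not the only way to deal with a checked exception inside a lambda.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any library.
class ForEachLimitsDemo {

    // Inside a lambda, return only ends the current call to the action,
    // so it behaves like continue: later elements are still processed.
    static List<Integer> skipNegatives(List<Integer> input) {
        List<Integer> kept = new ArrayList<>();
        input.forEach(n -> {
            if (n < 0) {
                return;
            }
            kept.add(n);
        });
        return kept;
    }

    // A plain for-each loop can stop early with break.
    static List<Integer> untilFirstNegative(List<Integer> input) {
        List<Integer> kept = new ArrayList<>();
        for (int n : input) {
            if (n < 0) {
                break;
            }
            kept.add(n);
        }
        return kept;
    }

    // Appendable.append declares IOException, a checked exception.
    static void write(Appendable out, String s) throws IOException {
        out.append(s);
    }

    // Consumer.accept declares no checked exceptions, so the lambda has to
    // catch IOException and rethrow it as an unchecked exception.
    static String joinAll(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        parts.forEach(p -> {
            try {
                write(sb, p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, -2, 3);
        System.out.println(skipNegatives(numbers));
        System.out.println(untilFirstNegative(numbers));
        System.out.println(joinAll(Arrays.asList("a", "b", "c")));
    }
}
```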

As you can see, I'm not a big fan of the forEach() except in cases when it makes sense.

Particularly offensive to me is the fact that Stream does not implement Iterable (despite actually having method iterator) and cannot be used in a for-each, only with a forEach(). I recommend casting Streams into Iterables with (Iterable<T>)stream::iterator. A better alternative is to use StreamEx which fixes a number of Stream API problems, including implementing Iterable.
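The cast mentioned above might be used like this. The class and method names are invented for the example. Note that the resulting Iterable can be traversed only once, because a stream cannot be consumed twice.

```java
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
class StreamAsIterableDemo {

    // Concatenates elements until the first empty string, using an ordinary
    // for-each loop (and break) over a Stream.
    static String joinUntilEmpty(Stream<String> stream) {
        StringBuilder sb = new StringBuilder();
        // Iterable has a single abstract method, iterator(), so the method
        // reference stream::iterator can be cast to it.
        for (String s : (Iterable<String>) stream::iterator) {
            if (s.isEmpty()) {
                break;
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinUntilEmpty(Stream.of("a", "b", "", "c")));
    }
}
```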

That said, forEach() is useful for the following:

  • Atomically iterating over a synchronized list. Prior to this, a list generated with Collections.synchronizedList() was atomic with respect to things like get or set, but was not thread-safe when iterating.

  • Parallel execution (using an appropriate parallel stream). This saves you a few lines of code vs using an ExecutorService, if your problem matches the performance assumptions built into Streams and Spliterators.

  • Specific containers which, like the synchronized list, benefit from being in control of iteration (although this is largely theoretical unless people can bring up more examples)

  • Calling a single function more cleanly by using forEach() and a method reference argument (i.e., list.forEach(obj::someMethod)). However, keep in mind the points on checked exceptions, more difficult debugging, and reducing the number of idioms you use when writing code.
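As a small illustration of the method-reference case, here is a sketch in which the bound receiver is a StringBuilder. The class and method names are made up; any object with a suitable one-argument method would work the same way.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any library.
class MethodReferenceForEachDemo {

    // Passes each element to sb.append via a bound method reference.
    // The value returned by append is simply discarded.
    static String concat(List<String> words) {
        StringBuilder sb = new StringBuilder();
        words.forEach(sb::append);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(Arrays.asList("for", "Each")));
    }
}
```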

Articles I used for reference:

  • Everything about Java 8
  • Iteration Inside and Out (as pointed out by another poster)

EDIT: Looks like some of the original proposals for lambdas (such as http://www.javac.info/closures-v06a.html) solved some of the issues I mentioned (while adding their own complications, of course).

Java 8. Difference between collection.stream() and Stream.of(collection)

Stream does not have a Stream.of(Collection) method. It does have a method

static <T> Stream<T> of(T t)

If you pass a Collection to this method you'll get a Stream<Collection> containing one element (the Collection), not a stream of the collection's elements.

As an example, try this:

List<Integer> l1 = Arrays.asList(1, 2, 3);
List<Integer> l2 = Arrays.asList(4, 5, 6);
Stream.of(l1, l2).flatMap((x)->x.stream()).forEach((x)->System.out.println(x));
Stream.of(l1, l2).flatMap((x)->Stream.of(x)).forEach((x)->System.out.println(x));

The first version prints:

1
2
3
4
5
6

The second version prints:

[1, 2, 3]
[4, 5, 6]

Note that if arr is an Object[] you can do Stream.of(arr) to get a stream of the array's elements. This is because there is another overload of Stream.of that takes varargs:

static <T> Stream<T> of(T... values)
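Counting the elements is a quick way to check which overload was picked. The sketch below uses invented class and method names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not part of any library.
class StreamOfDemo {

    // Stream.of(list) binds to of(T t): a stream holding the list itself.
    static long countViaStreamOf(List<Integer> list) {
        return Stream.of(list).count();
    }

    // list.stream() streams the list's elements.
    static long countViaCollectionStream(List<Integer> list) {
        return list.stream().count();
    }

    // Stream.of(array) binds to of(T... values): a stream of the array's elements.
    static long countViaStreamOfArray(Integer[] array) {
        return Stream.of(array).count();
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);
        Integer[] array = {1, 2, 3};
        System.out.println(countViaStreamOf(list));
        System.out.println(countViaCollectionStream(list));
        System.out.println(countViaStreamOfArray(array));
    }
}
```

This applies to arrays of reference types only. A primitive array such as int[] is treated as a single element by Stream.of; Arrays.stream or IntStream.of is the usual choice there.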

