In Java Streams Is Peek Really Only For Debugging

In Java streams is peek really only for debugging?

The key takeaway from this:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.


There is no harm in breaking this out to multiple operations, as they are distinct operations. There is harm in using the API in an unclear and unintended way, which may have ramifications if this particular behavior is modified in future versions of Java.

Using forEach on this operation would make it clear to the maintainer that there is an intended side effect on each element of accounts, and that you are performing some operation that can mutate it.

It's also more conventional: peek is an intermediate operation that does nothing until a terminal operation runs, whereas forEach is itself a terminal operation. This way, you can reason confidently about the behavior and flow of your code instead of wondering whether peek would behave the same as forEach does in this context.

accounts.forEach(a -> a.login());
List<Account> loggedInAccounts = accounts.stream()
        .filter(Account::loggedIn)
        .collect(Collectors.toList());

In Java streams, .peek() is regarded as being for debugging purposes only; would logging be considered debugging?

The documentation of peek describes the intent as

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline.

An expression of the form .peek(classInSchool -> log.debug("Processing classroom {} in sixth grade without classroom.", classInSchool)) fulfills this intent, as it is about reporting the processing of an element. It doesn’t matter whether you use the logging framework or just print statements, as in the documentation’s example, .peek(e -> System.out.println("Filtered value: " + e)). In either case, the intent matters, not the technical approach. If someone used peek with the intent to print all elements, it would be wrong, even if it used the same technical approach as the documentation’s example (System.out.println).

The documentation doesn’t mandate that you have to distinguish between production environment or debugging environment, to remove the peek usage for the former. Actually, your use would even fulfill that, as the logging framework allows you to mute that action via the configurable logging level.
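As a rough illustration of that point, here is a minimal sketch using java.util.logging. The class name PeekLoggingSketch, the method longNames, and the sample names are made up for this example. The peek action only reports elements as they pass, and because FINE is below the default INFO level, it is normally muted without touching the pipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class PeekLoggingSketch {
    private static final Logger LOG = Logger.getLogger(PeekLoggingSketch.class.getName());

    // Keeps the names longer than three characters; peek only reports progress.
    static List<String> longNames(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 3)
                // Logged at FINE, so the message is suppressed under the default INFO level.
                .peek(n -> LOG.fine(() -> "Kept name: " + n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [Karl, Jill]
        System.out.println(longNames(Arrays.asList("Ann", "Karl", "Jill", "Bo")));
    }
}
```

The result of the pipeline is the same whether or not the log message is emitted, which is what makes this use of peek match the documented intent.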

I would still suggest keeping in mind that for some pipelines, inserting a peek operation may add more overhead than the actual operation (or hinder the JVM’s loop optimizations to such a degree). But if you do not experience performance problems, you may follow the old advice not to optimize unless you have a real reason…

Better way than Stream.peek()

You are overusing method references. The simplicity of Test::new is not worth anything, if it complicates the rest of your stream usage.

A clear solution would be:

Stream.of("Karl", "Jill", "Jack")
.map(first -> { Test t = new Test(first); t.setLastName("Doe"); return t; })

or much better

Stream.of("Karl", "Jill", "Jack").map(first -> new Test(first, "Doe")) …

assuming that the class has the not-so-far-fetched constructor accepting both names.
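A self-contained sketch of the constructor-based variant might look like the following. The nested Test class here is only a stand-in for the one in the question (its fields and the fullName accessor are assumptions), and MapInsteadOfPeekSketch and build are names invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class MapInsteadOfPeekSketch {
    // Assumed shape of the question's Test class, with a two-argument constructor.
    static final class Test {
        private final String firstName;
        private final String lastName;

        Test(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String fullName() {
            return firstName + " " + lastName;
        }
    }

    // Builds fully initialized objects inside map, so no peek is needed afterwards.
    static List<Test> build(String... firstNames) {
        return Arrays.stream(firstNames)
                .map(first -> new Test(first, "Doe"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        build("Karl", "Jill", "Jack").forEach(t -> System.out.println(t.fullName()));
    }
}
```

Because each object is complete when it leaves map, the result does not depend on whether or when a later stage runs.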

The code above addresses the use case, where the action manipulates a locally constructed object, so the action is only relevant if the object will be consumed by the subsequent Stream operations. For other cases, where the action has a side effect on objects outside the Stream, abusing map has almost all of the drawbacks of peek explained in “In Java streams, is peek really only for debugging?”

The use of .peek() in java 8 streams

The documentation of Stream#peek says the following; note that "mainly" does not mean "exclusively":

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

@Holger has answered this question definitively:

the most useful thing you can do with peek is to find out whether a stream element has been processed.

He also pointed out some side effects in his answer: the behavior of peek depends on which terminal operation is invoked, so you should be careful when using peek internally.

So the correct way is simply to use a for-each loop, since Stream#collect doesn't support short-circuiting.

The alternative is to use peek, because then you control the stream yourself. You also need to remove the synchronized block; it is unnecessary here.

return IntStream.range(0, size).peek(foo::add).allMatch(__ -> Foo.isLegal(foo));
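To compare the two options side by side, here is a minimal sketch. The question's Foo class is not shown in this answer, so the Foo below is a hypothetical stand-in that counts as legal while it holds at most a given number of values; ShortCircuitFillSketch, fillWithLoop, and fillWithPeek are likewise invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class; not part of the original question.
class ShortCircuitFillSketch {
    // Assumed stand-in for Foo: legal while it holds at most `limit` values.
    static final class Foo {
        private final List<Integer> values = new ArrayList<>();
        private final int limit;

        Foo(int limit) {
            this.limit = limit;
        }

        void add(int value) {
            values.add(value);
        }

        int size() {
            return values.size();
        }

        static boolean isLegal(Foo foo) {
            return foo.values.size() <= foo.limit;
        }
    }

    // Plain loop: stops at the first value that makes foo illegal.
    static boolean fillWithLoop(Foo foo, int size) {
        for (int i = 0; i < size; i++) {
            foo.add(i);
            if (!Foo.isLegal(foo)) {
                return false;
            }
        }
        return true;
    }

    // Stream variant: allMatch short-circuits once the predicate returns false.
    static boolean fillWithPeek(Foo foo, int size) {
        return IntStream.range(0, size).peek(foo::add).allMatch(ignored -> Foo.isLegal(foo));
    }

    public static void main(String[] args) {
        Foo viaLoop = new Foo(3);
        Foo viaPeek = new Foo(3);
        System.out.println(fillWithLoop(viaLoop, 10) + " " + viaLoop.size());
        System.out.println(fillWithPeek(viaPeek, 10) + " " + viaPeek.size());
    }
}
```

On a sequential stream both variants stop after the same element, but only the loop states that intent directly; the stream variant relies on the side effect in peek running before each allMatch check.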

Stream.peek() method in Java 8 vs Java 9

I assume you are running this under Java 9? You are not altering the SIZED property of the stream, so there is no need to execute either map or peek at all.

In other words, all you care about is count as the final result, and in the meantime you do not alter the initial size of the List in any way (via filter or distinct, for example). Skipping the pipeline in this case is an optimization done by the Streams implementation.

Btw, even if you add a dummy filter this will show what you expect:

values.stream()
        .map(n -> n * 2)
        .peek(System.out::print)
        .filter(x -> true)
        .count();
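The difference can be observed with a small sketch like the one below, which records what peek actually saw instead of printing it. CountPeekSketch and its two methods are names made up for this example. What the first method returns depends on the Java version: all doubled values on Java 8, and typically an empty list on Java 9 and later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the original question.
class CountPeekSketch {
    // Records the values peek saw when count() runs on a SIZED pipeline.
    static List<Integer> seenWithoutFilter(List<Integer> values) {
        List<Integer> seen = new ArrayList<>();
        values.stream().map(n -> n * 2).peek(seen::add).count();
        return seen;
    }

    // Same pipeline plus a dummy filter, which drops the SIZED property,
    // so the elements have to be traversed to be counted.
    static List<Integer> seenWithFilter(List<Integer> values) {
        List<Integer> seen = new ArrayList<>();
        values.stream().map(n -> n * 2).peek(seen::add).filter(x -> true).count();
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println("without filter: " + seenWithoutFilter(values));
        System.out.println("with filter:    " + seenWithFilter(values));
    }
}
```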

Does Java Stream 'peek' happen just before or just after the terminal operation?

Think about it. If you have a streaming pipeline like

...peek(xxx).map(...).peek(yyy).collect(...)

the xxx code would see the element before the mapping function is applied, and the yyy code would see the element after the mapping function is applied, and before the element is handed off to the final operation, as the element "flows" down the stream.

A particular element will be processed by the steps of the stream in the order of the stream pipeline, but there is generally no constraint on the order in which different elements are processed, especially if the stream is parallel.

E.g. the stream engine could decide to send all elements to the xxx code, then map all the elements, and then finally send all elements to the yyy code. The stream pipeline would function correctly if it did. It won't, of course, because that defeats the purpose of streaming, but it would be valid to do so.
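The per-element guarantee can be made visible with a short sketch that records events rather than printing them. PeekOrderSketch, trace, and the "xxx:"/"yyy:" labels are invented for this example, and the mapping function (multiply by 10) is an arbitrary choice. For each element, the xxx event precedes the matching yyy event; on a sequential stream current JDKs typically interleave the elements, but the sketch does not rely on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class PeekOrderSketch {
    // Records what each peek sees before and after the mapping step.
    static List<String> trace(List<Integer> input) {
        List<String> events = new ArrayList<>();
        input.stream()
                .peek(n -> events.add("xxx:" + n))   // sees the element before map
                .map(n -> n * 10)
                .peek(n -> events.add("yyy:" + n))   // sees the mapped element
                .collect(Collectors.toList());
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace(Arrays.asList(1, 2)));
    }
}
```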

Peek() really to see the elements as they flow past a certain point in a pipeline

No. Streams may be evaluated lazily as needed, and the order of operations is not strongly defined, especially when you're peek()ing. This allows the streams API to support very large streams without significant waste of time and memory, as well as allowing certain implementation simplifications. In particular, a single stage of the pipeline need not be fully evaluated before the next stage is.

Consider how wasteful the following code would be, given your assumptions:

IntStream.range(1, 1000000).skip(5).limit(10).forEach(System.out::println);

The stream starts with roughly one million elements (999,999, since the upper bound is exclusive) and ends up with 10. If we evaluated each stage fully, the intermediate results would hold 999,999, 999,994, and 10 elements, respectively.

As a second example, the following stream cannot be evaluated a stage at a time (because IntStream.generate returns an infinite stream):

IntStream.generate(/* some supplier */).limit(10).boxed().collect(Collectors.toList());

Your pipeline does indeed pass every single element through the first peek, and then only a subset through the second peek. However, the pipeline performs this evaluation in element-major rather than stage-major order: it evaluates the pipe for 1, dropping it at the filter, then does the same for 2. When it evaluates the pipe for 3, the element passes the filter, so both peek actions execute, and the same then occurs for 4 and 5.
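The element-major order described here can be reproduced with a sketch along these lines. The question's pipeline is not shown in this answer, so the source (1 through 5) and the filter condition (n >= 3) are assumptions inferred from the description, and ElementMajorSketch and trace are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class; not part of the original question.
class ElementMajorSketch {
    // Records the order in which the two peek stages see the elements.
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        IntStream.rangeClosed(1, 5)
                .peek(n -> events.add("first:" + n))
                .filter(n -> n >= 3)                 // assumed filter condition
                .peek(n -> events.add("second:" + n))
                .forEach(n -> { });                  // terminal operation drives the pipeline
        return events;
    }

    public static void main(String[] args) {
        // prints [first:1, first:2, first:3, second:3, first:4, second:4, first:5, second:5]
        System.out.println(trace());
    }
}
```

A stage-major evaluation would instead list all five first: events before any second: event.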

Possible side effect of Stream.peek changing state and why not use it like this

What you seem to do looks harmless, as Brian Goetz states in a comment here.

Now the problem with peek is that if you perform side effects inside it, you would expect these side effects to actually happen. So, suppose you want to alter some property of some object like this:

myUserList.stream()
        .peek(u -> u.upperCaseName())
        .count();

In Java 8 your peek would indeed be called; in Java 9 it is not, because the size can be computed without traversing the pipeline, so there is no need to call peek at all.

While being on the same path, imagine that your terminal operation is a short-circuit one, like findFirst or findAny - you are not going to process all elements of the source - you might get just a few of them through the pipeline.
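A minimal sketch of the short-circuit case follows. The User class is a hypothetical stand-in for the elements of myUserList (only upperCaseName appears in the original snippet), and ShortCircuitPeekSketch and namesAfterFindFirst are invented names. On a sequential stream, findFirst stops after the first element, so only that user is modified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class ShortCircuitPeekSketch {
    // Assumed stand-in for the user type in the snippet above.
    static final class User {
        private String name;

        User(String name) {
            this.name = name;
        }

        void upperCaseName() {
            name = name.toUpperCase(Locale.ROOT);
        }

        String name() {
            return name;
        }
    }

    // Runs the side effect through peek with a short-circuiting terminal
    // operation, then reports the names as they are afterwards.
    static List<String> namesAfterFindFirst(List<User> users) {
        users.stream().peek(User::upperCaseName).findFirst();
        return users.stream().map(User::name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("ann"), new User("bob"), new User("cy"));
        // prints [ANN, bob, cy]
        System.out.println(namesAfterFindFirst(users));
    }
}
```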

Things might get even stranger if your peek relies on encounter order, even when the terminal operation is not a short-circuiting one. In parallel processing, intermediate operations such as peek are not guaranteed to run in encounter order, while some terminal operations (forEachOrdered, for example) do respect it. Imagine the surprises you might be facing.


