235

I'm reading up about Java streams and discovering new things as I go along. One of the new things I found was the peek() function. Almost everything I've read on peek says it should be used to debug your Streams.

What if I had a Stream where each Account has a username, password field and a login() and loggedIn() method.

I also have

Consumer<Account> login = account -> account.login();

and

Predicate<Account> loggedIn = account -> account.loggedIn();

Why would this be so bad?

List<Account> accounts; //assume it's been setup
List<Account> loggedInAccount = 
accounts.stream()
    .peek(login)
    .filter(loggedIn)
    .collect(Collectors.toList());

Now as far as I can tell this does exactly what it's intended to do. It;

  • Takes a list of accounts
  • Tries to log in to each account
  • Filters out any account which aren't logged in
  • Collects the logged in accounts into a new list

What is the downside of doing something like this? Any reason I shouldn't proceed? Lastly, if not this solution then what?

The original version of this used the .filter() method as follows;

.filter(account -> {
        account.login();
        return account.loggedIn();
    })
11
  • 72
    Any time I find myself needing a multi-line lambda, I move the lines to a private method and pass the method reference instead of the lambda. Commented Nov 10, 2015 at 17:18
  • 13
    Also note that your code could just be .peek(Account::login) and .filter(Account::loggedIn); there's no reason to write a Consumer and Predicate that just calls another method like that. Commented Nov 10, 2015 at 21:32
  • 2
    Also note that the stream API explicitly discourages side-effects in behavioural parameters. Commented Nov 10, 2015 at 22:27
  • 8
    Useful consumers always have side-effects, those are not discouraged of course. This is actually mentioned in the same section: “A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care.”. My remark was more to remind that the peek operation (which is designed for debugging purposes) should not be replaced by doing the same thing inside another operation like map() or filter(). Commented Nov 10, 2015 at 23:53
  • 2
    A kitchen knife can kill a person, but you don't go to a war with that! :). Java 8 streams are cool. But, try not to achieve everything using streams when regular loops can do better. A forEach would be more intuitive here. We are so obsessed with java 8 streams and try to solve everything using them. I observe lot of people(including me) try to achieve something like the above using peek() thought the API clearly mentions not to use it. Commented Nov 20, 2020 at 16:52

11 Answers 11

178

The important thing you have to understand is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed or any at all. So collect is an operation that processes each item, whereas findAny may stop processing items once it encountered a matching element.

And count() may not process any elements at all when it can determine the size of the stream without processing the items. Since this is an optimization not made in Java 8, but which will be in Java 9, there might be surprises when you switch to Java 9 and have code relying on count() processing all items. This is also connected to other implementation-dependent details, e.g. even in Java 9, the reference implementation will not be able to predict the size of an infinite stream source combined with limit while there is no fundamental limitation preventing such prediction.

Since peek allows “performing the provided action on each element as elements are consumed from the resulting stream”, it does not mandate processing of elements but will perform the action depending on what the terminal operation needs. This implies that you have to use it with great care if you need a particular processing, e.g. want to apply an action on all elements. It works if the terminal operation is guaranteed to process all items, but even then, you must be sure that not the next developer changes the terminal operation (or you forget that subtle aspect).

Further, while streams guarantee to maintain the encounter order for a certain combination of operations even for parallel streams, these guarantees do not apply to peek. When collecting into a list, the resulting list will have the right order for ordered parallel streams, but the peek action may get invoked in an arbitrary order and concurrently.

So the most useful thing you can do with peek is to find out whether a stream element has been processed which is exactly what the API documentation says:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

Sign up to request clarification or add additional context in comments.

4 Comments

will there be any problem, future or present, in OP's use case? Does his code always do what he wants?
@bayou.io: as far as I can see, there is no problem in this exact form. But as I tried to explain, using it this way implies you have to remember this aspect, even if you come back to the code one or two years later to incorporate «feature request 9876» into the code…
"the peek action may get invoked in an arbitrary order and concurrently". Doesn't this statement go against their rule for how peek works, e.g. "as elements are consumed"?
@Jose Martinez: It says “as elements are consumed from the resulting stream”, which isn’t the terminal action but the processing, though even the terminal action could consume elements out of order as long as the final result is consistent. But I also think, the phrase of the API note, “see the elements as they flow past a certain point in a pipeline” does a better job at describing it.
121

The key takeaway from this:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.


There is no harm in breaking this out to multiple operations, as they are distinct operations. There is harm in using the API in an unclear and unintended way, which may have ramifications if this particular behavior is modified in future versions of Java.

Using forEach on this operation would make it clear to the maintainer that there is an intended side effect on each element of accounts, and that you are performing some operation that can mutate it.

It's also more conventional in the sense that peek is an intermediate operation which doesn't operate on the entire collection until the terminal operation runs, but forEach is indeed a terminal operation. This way, you can make strong arguments around the behavior and the flow of your code as opposed to asking questions about if peek would behave the same as forEach does in this context.

accounts.forEach(a -> a.login());
List<Account> loggedInAccounts = accounts.stream()
                                         .filter(Account::loggedIn)
                                         .collect(Collectors.toList());

16 Comments

If you perform the login in a preprocessing step, you don’t need a stream at all. You can perform forEach right at the source collection: accounts.forEach(a -> a.login());
@Holger: Excellent point. I've incorporated that into the answer.
@Adam.J: Right, my answer focused more on the general question contained in your title, i.e. is this method really only for debugging, by explaining the aspects of that method. This answer is more fused on your actual use case and how to do it instead. So you could say, together they provide the full picture. First, the reason why this is not the intended use, second the conclusion, not to stick to an unintended use and what to do instead. The latter will have more practical use for you.
Of course, it was much easier if the login() method returned a boolean value indicating the success status…
That’s what I was aiming at. If login() returns a boolean, you can use it as a predicate which is the cleanest solution. It still has a side effect, but that’s ok as long as it is non-interfering, i.e. the login process` of one Account has no influence on the login` process` of another Account.
|
35

Perhaps a rule of thumb should be that if you do use peek outside the "debug" scenario, you should only do so if you're sure of what the terminating and intermediate filtering conditions are. For example:

return list.stream().map(foo->foo.getBar())
                    .peek(bar->bar.publish("HELLO"))
                    .collect(Collectors.toList());

seems to be a valid case where you want, in one operation to transform all Foos to Bars and tell them all hello.

Seems more efficient and elegant than something like:

List<Bar> bars = list.stream().map(foo->foo.getBar()).collect(Collectors.toList());
bars.forEach(bar->bar.publish("HELLO"));
return bars;

and you don't end up iterating a collection twice.

2 Comments

Iterating twice is O(2n) =~ O(n). The chances of you having a performance problem because of this are minimal. You do however score on clarity if you don't use peek.
In fact, it’s possible that the two iterations perform better than the single stream operation with two different purposes. Predicting performance in an environment with a runtime optimizer is very hard.
12

A lot of answers made good points, and especially the (accepted) answer by Makoto describes the possible problems in quite some detail. But no one actually showed how it can go wrong:

[1]-> IntStream.range(1, 10).peek(System.out::println).count();
|  $6 ==> 9

No output.

[2]-> IntStream.range(1, 10).filter(i -> i%2==0).peek(System.out::println).count();
|  $9 ==> 4

Outputs numbers 2, 4, 6, 8.

[3]-> IntStream.range(1, 10).filter(i -> i > 0).peek(System.out::println).count();
|  $12 ==> 9

Outputs numbers 1 to 9.

[4]-> IntStream.range(1, 10).map(i -> i * 2).peek(System.out::println).count();
|  $16 ==> 9

No output.

[5]-> Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9).peek(System.out::println).count();
|  $23 ==> 9

No output.

[6]-> Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).stream().peek(System.out::println).count();
|  $25 ==> 9

No output.

[7]-> IntStream.range(1, 10).filter(i -> true).peek(System.out::println).count();
|  $30 ==> 9

Outputs numbers 1 to 9.

[1]-> List<Integer> list = new ArrayList<>();
|  list ==> []
[2]-> Stream.of(1, 5, 2, 7, 3, 9, 8, 4, 6).sorted().peek(list::add).count();
|  $7 ==> 9
[3]-> list
|  list ==> []

(You get the idea.)

The examples were run in jshell (Java 15.0.2) and mimic the use case of converting data (replace System.out::println by list::add for example as also done in some answers) and returning how much data was added. The current observation is that any operation that could filter elements (such as filter or skip) seems to force handling of all remaining elements, but it need not stay that way.

5 Comments

I'm not sure your results are reliable. As the .count terminal operation also produces output, JShell might be overwriting the output of the .peek operation with it. If you replace .count with another terminal operation that doesn't produce output, it works nicely, e.g. jshell> IntStream.range(1,10).peek(System.out::println).forEach(i->{}).
Having count as the terminal operation here is exactly the problem I want to show. count is not interested in your actual elements which is why they are sometimes not processed and the count just calculated.
Ah, ok, now I see.
Just for anyone wondering how does the count() method works without actually calculating number of elements in the stream, I strongly believe it is because of flag StreamOpFlag.SIZED being set by IntStream and Stream.of. What is worse, Stream.of behaviour differs between JVM versions: in 1.8 it used to be a normal stream, but in some latter version it became SIZED, iirc.
@Xobotun as said in other (much older) answers, count() was changed in Java 9, not Stream.of()’s behavior
11

Despite the documentation note for .peek saying the "method exists mainly to support debugging" I think it has general relevance. For one thing the documentation says "mainly", so leaves room for other use cases. It is not deprecated since years, and speculations about its removal are futile IMO.

I would say in a world where we still have to handle side-effectful methods it has a valid place and utility. There are many valid operations in streams that use side-effects. Many have been mentioned in other answers, I'll just add here to set a flag on a collection of objects, or register them with a registry, on objects which are then further processed in the stream. Not to mention creating log messages during stream processing.

I support the idea to have separate actions in separate stream operations, so I avoid pushing everything into a final .forEach. I favor .peek over an equivalent .map with a lambda who's only purpose, besides calling the side-effect method, is to return the passed in argument. .peek tells me that what goes in also goes out as soon as I encounter this operation, and I don't need to read a lambda to find out. In that sense it is succinct, expressive and improves readability of the code.

Having said that I agree with all the considerations when using .peek, e.g. being aware of the effect of the terminal operation of the stream it is used in.

1 Comment

This is an excellent answer! Tysvm for detailing your thoughts.
9

I would say that peek provides the ability to decentralize code that can mutate stream objects, or modify global state (based on them), instead of stuffing everything into a simple or composed function passed to a terminal method.

Now the question might be: should we mutate stream objects or change global state from within functions in functional style java programming?

If the answer to any of the the above 2 questions is yes (or: in some cases yes) then peek() is definitely not only for debugging purposes, for the same reason that forEach() isn't only for debugging purposes.

For me when choosing between forEach() and peek(), is choosing the following: Do I want pieces of code that mutate stream objects (or change global state) to be attached to a composable, or do I want them to attach directly to stream?

I think peek() will better pair with java9 methods. e.g. takeWhile() may need to decide when to stop iteration based on an already mutated object, so paring it with forEach() would not have the same effect.

P.S. I have not mentioned map() anywhere because in case we want to mutate objects (or global state), rather than generating new objects, it works exactly like peek().

1 Comment

Not true at all. According to its JavaDocs, the intermediate Stream operation java.util.Stream.peek() “exists mainly to support debugging” purposes.
7

Although I agree with most answers above, I have one case in which using peek actually seems like the cleanest way to go.

Similar to your use case, suppose you want to filter only on active accounts and then perform a login on these accounts.

accounts.stream()
    .filter(Account::isActive)
    .peek(login)
    .collect(Collectors.toList());

Peek is helpful to avoid the redundant call while not having to iterate the collection twice:

accounts.stream()
    .filter(Account::isActive)
    .map(account -> {
        account.login();
        return account;
    })
    .collect(Collectors.toList());

3 Comments

All you have to do is to get that login method right. I really don't see how the peek is the cleanest way to go. How should someone who is reading your code know that you actually misusing the API. Good and clean code doesn't force a reader to make assumptions about the code.
I think you need to qualify the method reference in the .peek operation, e.g. as Account::login, for it to work.
I agree that using .peek in favor of the .map alternative is more succinct, expressive and understandable. The lambda in the .map is only necessary to return the passed in object. .peek does this on its own. I know that as soon as I read the operation name, and don't have to inspect the lambda to find it out.
4

The functional solution is to make account object immutable. So account.login() must return a new account object. This will mean that the map operation can be used for login instead of peek.

2 Comments

Presumably, the side-effect of logging in is more than just mutating the object. I don't think you can reasonably refactor 'login' into something pure.
login would typically just return a token - no side effect
2

To get rid of warnings, I use functor tee, named after Unix' tee:

public static <T> Function<T,T> tee(Consumer<T> after) {
    return arg -> {
        f.accept(arg);
        return arg;
    };
}

You can replace:

  .peek(f)

with

  .map(tee(f))

1 Comment

This is less readable than just peek or map with return arg. Before you circumvent warnings like that, supress them instead.
0

It seems like a helper class is needed:

public static class OneBranchOnly<T> {
    public Function<T, T> apply(Predicate<? super T> test,
                                Consumer<? super T> t) {
        return o -> {
            if (test.test(o)) t.accept(o);
            return o;
        };
    }
}

then switch peek with map:

.map(new OneBranchOnly< Account >().apply(                   
                    account -> account.isTestAccount(),
                    account -> account.setName("Test Account"))
)

results: Collections of accounts that only test accounts got renamed (no reference gets maintained)

Comments

0

Why peek is bad fit here:

As you already pointed out, peek can be used for debugging. The action done by peek should be non-interfering and stateless. Logging is fine; mutating state (logging in) is not great style.

According to Oracle's Javadoc, in elaboration about the method peek:

For parallel stream pipelines, the action may be called at whatever time and in whatever thread the element is made available by the upstream operation. If the action modifies shared state, it is responsible for providing the required synchronization.

So peek may bring synchronizing issue. If login() touches shared state or isn’t thread-safe, you can get races.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.