
I was curious about the performance of creating Java 8 lambda instances compared to the equivalent anonymous classes (measured on a win32 Java build 1.8.0-ea-b106). I created a very simple example and measured whether Java applies any optimization to the new operator when creating a lambda expression:

static final int MEASURES = 1000000;
static interface ICallback{
    void payload(int[] a);
}
/**
* force creation of anonymous class many times
*/
static void measureAnonymousClass(){
    final int arr[] = {0};
    for(int i = 0; i < MEASURES; ++i){
        ICallback clb = new ICallback() {
            @Override
            public void payload(int[] a) {
                a[0]++;
            }
        };
        clb.payload(arr);
    }
}
/**
* force creation of lambda many times 
*/
static void measureLambda(){ 
    final int arr[] = {0};
    for(int i = 0; i < MEASURES; ++i){
        ICallback clb = (a2) -> {
            a2[0]++;
        };
        clb.payload(arr);
    }
}

(Full code can be found here: http://codepad.org/Iw0mkXhD) The result is rather predictable - the lambda wins by a factor of about 2.

But a really small change, turning the lambda into a closure by capturing a local variable, shows very bad times for the lambda: the anonymous class wins by a factor of 10! The anonymous class now looks like this:

ICallback clb = new ICallback() {
        @Override
        public void payload() {
            arr[0]++;
        }
    };

And the lambda looks as follows:

ICallback clb = () -> {
            arr[0]++;
        };

(Full code can be found here: http://codepad.org/XYd9Umty) Can anybody explain why there is such a big (bad) difference in the handling of closures?

  • That's a quite naïve approach to microbenchmarking. At the very least, use System.nanoTime and introduce throwaway executions to warm up the JVM. Several System.gc() calls between executions would also be a good idea. Ideally, do this with Google Caliper or Oracle jmh. Commented Sep 25, 2013 at 9:39
  • @MarkoTopolnik - actually, I foresaw this note; that is why I performed two measurements, one where measureLambda runs first and one where it runs after measureAnonymousClass - with no impact at all! And nanoTime can show a difference in precise measurements, but not when I'm talking about a factor of 10. Commented Sep 25, 2013 at 9:41
  • The accuracy of currentTimeMillis is often at the level of a tenth of a second (platform-dependent). The accuracy of nanoTime is typically at the level of a microsecond. Also, just reordering executions doesn't prove anything: each code path must be warmed up on its own. Warm-up executions are the way to do it, and garbage collection must be controlled for. Commented Sep 25, 2013 at 9:53
  • Maybe you are missing the point of my comments so far: it is about falsifying a number of standard hypotheses about the common sources of error when benchmarking on the JVM. Only once you have those solidly cleared can you enter a serious discussion of the results. Commented Sep 25, 2013 at 12:03
  • Note that, besides the fact that this benchmark is far from the intended use case, just specifying the -server option at JVM start makes the recorded overhead go away entirely. Commented Nov 21, 2013 at 14:40

1 Answer


UPDATE

A few comments wondered whether my benchmark at the bottom was flawed. After introducing a lot of randomness (to prevent the JIT from optimising too much away), I still get similar results, so I tend to think it is fine.

In the meantime, I have come across this presentation by the lambda implementation team. Page 16 shows some performance figures: inner classes and capturing lambdas (closures) have similar performance, while non-capturing lambdas are up to 5× faster.
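One plausible reason for that non-capturing advantage (an observed HotSpot behaviour, not something the language spec guarantees) is that a non-capturing lambda can be linked once and its single instance cached, whereas a capturing lambda generally needs a fresh instance on every evaluation. A minimal sketch, using an `ICallback` interface like the one in the question:

```java
public class LambdaCaching {

    interface ICallback {
        void payload(int[] a);
    }

    // Captures nothing: on HotSpot the invokedynamic call site
    // links to a single cached instance.
    static ICallback nonCapturing() {
        return (a) -> a[0]++;
    }

    // Captures `captured`: each evaluation must allocate a new
    // instance to hold the captured state.
    static ICallback capturing(int[] captured) {
        return (a) -> captured[0]++;
    }

    public static void main(String[] args) {
        // Same instance both times on HotSpot (implementation detail):
        System.out.println(nonCapturing() == nonCapturing());
        int[] arr = {0};
        // Distinct instances: the capture forces a per-evaluation allocation:
        System.out.println(capturing(arr) == capturing(arr));
        nonCapturing().payload(arr);
        System.out.println(arr[0]);
    }
}
```

On a current HotSpot JVM this prints true, false, 1: only the capturing variant pays an allocation per iteration, which lines up with the figures above.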

And @StuartMarks posted this JVMLS 2013 talk by Sergey Kuksenko on lambda performance. The bottom line is that, after JIT compilation, lambdas and anonymous classes perform similarly on current HotSpot JVM implementations.


YOUR BENCHMARK

I have also run your test, as you posted it. The problem is that it runs for as little as 20 ms for the first method and 2 ms for the second. Although that is a 10:1 ratio, it is in no way representative, because the measurement time is far too short.

I then modified your test to allow for more JIT warm-up, and I get similar results to jmh (i.e. no difference between the anonymous class and the lambda):

public class Main {

    static interface ICallback {
        void payload();
    }
    static void measureAnonymousClass() {
        final int arr[] = {0};
        ICallback clb = new ICallback() {
            @Override
            public void payload() {
                arr[0]++;
            }
        };
        clb.payload();
    }
    static void measureLambda() {
        final int arr[] = {0};
        ICallback clb = () -> {
            arr[0]++;
        };
        clb.payload();
    }
    static void runTimed(String message, Runnable act) {
        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++) {
            act.run();
        }
        long end = System.nanoTime();
        System.out.println(message + ":" + (end - start));
    }
    public static void main(String[] args) {
        runTimed("as lambdas", Main::measureLambda);
        runTimed("anonymous class", Main::measureAnonymousClass);
        runTimed("as lambdas", Main::measureLambda);
        runTimed("anonymous class", Main::measureAnonymousClass);
        runTimed("as lambdas", Main::measureLambda);
        runTimed("anonymous class", Main::measureAnonymousClass);
        runTimed("as lambdas", Main::measureLambda);
        runTimed("anonymous class", Main::measureAnonymousClass);
    }
}

The last run takes about 28 seconds for both methods.


JMH MICRO BENCHMARK

I have run the same test with jmh, and the bottom line is that all four methods take the same time as this equivalent baseline:

void baseline() {
    arr[0]++;
}

In other words, the JIT inlines both the anonymous class and the lambda and they take exactly the same time.

Results summary:

Benchmark                Mean    Mean error    Units
empty_method             1.104        0.043  nsec/op
baseline                 2.105        0.038  nsec/op
anonymousWithArgs        2.107        0.028  nsec/op
anonymousWithoutArgs     2.120        0.044  nsec/op
lambdaWithArgs           2.116        0.027  nsec/op
lambdaWithoutArgs        2.103        0.017  nsec/op
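Identical steady-state timings don't mean the two constructs are the same thing, though. An anonymous class is compiled to an ordinary nested class file, while a lambda's class is generated at runtime by LambdaMetafactory behind an invokedynamic call site. A small sketch showing the difference (the exact generated class names are HotSpot implementation details):

```java
public class LambdaVsAnon {

    interface ICallback {
        void payload(int[] a);
    }

    public static void main(String[] args) {
        ICallback lambda = (a) -> a[0]++;
        ICallback anon = new ICallback() {
            @Override
            public void payload(int[] a) {
                a[0]++;
            }
        };
        // The anonymous class is a regular compile-time nested class:
        System.out.println(anon.getClass().getName());
        // The lambda's class is spun at runtime; on HotSpot its name
        // contains "$$Lambda" and it is marked synthetic:
        System.out.println(lambda.getClass().getName().contains("$$Lambda"));
        System.out.println(lambda.getClass().isSynthetic());
    }
}
```

On HotSpot this prints LambdaVsAnon$1 for the anonymous class and true/true for the lambda checks, which is why the once-common "a lambda is just an anonymous inner class" description no longer matches the implementation.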

25 Comments

That result would imply that the JIT completely eliminated the allocation of the actual lambda/anonymous class instances. However, if the OP is getting different results, then I'd proceed as described: separate the allocation from the invocation and see if the discrepancy is still there.
Anyway, I'm pretty sure that the OP's large discrepancy is due to one code path having the advantage of escape analysis and the other going through the full dynamic allocation. I don't see anything else that would explain a factor of 10 or more.
Thanks for writing down your performance investigations. +1. The original (mostly prototype) JDK 8 implementation of lambda was exactly an anonymous inner class, to get something working early so that we could explore language and library evolution. This seems to have spawned a myth that lambdas are nothing more than anonymous inner classes. More recently the implementation has been optimized so that lambda is almost always faster than the "equivalent" anonymous inner class.
Also, there are two great JVM Language Summit talks from the performance guys. First, Alexey Shipilev (author of jmh) talks about benchmarking and its many pitfalls. (This was voted best talk at JVMLS this year.) Second, Sergey Kuksenko talks about what he's been doing to optimize lambda performance. [1] medianetwork.oracle.com/video/player/2630310904001 [2] medianetwork.oracle.com/video/player/2623576348001
@Tuntable Links updated. Thanks for mentioning this.
