Why is there performance variation using default constructor "{}" instead of "= default"?

Question

I recently noticed that I was having a performance hit because I was declaring a default constructor like:

Foo() = default;

instead of

Foo() {}

(Just FYI, I needed to explicitly declare it because I also had a variadic constructor, and declaring any constructor suppresses the implicitly declared default constructor.)

This seemed strange to me because I thought that these two lines of code were identical (well, so long as a default constructor is possible. If it isn't, the second line would produce an error and the first would define the default constructor as deleted. Not my situation!).
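To make the parenthetical above concrete, here is a minimal sketch of the case where a default constructor is not possible. The type names NoDefault and Defaulted are hypothetical and not part of the original program; the only assumption is a member that cannot be default-constructed.

```cpp
#include <type_traits>

// Hypothetical member type with no default constructor.
struct NoDefault
{
    explicit NoDefault(int) {}
};

// Compiles: the defaulted constructor is defined as deleted because
// member_ cannot be default-constructed.
struct Defaulted
{
    NoDefault member_;
    Defaulted() = default;
};

static_assert(!std::is_default_constructible<Defaulted>::value,
              "the defaulted constructor is deleted");

// The empty-body form is rejected at the definition itself:
// struct Braced { NoDefault member_; Braced() {} };  // error: member_ has no default constructor
```

The static_assert passes at compile time, while uncommenting Braced should stop compilation.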

Okay, so I made a little tester. The results vary quite a lot depending on the compiler, but with certain settings I consistently see one being faster than the other:

#include <chrono>

template <typename T>
double TimeDefaultConstructor (int n_iterations)
{
    auto start_time = std::chrono::system_clock::now();

    for (int i = 0; i < n_iterations; ++i)
        T t;

    auto end_time = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end_time - start_time;

    return elapsed_seconds.count();
}

template <typename T, typename S>
double CompareDefaultConstructors (int n_comparisons, int n_iterations)
{
    int n_comparisons_with_T_faster = 0;

    for (int i = 0; i < n_comparisons; ++i)
    {
        double time_for_T = TimeDefaultConstructor<T>(n_iterations);
        double time_for_S = TimeDefaultConstructor<S>(n_iterations);

        if (time_for_T < time_for_S)    
            ++n_comparisons_with_T_faster;  
    }

    return (double) n_comparisons_with_T_faster / n_comparisons;
}


#include <vector>

template <typename T>
struct Foo
{
    std::vector<T> data_;

    Foo() = default;
};

template <typename T>
struct Bar
{
    std::vector<T> data_;

    Bar() {};
};

#include <iostream>
#include <typeinfo>

int main ()
{
    int n_comparisons = 10000;
    int n_iterations = 10000;

    typedef int T;

    double result = CompareDefaultConstructors<Foo<T>,Bar<T>> (n_comparisons, n_iterations);

    std::cout << "With " << n_comparisons << " comparisons of " << n_iterations
        << " iterations of the default constructor, Foo<" << typeid(T).name() << "> was faster than Bar<" << typeid(T).name() << "> "
        << result*100 << "% of the time" << std::endl;

    std::cout << "swapping orientation:" << std::endl;

    result = CompareDefaultConstructors<Bar<T>,Foo<T>> (n_comparisons, n_iterations);

    std::cout << "With " << n_comparisons << " comparisons of " << n_iterations
        << " iterations of the default constructor, Bar<" << typeid(T).name() << "> was faster than Foo<" << typeid(T).name() << "> "
        << result*100 << "% of the time" << std::endl;

    return 0;
}

Using the above program with g++ -std=c++11 I consistently get output similar to:

With 10000 comparisons of 10000 iterations of the default constructor, Foo was faster than Bar 4.69% of the time
swapping orientation:
With 10000 comparisons of 10000 iterations of the default constructor, Bar was faster than Foo 96.23% of the time

Changing the compiler settings seems to change the result, sometimes flipping it entirely. But what I can't understand is why it matters at all?

@NicolBolas, I'm not interested in how accurate the timings are. I'm interested in the fact that Foo<T> can have consistently better performance than Bar<T> (or vice versa). The clock is good enough to show that.
– Elliott, Commented Dec 10, 2019 at 6:46
Did you test an optimized build? If not, your results are pointless.
– Jesper Juhl, Commented Dec 10, 2019 at 6:52
Unoptimised compilation is not designed for performance. Measuring performance of unoptimised code is therefore a form of useless entertainment.
– n. m. could be an AI, Commented Dec 10, 2019 at 7:53
@n.'pronouns'm. I think that you don't understand what I was trying to do. I thought that the two different ways of declaring the default constructor were identical in C++, but I was seeing performance differences even without optimisers. At this stage I wasn't really interested in performance itself, but the performance difference suggested to me that the two default constructors weren't equivalent.
– Elliott, Commented Dec 10, 2019 at 7:56

Evg · Accepted Answer · 2019-12-10 07:48:31Z

7

This benchmark doesn't measure what it is supposed to measure. Replace Bar() {}; with Bar() = default;, making Foo and Bar identical, and you'll still get a similarly skewed result:

With 10000 comparisons of 10000 iterations of the default constructor, Foo was faster than Bar 69.89% of the time
swapping orientation:
With 10000 comparisons of 10000 iterations of the default constructor, Bar was faster than Foo 29.9% of the time

This is a vivid demonstration that you're measuring not constructors but something else.
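The control experiment described above can be sketched as follows. This is a simplified stand-in for the question's harness, not the original code: FractionFirstFaster and RunControl are hypothetical names, and std::chrono::steady_clock is swapped in for system_clock on the assumption that a monotonic clock is preferable for interval timing.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Same shape as the question's Foo<int>.
struct Foo
{
    std::vector<int> data_;
    Foo() = default;
};

// Simplified stand-in for the question's TimeDefaultConstructor.
template <typename T>
double TimeDefaultConstructor(int n_iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_iterations; ++i)
    {
        T t;
        (void)t;  // silence unused-variable warnings
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Hypothetical helper: fraction of rounds in which the first timing
// was strictly lower than the second.
template <typename T, typename S>
double FractionFirstFaster(int n_comparisons, int n_iterations)
{
    assert(n_comparisons > 0);
    int first_faster = 0;
    for (int i = 0; i < n_comparisons; ++i)
    {
        double time_first = TimeDefaultConstructor<T>(n_iterations);
        double time_second = TimeDefaultConstructor<S>(n_iterations);
        if (time_first < time_second)
            ++first_faster;
    }
    return static_cast<double>(first_faster) / n_comparisons;
}

// Control run: both sides are the very same type, so any skew in the
// returned fraction comes from the harness, not from the constructors.
double RunControl()
{
    return FractionFirstFaster<Foo, Foo>(1000, 1000);
}
```

If RunControl() returns something far from 0.5 on a given machine and build, that skew is a property of the measurement (call order, caches, clock resolution) and says nothing about = default versus {}.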

When you enable -O1 optimization, the for loop with T t; degenerates into¹:

        test    ebx, ebx
        jle     .L3
        mov     eax, 0
.L4:
        add     eax, 1
        cmp     ebx, eax
        jne     .L4
.L3:

for both Foo and Bar. That is, into a trivial for (int i = 0; i < n_iterations; ++i); loop.

When you enable -O2 or -O3 it gets optimized out completely.
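If the goal is to time construction in an optimized build, the object has to be made observable so the compiler cannot discard it. The sketch below shows one common way to do that; it assumes GCC or Clang, since the empty inline-assembly statement is a compiler extension and not standard C++. KeepAlive is a hypothetical helper name, and steady_clock is used here instead of the question's system_clock.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

struct Foo { std::vector<int> data_; Foo() = default; };
struct Bar { std::vector<int> data_; Bar() {} };

// Hypothetical helper (GCC/Clang extension): the empty asm statement takes
// the object's address and claims to touch memory, so the optimizer has to
// assume the object is read or modified and cannot remove its construction.
template <typename T>
inline void KeepAlive(T* object)
{
    asm volatile("" : : "g"(object) : "memory");
}

template <typename T>
double TimeDefaultConstructor(int n_iterations)
{
    assert(n_iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n_iterations; ++i)
    {
        T t;
        KeepAlive(&t);  // keeps the loop body from being optimized away
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Even with such a barrier, the loop mostly measures a few stores plus loop overhead, so differences between Foo and Bar should still be expected to sit within the noise.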

Without optimization (-O0) you get the following assembly:

        mov     DWORD PTR [rbp-4], 0
.L35:
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-68]
        jge     .L34
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    Foo<int>::Foo()
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    Foo<int>::~Foo()
        add     DWORD PTR [rbp-4], 1
        jmp     .L35
.L34:

and the same for Bar with Foo replaced with Bar.

Now let's take a look at the constructors:

Foo<int>::Foo()
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::vector()
        nop
        leave
        ret

and

Bar<int>::Bar()
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::vector()
        nop
        leave
        ret

As you can see, these are identical, too.

¹ GCC 8.3

edited Dec 10, 2019 at 7:48

answered Dec 10, 2019 at 7:32

Evg


6 Comments

n. m. could be an AI Over a year ago

@Elliott-ReinstateMonica The optimiser does not optimise the two constructors differently. It optimises both out, completely. The answer shows that the generated code is the same at all optimisation levels. The problem with your measurements is that the noise dominates the signal.

Evg Over a year ago

@Elliott-ReinstateMonica, they are identical before, and they are identical after. Even the same assembly code can be timed differently on modern CPUs.

Elliott Over a year ago

@Evg, thanks a lot. Okay. After like two hours of looking at this it seems that my original idea of these two constructors was correct: they're the same. I should learn to read assembly to answer questions like these in future.

Evg Over a year ago

@Elliott-ReinstateMonica, https://godbolt.org will be your good friend.

Nicol Bolas Over a year ago

@Elliott: FYI: they're not identical. The = default constructor may be trivial (depending on the member subobjects), while the {} constructor never will be.


ALX23z · Accepted Answer · 2019-12-10 07:04:57Z

4

Foo() = default; and Foo() {}; are different. The former is a trivial default constructor, while the latter is a custom version of the default constructor that does nothing besides the default stuff.

This can be observed via type_traits. Such a change might affect which allocation/construction routines are chosen during template function resolution, leading to completely different code being used.

While this should matter little for a default constructor, for a copy constructor or copy assignment it might change quite a lot. So = default is preferred whenever possible.
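As a hedged illustration of how type traits can tell the two forms apart, the sketch below checks std::is_trivially_default_constructible for two pairs of types. All four type names are hypothetical: FooVec and BarVec mirror the question's classes with a std::vector member, while FooInt and BarInt are assumed variants whose only member is an int.

```cpp
#include <type_traits>
#include <vector>

// Same shape as the question's classes: a std::vector member.
struct FooVec { std::vector<int> data_; FooVec() = default; };
struct BarVec { std::vector<int> data_; BarVec() {} };

// Hypothetical variants whose only member is an int.
struct FooInt { int value_; FooInt() = default; };
struct BarInt { int value_; BarInt() {} };

// With a std::vector member neither constructor is trivial, because the
// member's own default constructor is not trivial.
static_assert(!std::is_trivially_default_constructible<FooVec>::value,
              "defaulted, but not trivial");
static_assert(!std::is_trivially_default_constructible<BarVec>::value,
              "user-provided, so never trivial");

// With only an int member the two forms differ.
static_assert(std::is_trivially_default_constructible<FooInt>::value,
              "defaulted constructor over an int is trivial");
static_assert(!std::is_trivially_default_constructible<BarInt>::value,
              "an empty user-provided body is still not trivial");
```

So for the classes in the question the trait reports the same answer for both, and the two forms only become distinguishable this way once every member is itself trivially default-constructible.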

answered Dec 10, 2019 at 7:04

ALX23z


8 Comments

Elliott Over a year ago

Thanks. What do you mean by "This can be observed via type_traits". How so?

Evg Over a year ago

Foo() = default; is not trivial.

ALX23z Over a year ago

@Elliott-ReinstateMonica there are functions that test for various properties of types. E.g. std::is_default_constructible or std::is_trivially_copyable. There are tests that identify it.

Jerry Coffin Over a year ago

@ALX23z: I think he's asking a more specific question: not what type traits do in general, but what specific type trait can detect the difference between an empty default ctor and an explicitly defaulted default ctor in this class. In this case, with either of them, it's not trivial, but is default constructible.

nada Over a year ago

latter is a custom version of default constructor that does nothing beside default stuff - that's funny but also very unclear. Could you please specify default stuff?


Jerry Coffin · Accepted Answer · 2019-12-10 07:57:51Z

I suspect the difference in speed you think you see is mostly a by-product of poor timing, and is not real.

For the sake of looking at the generated result, I simplified your code a bit, to leave just the following:

#include <vector>

template <typename T>
struct Foo
{
    std::vector<T> data_;

    Foo() = default;
};

template <typename T>
struct Bar
{
    std::vector<T> data_;

    Bar() {};
};

int main() { 
    Foo<int> f;

    Bar<int> b;
}

I then put that on Godbolt to make it easy to look at the generated code.

gcc 9.2 seems to produce identical code for both ctors, looking like this in both cases:

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     QWORD PTR [rbp-8], rdi
mov     rax, QWORD PTR [rbp-8]
mov     rdi, rax
call    std::vector<int, std::allocator<int> >::vector() [complete object constructor]
nop
leave
ret

Clang produces slightly different code, but (again) identical for the two classes:

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     qword ptr [rbp - 8], rdi
mov     rdi, qword ptr [rbp - 8]
call    std::vector<int, std::allocator<int> >::vector() [base object constructor]
add     rsp, 16
pop     rbp
ret

Intel icc is pretty much the same, producing this code for both classes:

push      rbp                                           #8.5
mov       rbp, rsp                                      #8.5
sub       rsp, 16                                       #8.5
mov       QWORD PTR [-16+rbp], rdi                      #8.5
mov       rax, QWORD PTR [-16+rbp]                      #8.5
mov       rdi, rax                                      #8.5
call      std::vector<int, std::allocator<int> >::vector() [complete object constructor]                      #8.5
leave                                                   #8.5
ret

While I agree with others that looking at performance with optimization disabled accomplishes little, in this case it appears that even disabling optimization isn't enough (at least with those three compilers) to get different code for constructing objects of the two classes. I guess I wouldn't be terribly surprised if there is some compiler and/or optimization setting that will produce different results, but I'm afraid I'm not quite ambitious enough to spend a lot more time looking for it.

Jesper Juhl · Accepted Answer · 2019-12-10 07:05:09Z

1

Foo() = default; is a trivial constructor.

Foo() {} is a user defined constructor and user defined constructors are, by definition, never trivial even when they are empty.

See also: Trivial default constructor and std::is_trivial.

It's expected that, when compiler optimizations are enabled, a trivial constructor may be faster than a user-provided one.

edited Dec 10, 2019 at 7:05

answered Dec 10, 2019 at 6:56

Jesper Juhl


4 Comments

Elliott Over a year ago

Interesting. It was actually when my optimizers were disabled that I saw the most drastic difference in performance (because there were actually objects being made). I'll have a read and come back here. Thanks.

Evg Over a year ago

Foo() = default; is not trivial at all. It is defaulted. static_assert(!std::is_trivially_default_constructible_v<Foo<int>>);.

Jesper Juhl Over a year ago

@Elliott It's usually fairly pointless to try to reason about performance when the optimizer is not enabled. Both the compiler and the standard library insert all sorts of debug checks and sometimes do extra work to zero-initialize variables, plus lots of other stuff that slows things down and tilts the scales as to what's fast and what's slow. Don't waste your time measuring debug builds; they don't accurately reflect the performance of release builds.

Elliott Over a year ago

@JesperJuhl Thanks. My logic was that I wanted to experiment with performance on the simplest problem, so I didn't want the compiler to throw my code away during optimisation by recognising that it's useless. I also thought that what the compiler was doing would be simpler to understand without optimisations...
