
I recently noticed that I was having a performance hit because I was declaring a default constructor like:

Foo() = default;

instead of

Foo() {}

(Just FYI, I needed to declare it explicitly because I also had a variadic constructor, and declaring any constructor suppresses the implicitly declared default constructor.)

This seemed strange to me because I thought these two lines of code were identical (well, so long as a default constructor is possible. If it isn't, the second line would produce a compile error, while the first would implicitly define the default constructor as deleted. Not my situation!).
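The exception in parentheses can be checked with a type trait. A minimal sketch, with hypothetical type names: when a member has no default constructor, an explicitly defaulted default constructor is quietly defined as deleted, whereas an empty-bodied one is a hard compile error.

```cpp
#include <type_traits>

// Hypothetical member type that has no default constructor.
struct NoDefault {
    explicit NoDefault(int) {}
};

// "= default" is accepted here, but the constructor is defined as deleted
// because `member` cannot be default-initialized.
struct Defaulted {
    NoDefault member;
    Defaulted() = default;
};

static_assert(!std::is_default_constructible<Defaulted>::value,
              "the defaulted constructor is implicitly deleted");

// By contrast, this would not compile at all, because the empty body still
// has to default-initialize `member`:
//
// struct Braces {
//     NoDefault member;
//     Braces() {}  // error: no matching constructor for NoDefault
// };
```

Clang may emit a `-Wdefaulted-function-deleted` warning for `Defaulted`; the program is still well-formed.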

Okay, so I made a little tester. The results vary quite a lot depending on the compiler, but with certain settings I consistently get results showing that one is faster than the other:

#include <chrono>

template <typename T>
double TimeDefaultConstructor (int n_iterations)
{
    auto start_time = std::chrono::system_clock::now();

    for (int i = 0; i < n_iterations; ++i)
        T t;

    auto end_time = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end_time - start_time;

    return elapsed_seconds.count();
}

template <typename T, typename S>
double CompareDefaultConstructors (int n_comparisons, int n_iterations)
{
    int n_comparisons_with_T_faster = 0;

    for (int i = 0; i < n_comparisons; ++i)
    {
        double time_for_T = TimeDefaultConstructor<T>(n_iterations);
        double time_for_S = TimeDefaultConstructor<S>(n_iterations);

        if (time_for_T < time_for_S)    
            ++n_comparisons_with_T_faster;  
    }

    return (double) n_comparisons_with_T_faster / n_comparisons;
}


#include <vector>

template <typename T>
struct Foo
{
    std::vector<T> data_;

    Foo() = default;
};

template <typename T>
struct Bar
{
    std::vector<T> data_;

    Bar() {};
};

#include <iostream>
#include <typeinfo>  // required before using typeid

int main ()
{
    int n_comparisons = 10000;
    int n_iterations = 10000;

    typedef int T;

    double result = CompareDefaultConstructors<Foo<T>,Bar<T>> (n_comparisons, n_iterations);

    std::cout << "With " << n_comparisons << " comparisons of " << n_iterations
        << " iterations of the default constructor, Foo<" << typeid(T).name() << "> was faster than Bar<" << typeid(T).name() << "> "
        << result*100 << "% of the time" << std::endl;

    std::cout << "swapping orientation:" << std::endl;

    result = CompareDefaultConstructors<Bar<T>,Foo<T>> (n_comparisons, n_iterations);

    std::cout << "With " << n_comparisons << " comparisons of " << n_iterations
        << " iterations of the default constructor, Bar<" << typeid(T).name() << "> was faster than Foo<" << typeid(T).name() << "> "
        << result*100 << "% of the time" << std::endl;

    return 0;
}

Using the above program with g++ -std=c++11 I consistently get output similar to:

With 10000 comparisons of 10000 iterations of the default constructor, Foo was faster than Bar 4.69% of the time
swapping orientation:
With 10000 comparisons of 10000 iterations of the default constructor, Bar was faster than Foo 96.23% of the time

Changing the compiler settings seems to change the result, sometimes flipping it entirely. What I can't understand is why it matters at all.

  • Using system_clock to time things is not a good idea. Commented Dec 10, 2019 at 6:37
  • @NicolBolas, I'm not interested in how accurate the timings are. I'm interested in the fact that Foo<T> can have consistently better performance than Bar<T> (or vice versa). The clock is good enough to show that. Commented Dec 10, 2019 at 6:46
  • Did you test an optimized build? If not, your results are pointless. Commented Dec 10, 2019 at 6:52
  • Unoptimised compilation is not designed for performance. Measuring performance of unoptimised code is therefore a form of useless entertainment. Commented Dec 10, 2019 at 7:53
  • @n.'pronouns'm. I think that you don't understand what I was trying to do. I thought that the two different ways of declaring the default constructor were identical in C++, but I was seeing performance differences even without optimisers. At this stage I wasn't really interested in performance itself, but that the performance difference was demonstrating to me that the two default constructors didn't appear to be equivalent. Commented Dec 10, 2019 at 7:56
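Following the first comment, here is a minimal sketch of the same timing function using std::chrono::steady_clock, which is guaranteed to be monotonic, instead of system_clock, which can jump if the wall clock is adjusted. The name TimeDefaultConstructorSteady is hypothetical. This only makes the clock more trustworthy; it does nothing to stop the optimizer from removing the loop.

```cpp
#include <chrono>

// Same idea as the question's TimeDefaultConstructor, but timed with
// std::chrono::steady_clock, which never goes backwards.
template <typename T>
double TimeDefaultConstructorSteady(int n_iterations)
{
    auto start_time = std::chrono::steady_clock::now();

    for (int i = 0; i < n_iterations; ++i)
    {
        T t;      // one default construction + destruction per iteration
        (void)t;  // silence "unused variable" warnings
    }

    auto end_time = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed_seconds = end_time - start_time;
    return elapsed_seconds.count();
}
```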

4 Answers


This benchmark doesn't measure what it is supposed to measure. Replace Bar() {}; with Bar() = default;, which makes Foo and Bar identical, and you'll still get a lopsided result:

With 10000 comparisons of 10000 iterations of the default constructor, Foo was faster than Bar 69.89% of the time
swapping orientation:
With 10000 comparisons of 10000 iterations of the default constructor, Bar was faster than Foo 29.9% of the time

This is a vivid demonstration that you're measuring not constructors but something else.


When you enable -O1 optimization, the for loop with T t; degenerates into¹:

        test    ebx, ebx
        jle     .L3
        mov     eax, 0
.L4:
        add     eax, 1
        cmp     ebx, eax
        jne     .L4
.L3:

for both Foo and Bar. That is, into a trivial for (int i = 0; i < n_iterations; ++i); loop.

When you enable -O2 or -O3 it gets optimized out completely.

Without optimization (-O0) you get the following assembly:

        mov     DWORD PTR [rbp-4], 0
.L35:
        mov     eax, DWORD PTR [rbp-4]
        cmp     eax, DWORD PTR [rbp-68]
        jge     .L34
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    Foo<int>::Foo()
        lea     rax, [rbp-64]
        mov     rdi, rax
        call    Foo<int>::~Foo()
        add     DWORD PTR [rbp-4], 1
        jmp     .L35
.L34:

and the same for Bar with Foo replaced with Bar.

Now let's take a look at the constructors:

Foo<int>::Foo()
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::vector()
        nop
        leave
        ret

and

Bar<int>::Bar()
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    std::vector<int, std::allocator<int> >::vector()
        nop
        leave
        ret

As you can see, these are identical, too.


¹ GCC 8.3


6 Comments

@Elliott-ReinstateMonica The optimiser does not optimise the two constructors differently. It optimises both out completely. The answer shows that the generated code is the same for all optimisation levels. The problem with your measurements is that the noise dominates the signal.
@Elliott-ReinstateMonica, they are identical before, and they are identical after. Even the same assembly code can be timed differently on modern CPUs.
@Evg, thanks a lot. Okay. After about two hours of looking at this, it seems that my original idea of these two constructors was correct: they're the same. I should learn to use assembly to answer questions like these in future.
@Elliott-ReinstateMonica, https://godbolt.org will be your good friend.
@Elliott: FYI: they're not identical. The = default constructor may be trivial (depending on the member subobjects), while the {} constructor never will be.
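A minimal sketch of that point (DefaultedInt and BracesInt are hypothetical names; Foo and Bar mirror the question): whether the defaulted constructor is trivial depends on the members, while the empty-bodied one never is.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical types with a trivial member (int).
struct DefaultedInt { int value; DefaultedInt() = default; };
struct BracesInt    { int value; BracesInt() {} };

// The question's classes, with a std::vector member.
template <typename T> struct Foo { std::vector<T> data_; Foo() = default; };
template <typename T> struct Bar { std::vector<T> data_; Bar() {} };

// With an int member, only the defaulted constructor is trivial.
static_assert(std::is_trivially_default_constructible<DefaultedInt>::value, "");
static_assert(!std::is_trivially_default_constructible<BracesInt>::value, "");

// With a std::vector member, neither is trivial, because std::vector's own
// default constructor is not trivial.
static_assert(!std::is_trivially_default_constructible<Foo<int>>::value, "");
static_assert(!std::is_trivially_default_constructible<Bar<int>>::value, "");
```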

Foo() = default; and Foo() {}; are different. The former is a trivial default constructor, while the latter is a custom version of the default constructor that does nothing beside default stuff.

This can be observed via type_traits. Such a difference might affect which allocation/construction routines are chosen during template overload resolution, leading to completely different code being used.
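As a hypothetical illustration of how generic code could branch on such a trait (not taken from any particular standard library), the helper below picks a strategy from std::is_trivially_default_constructible via tag dispatch, which works in C++11. InitStrategy, ChooseInitStrategy and the two structs are made-up names.

```cpp
#include <type_traits>

// Hypothetical strategies a generic container might choose between.
enum class InitStrategy { LeaveUninitialized, CallConstructor };

// Overloads selected by tag dispatch on std::true_type / std::false_type.
constexpr InitStrategy ChooseInitStrategyImpl(std::true_type)
{
    return InitStrategy::LeaveUninitialized;  // no constructor code to run
}

constexpr InitStrategy ChooseInitStrategyImpl(std::false_type)
{
    return InitStrategy::CallConstructor;     // must run T() for each element
}

template <typename T>
constexpr InitStrategy ChooseInitStrategy()
{
    // ::type is exactly std::true_type or std::false_type.
    return ChooseInitStrategyImpl(
        typename std::is_trivially_default_constructible<T>::type());
}

// Hypothetical types that differ only in how the default constructor is declared.
struct DefaultedCtor { int x; DefaultedCtor() = default; };
struct EmptyBodyCtor { int x; EmptyBodyCtor() {} };

static_assert(ChooseInitStrategy<DefaultedCtor>() == InitStrategy::LeaveUninitialized,
              "defaulted constructor is trivial here");
static_assert(ChooseInitStrategy<EmptyBodyCtor>() == InitStrategy::CallConstructor,
              "user-provided constructor is never trivial");
```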

While this should matter little for a default constructor, for a copy constructor/assignment operator it might change quite a lot. So = default is preferred whenever possible.
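For copying, the relevant trait is std::is_trivially_copyable. A minimal sketch with hypothetical type names: a defaulted copy constructor keeps a type trivially copyable, while a hand-written one that does the same thing does not. Trivially copyable types may be copied byte-wise, which is what allows implementations to replace element-by-element copies with memcpy/memmove; the CopyRange helper here is hypothetical and simply refuses to compile for other types.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct DefaultedCopy {
    int x;
    DefaultedCopy() = default;
    DefaultedCopy(const DefaultedCopy&) = default;  // trivial
};

struct HandWrittenCopy {
    int x;
    HandWrittenCopy() = default;
    HandWrittenCopy(const HandWrittenCopy& other) : x(other.x) {}  // user-provided, never trivial
};

static_assert(std::is_trivially_copyable<DefaultedCopy>::value, "");
static_assert(!std::is_trivially_copyable<HandWrittenCopy>::value, "");

// Hypothetical helper: bulk copy that is only allowed for trivially copyable
// types, where copying the bytes is equivalent to copying the objects.
template <typename T>
void CopyRange(const T* src, T* dst, std::size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "CopyRange requires a trivially copyable type");
    std::memcpy(dst, src, count * sizeof(T));
}
```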

8 Comments

Thanks. What do you mean by "This can be observed via type_traits". How so?
Foo() = default; is not trivial.
@Elliott-ReinstateMonica there are functions that test for various properties of types. E.g. std::is_default_constructible or std::is_trivially_copyable. There are tests that identify it.
@ALX23z: I think he's asking a more specific question: not what type traits do in general, but what specific type trait can detect the difference between an empty default ctor and an explicitly defaulted default ctor in this class. In this case, with either of them, it's not trivial, but it is default constructible.
latter is a custom version of default constructor that does nothing beside default stuff - that's funny but also very unclear. Could you please specify default stuff?

I suspect the difference in speed you think you see is mostly a by-product of poor timing, and is not real.

For the sake of looking at the generated result, I simplified your code a bit, to leave just the following:

#include <vector>

template <typename T>
struct Foo
{
    std::vector<T> data_;

    Foo() = default;
};

template <typename T>
struct Bar
{
    std::vector<T> data_;

    Bar() {};
};

int main() { 
    Foo<int> f;

    Bar<int> b;
}

I then put that on Godbolt to make it easy to look at the generated code.

gcc 9.2 seems to produce identical code for both ctors, looking like this in both cases:

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     QWORD PTR [rbp-8], rdi
mov     rax, QWORD PTR [rbp-8]
mov     rdi, rax
call    std::vector<int, std::allocator<int> >::vector() [complete object constructor]
nop
leave
ret

Clang produces slightly different code, but (again) identical for the two classes:

push    rbp
mov     rbp, rsp
sub     rsp, 16
mov     qword ptr [rbp - 8], rdi
mov     rdi, qword ptr [rbp - 8]
call    std::vector<int, std::allocator<int> >::vector() [base object constructor]
add     rsp, 16
pop     rbp
ret

Intel icc is pretty much the same, producing this code for both classes:

push      rbp                                           #8.5
mov       rbp, rsp                                      #8.5
sub       rsp, 16                                       #8.5
mov       QWORD PTR [-16+rbp], rdi                      #8.5
mov       rax, QWORD PTR [-16+rbp]                      #8.5
mov       rdi, rax                                      #8.5
call      std::vector<int, std::allocator<int> >::vector() [complete object constructor]                      #8.5
leave                                                   #8.5
ret  

While I agree with others that looking at performance with optimization disabled accomplishes little, in this case it appears that even disabling optimization isn't enough (at least with those three compilers) to get different code for constructing objects of the two classes. I guess I wouldn't be terribly surprised if there is some compiler and/or optimization setting that will produce different results, but I'm afraid I'm not quite ambitious enough to spend a lot more time looking for it.


Foo() = default; is a trivial constructor.

Foo() {} is a user-provided constructor, and user-provided constructors are, by definition, never trivial, even when they are empty.

See also: Trivial default constructor and std::is_trivial.

It's expected that when compiler optimizations are enabled that a trivial constructor may be faster than a user provided one.
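Triviality is not the only observable difference. With value-initialization such as T(), a class whose default constructor is not user-provided is zero-initialized first, whereas a user-provided constructor is simply called. A minimal sketch with hypothetical names; note that this zeroing is extra work, so in this particular context the defaulted version can do more, not less.

```cpp
// Hypothetical types: identical except for how the default constructor is declared.
struct DefaultedCtor { int x; DefaultedCtor() = default; };  // not user-provided
struct EmptyBodyCtor { int x; EmptyBodyCtor() {} };          // user-provided

// Value-initialization: because DefaultedCtor's default constructor is not
// user-provided, the object is zero-initialized first, so x is guaranteed to be 0.
int ValueInitializedX()
{
    return DefaultedCtor().x;
}

// For EmptyBodyCtor, EmptyBodyCtor() only runs the empty constructor body,
// so x is left indeterminate and must not be read.
// Plain default-initialization (e.g. "DefaultedCtor d;") leaves x
// indeterminate for both types.
```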

4 Comments

Interesting. It was actually when my optimizers were disabled that I saw the most drastic difference in performance (because objects were actually being made). I'll have a read and come back here. Thanks.
Foo() = default; is not trivial at all. It is defaulted. static_assert(!std::is_trivially_default_constructible_v<Foo<int>>);.
@Elliot It's usually fairly pointless to try and reason about performance when the optimizer is not enabled. Both the compiler and the standard library insert all sorts of debug checks, sometimes do extra work such as zero-initializing variables, and do lots of other things that slow code down and tilt the scales as to what's fast and what's slow. Don't waste your time measuring debug builds; they don't accurately reflect the performance of release builds.
@JesperJuhl Thanks. My logic was that I wanted to experiment with performance on the simplest problem, so I didn't want the compiler to throw my code away during optimisation by recognising that it's useless. I also thought that what the compiler was doing would be simpler to understand without optimisations...
