Performance differences in run-time polymorphism between C and C++

Question

I know benchmarking is a very delicate subject and simple, not-well-thought-out benchmarks are mostly meaningless for performance comparisons, but what I have right now is actually a pretty small and contrived example that I think should be easily explainable. So, even if the question seems unhelpful, it would at least help me in understanding benchmarking.

So, here I go.

I was trying to experiment with simple API design in C, using run-time polymorphism kind of behaviour via void *. Then I compared it with same thing implemented in C++ using regular virtual functions. Here is the code:

#include <cstdlib>
#include <cstdio>
#include <cstring>

int dummy_computation()
{
    return 64 / 8;
}

/* animal library, everything is prefixed with al for namespacing */
#define AL_SUCCESS 0;
#define AL_UNKNOWN_ANIMAL 1;
#define AL_IS_TYPE_OF(animal, type) \
    strcmp(((type *)animal)->animal_type, #type) == 0\

typedef struct {
    const char* animal_type;
    const char* name;
    const char* sound;
} al_dog;

inline int make_dog(al_dog** d) {
    *d = (al_dog*) malloc(sizeof(al_dog));
    (*d)->animal_type = "al_dog";
    (*d)->name = "leslie";
    (*d)->sound = "bark";
    return AL_SUCCESS;
}

inline int free_dog(al_dog* d) {
    free(d);
    return AL_SUCCESS;
}
    
typedef struct {
    const char* animal_type;
    const char* name;
    const char* sound;
} al_cat;

inline int make_cat(al_cat** c) {
    *c = (al_cat*) malloc(sizeof(al_cat));
    (*c)->animal_type = "al_cat";
    (*c)->name = "garfield";
    (*c)->sound = "meow";
    return AL_SUCCESS;
}

inline int free_cat(al_cat* c) {
    free(c);
    return AL_SUCCESS;
}

int make_sound(void* animal) {
    if(AL_IS_TYPE_OF(animal, al_cat)) {
        al_cat *c = (al_cat*) animal;
        return dummy_computation();
    } else if(AL_IS_TYPE_OF(animal, al_dog)) {
        al_dog *d = (al_dog*) animal;
        return dummy_computation();
    } else {
        printf("unknown animal\n");
        return 0;
    }
}
/* c style library finishes here */

/* cpp library with OOP */
struct animal {
    animal(const char* n, const char* s) 
    :name(n)
    ,sound(s)
    {} 
    virtual int make_sound() {
        return dummy_computation();
    }
    const char* name;
    const char* sound;
};

struct cat : animal {
    cat() 
    :animal("garfield", "meow")
    {}
};

struct dog : animal {
    dog() 
    :animal("leslie", "bark")
    {}
};
/* cpp library finishes here */

I have something called dummy_computation, just to make sure I get some computational thingy going on in the benchmark. I would normally implement different printf calls for barking, meowing etc. for such an example but printf is not easily benchmarkable in quick-benchmarks.com. The actual thing I want to benchmark is run-time polymorphism implementation. So that's why I chose to make some small function and used it in both C and C++ implementation as a filler.

Now, in quick-benchmarks.com, I have a benchmark like following:

static void c_style(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    al_dog* d = NULL;
    al_cat* c = NULL;

    make_dog(&d);
    make_cat(&c);
    
    int i1 = make_sound(d);
    benchmark::DoNotOptimize(i1);
    int i2 = make_sound(c);
    benchmark::DoNotOptimize(i2);

    free_dog(d);
    free_cat(c);
  }
}
// Register the function as a benchmark
BENCHMARK(c_style);

static void cpp_style(benchmark::State& state) {
  for (auto _ : state) {
    animal* a1 = new dog();
    animal* a2 = new cat();
    int i1 = a1->make_sound();
    benchmark::DoNotOptimize(i1);
    int i2 = a2->make_sound();
    benchmark::DoNotOptimize(i2);
    delete a1;
    delete a2;
  }
}
BENCHMARK(cpp_style);

I added DoNotOptimize calls so that virtual calls would not end up being optimized-out.

Whole benchmark can be found here, if recreating it seems painful.

https://quick-bench.com/q/ezul9hDXTjfSWijCfd2LMUUEH1I

Now, to my surprise, C version comes out 27 times faster in the results. I expected maybe some performance hits on C++ version because it is a more refined solution but definitely not 27-fold.

Can someone explain these results? Do virtual function calls really incur this much overhead compared to C? Or is it the way I set up this benchmarking experiment that is completely meaningless? If so, how would one more correctly benchmark such issues?

I am pretty sure that the cost of malloc/new dominates the execution time — tstanisl
– tstanisl, Commented Sep 7, 2022 at 14:20
as a matter of fact, if you look at the assembly view on the link he posted, it shows just that. A large part of the time is around the call to new — Taekahn
– Taekahn, Commented Sep 7, 2022 at 14:24
C version doesn't do function ptrs, it does if-else branching with a small set of types. — lorro
– lorro, Commented Sep 7, 2022 at 14:25
Your C code doesn't do a "virtual call"; it always calls a statically-determined function. Change it so it selects a function at runtime. — molbdnilo
– molbdnilo, Commented Sep 7, 2022 at 14:26

lorro · Accepted Answer · 2022-09-07 14:24:17Z

5

It's because you're not implementing the same thing. If you do an if-chain of switch-chain in C, then you have (mathematically) a discriminated union, which is std::variant in C++.

If you'd like the C++ version to be ported to C, then you need function pointers. It'll very likely be equally slow. The reason behind, virtual means forward compatible: any code, including a library loaded later, can descend from your base and implement the virtual methods. It means, sometimes you don't even know at compile-time of your base module what (descendant) classes it might need to handle (the type system is open). Such forward compatibility is not provided for std::variant, which is closed (limited to a fixed list of types).

answered Sep 7, 2022 at 14:24

lorro

10.9k25 silver badges37 bronze badges

Sign up to request clarification or add additional context in comments.

9 Comments

meguli Over a year ago

That makes sense, I was expecting some differences going in since virtual functions do much more. I will definitely try the variant version as well. The thing I want to actually get out this experiment was, most of the time I have a closed set of types but I am still using virtual functions out of habit. I wanted to see how a hand-rolled solution would perform in such cases. Ie. just to make me convince when I truly need virtual.

spectras Over a year ago

@meguli> when you have a closed set of types and you don't care of keeping it open to addition, inheritance is the wrong tool for the job, so you are right to question habits. I wish more people did, I still often see inheritance hierarchies being created for no good reason in new code.

Slava Over a year ago

@spectras "inheritance is the wrong tool for the job" you assume that execution speed is most valuable result of a program. It is quite often (I would say most of the time) not. If code with inheritance is easier to read and implement, which is subjective of course, it may be preferable tool than faster but more complicated approach.

lorro Over a year ago

@Slava It's not only speed, it's also verifiability & testability. For a closed hierarchy, you can write proper tests: test each of them. You might ignore some combinations e.g. when it's N x N x N, but you're aware of what you ignore. For an open hierarchy, it's very difficult.

meguli Over a year ago

But I don't think a solution using variants is less-readable, at least feels as much pleasant to me if not more. So I kinda get the sentiment; use variants if you have closed set of types and not inheritance in such cases.

|

Collectives™ on Stack Overflow

Performance differences in run-time polymorphism between C and C++

1 Answer 1

9 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

9 Comments

Your Answer

Sign up or log in

Post as a guest

Related