
I know a lot of questions on these topics have been asked before, but I have a specific case where speed is (moderately) important and the function pointer version is about 25% faster than the virtual function version. I wondered (for mostly academic reasons) why?

To give more detail, I am writing a simulation which consists of a series of Cells. The Cells are connected by Links, which dictate the interactions between the Cells. The Link class has a virtual function called update(), which causes the two Cells it is linking to interact. It is virtual so that I can create different types of Link to give different types of interaction. For example, at the moment I am simulating inviscid flow, but I may want a Link which has viscosity or applies a boundary condition.

The second way I can achieve the same effect is to pass a function pointer to the Link class and make the target of the function pointer a friend. I can then have a non-virtual update() which calls through the function pointer. Derived classes can pass pointers to different functions, giving polymorphic behaviour.

When I build the two versions and profile with Very Sleepy, I find that the function pointer version is significantly faster than the virtual function version and that it appears that the Link class has been entirely optimised away - I just see calls from my main function to the functions pointed to.

I just wondered what made it easier for my compiler (MSVC++ 2012 Express) to optimise the function pointer case better than the virtual function case?

Some code below for the function pointer case, if it helps; I'm sure it is obvious how the equivalent would be done with virtual functions.

void InviscidLinkUpdate( void * linkVoid )
{
    InviscidLink * link=(InviscidLink*)linkVoid;
    //do some stuff with the link
    //e.g.
    //link->param1=
}

void ViscousLinkUpdate( void * linkVoid )
{
    ViscousLink * link=(ViscousLink*)linkVoid;
    //do some stuff with the link
    //e.g.
    //link->param1=
}

class Link
{
public:
    Link(Cell *cell1, Cell *cell2, float area, float timeStep, void (*updateFunc)( void * ))
        :m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep), m_update(updateFunc)
    {}
    ~Link(){}
    void update() {m_update( this );}
protected:
    //pointer to the type-specific update routine, set by the derived class
    void (*const m_update)( void * );
    Cell *m_cell1;
    Cell *m_cell2;
    float m_area;
    float m_timeStep;
    //some other parameters I want to modify in update()
    float param1;
    float param2;

};

class InviscidLink : public Link
{
    friend void InviscidLinkUpdate( void * linkVoid );
public:
    InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep, InviscidLinkUpdate)
    {}
};

class ViscousLink : public Link
{
    friend void ViscousLinkUpdate( void * linkVoid );
public:
    ViscousLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep, ViscousLinkUpdate)
    {}
};
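
For completeness, a rough sketch of how the equivalent virtual function version looks (the shape of it rather than my exact code, and assuming the same Cell class as above):

class Link
{
public:
    Link(Cell *cell1, Cell *cell2, float area, float timeStep)
        :m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep)
    {}
    virtual ~Link(){}
    virtual void update() = 0; //dispatched through the vtable
protected:
    Cell *m_cell1;
    Cell *m_cell2;
    float m_area;
    float m_timeStep;
    float param1;
    float param2;
};

class InviscidLink : public Link
{
public:
    InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep)
    {}
    void update() override
    {
        //do some stuff with the link, e.g.
        //param1=
    }
};

class ViscousLink : public Link
{
public:
    ViscousLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep)
    {}
    void update() override
    {
        //do some stuff with the link
    }
};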

edit

I have now put the full source on GitHub: https://github.com/philrosenberg/ung. Compare commit 5ca899d39aa85fa3a86091c1202b2c4cd7147287 (the function pointer version) with commit aff249dbeb5dfdbdaefc8483becef53053c4646f (the virtual function version). Unfortunately I initially based the test project on a wxWidgets project, in case I wanted to play with some graphical display, so if you don't have wxWidgets you will need to hack it into a command-line project to compile it. I used Very Sleepy to benchmark it.

further edit:

milianw's comment about profile-guided optimization turned out to be the solution, but as it is a comment I currently cannot mark it as the answer. Using the Pro version of Visual Studio with profile-guided optimization gave similar runtimes to using inline functions. I guess this is the Virtual Call Speculation described at http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx. I still find it a bit odd that this code could be more easily optimized using function pointers than virtual functions, but I guess that is why everyone advises to TEST rather than assume one piece of code is faster than another.

4 Comments

  • Would be useful if you could post your benchmark here (the two pieces of code which you are comparing, and a short test case which executes each one of them and measures the execution time). (Dec 18, 2014 at 15:01)
  • Sorry, the full code is a thousand lines or so. I would upload it to GitHub (and will in the future) but I haven't yet settled on a project name. (Dec 18, 2014 at 15:22)
  • Try to use profile-guided optimizations here. Then, the profiler can potentially apply devirtualization to speed up the code. Also, don't forget to mark your implementations as final, which can further help. See e.g. channel9.msdn.com/Shows/C9-GoingNative/… or the excellent GCC article series over at hubicka.blogspot.de/search/label/devirtualization. (Dec 19, 2014 at 10:13)
  • @milianw Turned out this was the answer - I got hold of the Pro version of Visual Studio 2012 from work and did a profile-guided optimization build. This inlined not only the Link object and its virtual functions, but further functions as well, giving similar performance to using function pointers. If you submit this as an answer I will accept it. (Jan 5, 2015 at 14:49)

4 Answers


Two things I can think of that differ between using function pointers and virtual functions:

  • Your class size will be smaller since it won't have a vftable allocated, which makes it more cache friendly
  • There's one less indirection with a function pointer (with virtual functions: object indirection, vftable indirection, virtual function indirection; with function pointers: object indirection, function pointer indirection -> your update function is resolved at compile time, since it's not virtual)
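
A tiny self-contained sketch of the two dispatch paths described above (the type and function names are made up for illustration, not the OP's code):

#include <cstdio>

// Virtual dispatch: the call goes object -> vptr -> vtable slot -> target.
struct VirtualLink
{
    virtual ~VirtualLink() {}
    virtual void update() { std::puts("virtual update"); }
};

// Function pointer dispatch: the call goes object -> stored pointer -> target;
// the non-virtual update() wrapper itself is resolved at compile time.
struct PtrLink
{
    void (*updateFunc)(PtrLink *);
    void update() { updateFunc(this); }
};

void ptrLinkUpdate(PtrLink *) { std::puts("pointer update"); }

int main()
{
    VirtualLink v;
    PtrLink p = { &ptrLinkUpdate };
    v.update();   // object -> vptr -> vtable slot -> target
    p.update();   // object -> stored pointer -> target
    return 0;
}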

4 Comments

The class size should be the same: the vtable is not stored in the class, only a pointer to it, and that takes up the same amount of space as the m_update member (and less space than the function pointers would if you replaced more than one virtual function).
I think this is linked to the matter. The dereferencing seems to take a long time; perhaps it is lost from the cache. Am I correct in thinking the vtable pointer cannot be optimised away? I saw various contradictory things about inlining virtual functions.
@PhilRosenberg: If you want to eliminate any doubt about that, take a look at the assembler ;)
@JonathanWakely: true, I don't have the code for the virtual function approach, so I'm just speculating. Regarding the OP's comment: my first thought was also a cache miss situation. In addition to your findings, from my previous profiling the usual cache miss penalty is ~250% (profiled on Intel x32, x64 and ARMv5, ARMv7), and you have 25%, which suggests a cache miss roughly once every ~10 accesses.

As requested, here is my comment again as an answer:

Try to use profile-guided optimizations here. Then, the profiler can potentially apply devirtualization to speed up the code. Also, don't forget to mark your implementations as final, which can further help. See e.g. http://channel9.msdn.com/Shows/C9-GoingNative/C9GoingNative-12-C-at-BUILD-2012-Inside-Profile-Guided-Optimization or the excellent GCC article series over at http://hubicka.blogspot.de/search/label/devirtualization.
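
As an illustration of the final part, a minimal sketch (not the OP's exact classes) of where final helps:

// Marking the class (or the overrider) final tells the compiler that no
// further overrides can exist, so a call made through an InviscidLink*
// (or reference) can be devirtualized and potentially inlined.
class Link
{
public:
    virtual ~Link() {}
    virtual void update() = 0;
};

class InviscidLink final : public Link
{
public:
    void update() override
    {
        // inviscid interaction here
    }
};

The profile-guided optimization step itself is driven from the build: an instrumented build is run on a representative workload, and the collected profile is then fed back into the optimizing link.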



The actual cost of a virtual function call is normally insignificant. However, virtual functions may, as you observed, impact code speed considerably. The main reason is that a virtual call is normally a real function call, with a stack frame being set up, because the call target is resolved at runtime.

If a function is not virtual, it is much easier for the C++ compiler to inline it. The call is resolved at compile time, so the compiler can replace the call with the body of the called function. This allows much more aggressive optimisation, such as doing some computations only once instead of on each loop iteration.
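
A small sketch of the kind of optimisation this enables, with made-up names rather than the OP's code: once the call is inlined, loop-invariant work can be hoisted out of the loop.

#include <cstddef>
#include <vector>

struct Cell { float value; };

// Non-virtual and statically resolved: the compiler is free to inline this call.
inline void scaleCell(Cell &c, float area, float timeStep)
{
    c.value += area * timeStep;   // area * timeStep does not depend on the cell
}

void updateAll(std::vector<Cell> &cells, float area, float timeStep)
{
    // After inlining, area * timeStep can be computed once and reused across
    // the loop, instead of being hidden behind an opaque call every iteration.
    for (std::size_t i = 0; i < cells.size(); ++i)
        scaleCell(cells[i], area, timeStep);
}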

5 Comments

However, the same is true here, right? The OP is calling through a function pointer.
Some calls through function pointers can be inlined, and some virtual function calls can be inlined too. It depends.
Both comments are true, but it is much easier for the compiler to track the value of a function pointer and then inline it than to inline a virtual function.
The function pointer itself isn't getting inlined - in the function pointer case the entire Link class is getting optimised away, except the function pointer - so I just see direct calls from my main function to *LinkUpdate
@PhilRosenberg Those direct calls are the result of inlining the calls which enclosed them.

Based on the information provided here, my best guess is that you're operating on a large number of objects and that the extra indirection introduced by the vtable is increasing cache misses to the point where the cost of refetching from main memory becomes measurable.

As another alternative, have you considered using templates and either CRTP or a policy-based approach for the Link class? Depending on your needs it may be possible to entirely remove the dynamic dispatch.
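
A rough CRTP sketch of what that could look like (names assumed from the question, not a drop-in replacement):

class Cell; // as in the question

// The base is parameterised on the derived type, so update() resolves
// statically to Derived::updateImpl() with no virtual dispatch at all.
template <typename Derived>
class LinkBase
{
public:
    LinkBase(Cell *cell1, Cell *cell2, float area, float timeStep)
        :m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep)
    {}
    void update() { static_cast<Derived *>(this)->updateImpl(); }
protected:
    Cell *m_cell1;
    Cell *m_cell2;
    float m_area;
    float m_timeStep;
};

class InviscidLink : public LinkBase<InviscidLink>
{
public:
    InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :LinkBase<InviscidLink>(cell1, cell2, area, timeStep)
    {}
    void updateImpl()
    {
        // inviscid interaction between m_cell1 and m_cell2
    }
};

The trade-off, as the comment below points out, is that InviscidLink and ViscousLink no longer share a common polymorphic base, so they cannot be stored together through a single array of Link pointers.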

2 Comments

I'm not sure this follows. There are only two vtables in the OP's case; if they're being constantly evicted from cache then something weird is going on...
I do have a large number of Links and Cells (my test case is 10000 of each), so yes, caching may be an issue. I did try a template version; it had similar speed to the function pointer case, but building a full simulation became very complex because I can't store pointers to the Links in a single array, and each Cell needs pointers to its Links (6 of them for a 3D simulation), which then require templating, and then I have to store the Cells in different arrays...
