
I know a lot of questions on these topics have been asked before, but I have a specific case where speed is (moderately) important and the function pointer version is about 25% faster than the virtual function version. I wondered (for mostly academic reasons) why?

To give more detail, I am writing a simulation which consists of a series of Cells. The Cells are connected by Links, which dictate the interactions between the Cells. The Link class has a virtual function called update(), which causes the two Cells it is linking to interact. It is virtual so that I can create different types of Link to give different types of interaction. For example, at the moment I am simulating inviscid flow, but I may want a Link which has viscosity or applies a boundary condition.

The second way I can achieve the same effect is to pass a function pointer to the Link class and make the target of the function pointer a friend. I can then have a non-virtual update() which calls through the function pointer. Derived classes can pass pointers to different functions, giving polymorphic behaviour.

When I build the two versions and profile with Very Sleepy, I find that the function pointer version is significantly faster than the virtual function version and that it appears that the Link class has been entirely optimised away - I just see calls from my main function to the functions pointed to.

I just wondered what made it easier for my compiler (MSVC++ 2012 Express) to optimise the function pointer case better than the virtual function case?

Some code below for the function pointer case, if it helps; I'm sure it is obvious how the equivalent would be done with virtual functions.

void InviscidLinkUpdate( void * linkVoid )
{
    InviscidLink * link=(InviscidLink*)linkVoid;
    //do some stuff with the link
    //e.g.
    //link->param1=
}

void ViscousLinkUpdate( void * linkVoid )
{
    ViscousLink * link=(ViscousLink*)linkVoid;
    //do some stuff with the link
    //e.g.
    //link->param1=
}

class Link
{
public:
    Link(Cell *cell1, Cell *cell2, float area, float timeStep, void (*updateFunc)( void * ))
        :m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep), m_update(updateFunc)
    {}
    ~Link(){}
    void update() {m_update( this );}
protected:
    //pointer to the type-specific update routine, set by the derived class
    void (*const m_update)( void * );
    Cell *m_cell1;
    Cell *m_cell2;
    float m_area;
    float m_timeStep;
    //some other parameters I want to modify in update()
    float param1;
    float param2;

};

class InviscidLink : public Link
{
    friend void InviscidLinkUpdate( void * linkVoid );
public:
    InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep, InviscidLinkUpdate)
    {}
};

class ViscousLink : public Link
{
    friend void ViscousLinkUpdate( void * linkVoid );
public:
    ViscousLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep, ViscousLinkUpdate)
    {}
};
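
For completeness, a rough sketch of how the equivalent virtual function version looks (the shape of it rather than my exact code, and assuming the same Cell class as above):

class Link
{
public:
    Link(Cell *cell1, Cell *cell2, float area, float timeStep)
        :m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep)
    {}
    virtual ~Link(){}
    virtual void update() = 0; //dispatched through the vtable
protected:
    Cell *m_cell1;
    Cell *m_cell2;
    float m_area;
    float m_timeStep;
    float param1;
    float param2;
};

class InviscidLink : public Link
{
public:
    InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep)
    {}
    void update() override
    {
        //do some stuff with the link, e.g.
        //param1=
    }
};

class ViscousLink : public Link
{
public:
    ViscousLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :Link(cell1, cell2, area, timeStep)
    {}
    void update() override
    {
        //do some stuff with the link
    }
};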

edit

I have now put the full source on GitHub: https://github.com/philrosenberg/ung. Compare commit 5ca899d39aa85fa3a86091c1202b2c4cd7147287 (the function pointer version) with commit aff249dbeb5dfdbdaefc8483becef53053c4646f (the virtual function version). Unfortunately I initially based the test project on a wxWidgets project, in case I wanted to play with some graphical display, so if you don't have wxWidgets you will need to hack it into a command-line project to compile it. I used Very Sleepy to benchmark it.

further edit:

milianw's comment about profile-guided optimization turned out to be the solution, but as it is a comment I currently cannot mark it as the answer. Using the Pro version of Visual Studio with profile-guided optimization gave similar runtimes to using inline functions. I guess this is the Virtual Call Speculation described at http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx. I still find it a bit odd that this code could be more easily optimized using function pointers than virtual functions, but I guess that is why everyone advises to TEST rather than assume one piece of code is faster than another.

4 Comments

  • Would be useful if you could post your benchmark here (the two pieces of code which you are comparing, and a short test case which executes each one of them and measures the execution time). (Dec 18, 2014 at 15:01)
  • Sorry, the full code is a thousand lines or so. I would upload it to GitHub (and will in the future) but I haven't yet settled on a project name. (Dec 18, 2014 at 15:22)
  • Try to use profile-guided optimizations here. Then, the profiler can potentially apply devirtualization to speed up the code. Also, don't forget to mark your implementations as final, which can further help. See e.g. channel9.msdn.com/Shows/C9-GoingNative/… or the excellent GCC article series over at hubicka.blogspot.de/search/label/devirtualization. (Dec 19, 2014 at 10:13)
  • @milianw Turned out this was the answer - I got hold of the Pro version of Visual Studio 2012 from work and did a profile-guided optimization build. This inlined not only the Link object and its virtual functions, but further functions as well, giving similar performance to using function pointers. If you submit this as an answer I will accept it. (Jan 5, 2015 at 14:49)

4 Answers


Two things I can think of that differ between using function pointers and virtual functions:

  • Your class size will be smaller since it won't have a vftable allocated, which makes it more cache friendly
  • There's one less indirection with a function pointer (with virtual functions: object indirection, vftable indirection, virtual function indirection; with function pointers: object indirection, function pointer indirection -> your update function is resolved at compile time, since it's not virtual)
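
A tiny self-contained sketch of the two dispatch paths described above (the type and function names are made up for illustration, not the OP's code):

#include <cstdio>

// Virtual dispatch: the call goes object -> vptr -> vtable slot -> target.
struct VirtualLink
{
    virtual ~VirtualLink() {}
    virtual void update() { std::puts("virtual update"); }
};

// Function pointer dispatch: the call goes object -> stored pointer -> target;
// the non-virtual update() wrapper itself is resolved at compile time.
struct PtrLink
{
    void (*updateFunc)(PtrLink *);
    void update() { updateFunc(this); }
};

void ptrLinkUpdate(PtrLink *) { std::puts("pointer update"); }

int main()
{
    VirtualLink v;
    PtrLink p = { &ptrLinkUpdate };
    v.update();   // object -> vptr -> vtable slot -> target
    p.update();   // object -> stored pointer -> target
    return 0;
}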

4 Comments

The class size should be the same: the vtable is not stored in the class, only a pointer to it, and that takes up the same amount of space as the m_update member (and less space than the function pointers would if you replaced more than one virtual function).
I think this is linked to the matter. The dereferencing seems to take a long time; perhaps it is lost from the cache. Am I correct in thinking the vtable pointer cannot be optimised away? I saw various contradictory things about inlining virtual functions.
@PhilRosenberg: If you want to eliminate any doubt about that, take a look at the assembler ;)
@JonathanWakely: true, I don't have the code for the virtual function approach, so I'm just speculating. Regarding the OP's comment: my first thought was also a cache miss situation. In addition to your findings, from my previous profiling the usual cache miss penalty is ~250% (profiled on Intel x32, x64 and ARMv5, ARMv7), and you have 25%, which suggests a cache miss roughly once every ~10 accesses.

As requested, here is my comment again as an answer:

Try to use profile-guided optimizations here. Then, the profiler can potentially apply devirtualization to speed up the code. Also, don't forget to mark your implementations as final, which can further help. See e.g. http://channel9.msdn.com/Shows/C9-GoingNative/C9GoingNative-12-C-at-BUILD-2012-Inside-Profile-Guided-Optimization or the excellent GCC article series over at http://hubicka.blogspot.de/search/label/devirtualization.
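
As an illustration of the final part, a minimal sketch (not the OP's exact classes) of where final helps:

// Marking the class (or the overrider) final tells the compiler that no
// further overrides can exist, so a call made through an InviscidLink*
// (or reference) can be devirtualized and potentially inlined.
class Link
{
public:
    virtual ~Link() {}
    virtual void update() = 0;
};

class InviscidLink final : public Link
{
public:
    void update() override
    {
        // inviscid interaction here
    }
};

The profile-guided optimization step itself is driven from the build: an instrumented build is run on a representative workload, and the collected profile is then fed back into the optimizing link.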



The actual cost of a virtual function call is normally insignificant. However, virtual functions may, as you observed, impact code speed considerably. The main reason is that a virtual call is normally a real function call, with a stack frame being set up, because the call target is resolved at runtime.

If a function is not virtual, it is much easier for the C++ compiler to inline it. The call is resolved at compile time, so the compiler can replace the call with the body of the called function. This allows much more aggressive optimisation, such as doing some computations only once instead of on each loop iteration.
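
A small sketch of the kind of optimisation this enables, with made-up names rather than the OP's code: once the call is inlined, loop-invariant work can be hoisted out of the loop.

#include <cstddef>
#include <vector>

struct Cell { float value; };

// Non-virtual and statically resolved: the compiler is free to inline this call.
inline void scaleCell(Cell &c, float area, float timeStep)
{
    c.value += area * timeStep;   // area * timeStep does not depend on the cell
}

void updateAll(std::vector<Cell> &cells, float area, float timeStep)
{
    // After inlining, area * timeStep can be computed once and reused across
    // the loop, instead of being hidden behind an opaque call every iteration.
    for (std::size_t i = 0; i < cells.size(); ++i)
        scaleCell(cells[i], area, timeStep);
}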

5 Comments

However, the same is true here, right? The OP is calling through a function pointer.
Some calls through function pointers can be inlined, and some virtual function calls can be inlined too. It depends.
Both comments are true, but it is much easier for the compiler to track the value of a function pointer and then inline it than to inline a virtual function.
The function pointer itself isn't getting inlined - in the function pointer case the entire Link class is getting optimised away, except the function pointer - so I just see direct calls from my main function to *LinkUpdate
@PhilRosenberg Those direct calls are the result of inlining the calls which enclosed them.

Based on the information provided here, my best guess is that you're operating on a large number of objects and that the extra indirection introduced by the vtable is increasing cache misses to the point where the cost of refetching from main memory becomes measurable.

As another alternative, have you considered using templates and either CRTP or a policy-based approach for the Link class? Depending on your needs it may be possible to entirely remove the dynamic dispatch.
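
A rough CRTP sketch of what that could look like (names assumed from the question, not a drop-in replacement):

class Cell; // as in the question

// The base is parameterised on the derived type, so update() resolves
// statically to Derived::updateImpl() with no virtual dispatch at all.
template <typename Derived>
class LinkBase
{
public:
    LinkBase(Cell *cell1, Cell *cell2, float area, float timeStep)
        :m_cell1(cell1), m_cell2(cell2), m_area(area), m_timeStep(timeStep)
    {}
    void update() { static_cast<Derived *>(this)->updateImpl(); }
protected:
    Cell *m_cell1;
    Cell *m_cell2;
    float m_area;
    float m_timeStep;
};

class InviscidLink : public LinkBase<InviscidLink>
{
public:
    InviscidLink(Cell *cell1, Cell *cell2, float area, float timeStep)
        :LinkBase<InviscidLink>(cell1, cell2, area, timeStep)
    {}
    void updateImpl()
    {
        // inviscid interaction between m_cell1 and m_cell2
    }
};

The trade-off, as the comment below points out, is that InviscidLink and ViscousLink no longer share a common polymorphic base, so they cannot be stored together through a single array of Link pointers.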

2 Comments

I'm not sure this follows. There are only two vtables in the OP's case; if they're being constantly evicted from cache then something weird is going on...
I do have a large number of Links and Cells (my test case is 10000 of each), so yes, caching may be an issue. I did try a template version; it had similar speed to the function pointer case, but building a full simulation became very complex because I can't store pointers to the Links in a single array, and each Cell needs pointers to its Links (6 of them for a 3D simulation), which then require templating, and then I have to store the Cells in different arrays...
