
I have basically two vectors: one holding a large number of elements, and a second holding a small number of probes used to sample data from the elements. I stumbled over the question of which order to nest the two loops. Naturally I thought having the outer loop run over the larger vector would be beneficial.

Implementation 1:

for(auto& elem: elements) {
    for(auto& probe: probes) {
        probe.insertParticleData(elem);
    }
}

However, it seems that the second implementation takes only half the time.

Implementation 2:

for(auto& probe: probes) {
    for(auto& elem: elements) {
        probe.insertParticleData(elem);
    }
}

What could be the reason for that?

Edit:

Timings were generated by the following code:

clock_t t_begin_ps = std::clock();
... // timed code
clock_t t_end_ps = std::clock();
double elapsed_secs_ps = double(t_end_ps - t_begin_ps) / CLOCKS_PER_SEC;
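
For reference, a wall-clock equivalent using std::chrono (assuming C++11 is available; this is only a sketch, not the code actually used) would look like this. Note that std::clock measures CPU time, whereas steady_clock measures elapsed wall time.

#include <chrono>

auto t_begin_ps = std::chrono::steady_clock::now();
... // timed code
auto t_end_ps = std::chrono::steady_clock::now();
double elapsed_secs_ps =
    std::chrono::duration<double>(t_end_ps - t_begin_ps).count();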

and when inserting an element's data I basically do two things: test whether the distance to the probe is below a limit, and compute an average

bool probe::insertParticleData(const elem& pP) {
   if (!isInside(pP.position())) {return false;}
   ... // compute alpha and beta
   avg_vel = alpha*avg_vel + beta*pP.getVel();
   return true;
}

To get an idea of the memory usage: I have approx. 10k elements, which are objects with 30 double data members. For the test I used 10 probes containing 15 doubles each.
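
For scale, that is roughly 10 000 × 30 × 8 bytes ≈ 2.4 MB of element data, while the probes together occupy only about 10 × 15 × 8 bytes ≈ 1.2 KB, so the probes fit easily in the L1 cache while the element data does not.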

  • First of all, how did you measure the timings? Commented Nov 26, 2014 at 8:02
  • That may also depend on what probe.insertParticleData(elem); does. Commented Nov 26, 2014 at 8:02
  • It really depends on the number and sizes of the elements, and the memory access pattern. Could you add some more detail? Commented Nov 26, 2014 at 8:09
  • @Jarod42 added implementation of insertParticleData Commented Nov 26, 2014 at 8:23
  • @Bathsheba, I included the code for the timings Commented Nov 26, 2014 at 8:23

2 Answers


Today's CPUs are heavily optimized for linear access to memory. Therefore a few long loops will beat many short loops. You want the inner loop to iterate over the long vector.
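
To restate the point in terms of the question's loops (the sizes are taken from the question's edit; the comments spell out the reasoning, they are not measurements):

for (auto& probe: probes) {            // 10 outer iterations
    // the current probe (~120 bytes) stays hot in L1/registers
    for (auto& elem: elements) {       // one long loop of ~10k iterations:
        // elements are read front-to-back, so the hardware prefetcher
        // can stream the ~2.4 MB of element data ahead of the loop
        probe.insertParticleData(elem);
    }
}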


2 Comments

What does linear access mean exactly?
Linear in this context generally means contiguous - loading whole cache lines and using only part of each cache line is inefficient.

My guess: if insertParticleData is virtual, the compiler will treat the function's address as a constant within the inner loop and move the vtable fetch outside the inner loop. I.e. it will effectively generate code which looks like:

   for (auto& probe: probes) {
      funcPtr p = probe.insertParticleData;
      for (auto& elem: elements) {
        (*p)(elem);
      }
   }

whereas in the first version, p would be computed for every inner iteration.
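
For contrast, the same pseudo-code view of the first implementation (using the same hypothetical funcPtr notation as above) would be:

   for (auto& elem: elements) {
      for (auto& probe: probes) {
        funcPtr p = probe.insertParticleData;  // lookup repeated on
        (*p)(elem);                            // every inner iteration
      }
   }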

1 Comment

No, it is not a virtual function.
