void foo(Node* p[], int size){

    _uint64 arr_of_values[_MAX_THREADS];

    for (int i = 0 ; i < size ; i++ ){
         arr_of_values[i] = p[i]->....;

         // much code here
         //
    }
}

vs

void foo(Node* p[], int size){

    _uint64 arr_of_values[_MAX_THREADS];

    Node** p_end = p + size;   // one past the last pointer in the array
    int i = 0;                 // only used to pick the destination slot
    for ( ; p != p_end ; ){
         arr_of_values[i] = (*p)->.....;
         p++;
         i++;

         // much code here
         //
    }
}

I created this function in two versions to demonstrate what I am asking:

What is more efficient from a cache-efficiency standpoint: indexing with p[i] or dereferencing with *p++?

(I'll never use p[i-x] in the rest of the code, but I may use p[i] or *p in the calculation that follows.)

  • What is more efficient from a cache aspect: increasing p and then loading *p, or increasing the counter i and reading *(p+i)? Commented Jun 27, 2012 at 13:48
  • Cachewise, there's no difference. At most there could be a data-dependency difference (but the compiler should optimize that). Commented Jun 27, 2012 at 14:21

2 Answers


The most important thing is to avoid false sharing in arr_of_values. Each thread writes into its own slot, but 8 or 16 slots share a cache line (depending on the CPU), causing a massive false-sharing problem. Add padding between the slots so that each thread's slot occupies its own cache line (a sketch of this is shown after the code below), or accumulate on the stack and write only once at the end:

void foo(Node* p[], int size){

    _uint64 arr_of_values[_MAX_THREADS];

    _uint64 temp = 0;                 // accumulate on the stack
    Node** p_end = p + size;          // one past the last pointer in the array
    for ( ; p != p_end ; ){
         temp = .....;
         p++;
         // much code here
         //
    }
    arr_of_values[i] = temp;          // write the shared slot only once
                                      // (i is this thread's slot index)
}

The question of access by pointer versus access by index is irrelevant with today's compilers.

Your next actions should be: grab a copy of The Software Optimization Cookbook. Read it. Measure. Fix the measured hotspots, not the guesstimates.


7 Comments

This would be a great answer if there were any threads involved at all... but this appears to be a question about sequential code.
@_BenVoigt: Agreed, but I took a leap of faith and assumed that the array being dimensioned by [_MAX_THREADS] is used as one slot-per-thread.
Remus, you are right. Actually, this snippet is based on complicated code with a hierarchy of multiple threads that merge items from multiple queues into local queues, and then more threads merge items from those local queues into a central one. My code has bad performance, and I believe the bottleneck is this function, which iterates over multiple queues: I have the option of increasing the pointer on each queue or iterating over it with an index. False sharing is very relevant here.
But really, your best bet is to measure right now. VTune, F1 Profiler. Find the hot spots and then address them.
@Michael: You're asking a question about performance, and cache behavior specifically, and didn't mention that you have multiple threads? Please try to include important information in your future questions.

The problem from a cache point of view isn't the way you are accessing the elements. In this case, using a pointer or an array index is equivalent.

BTW, Node* p[] is an array of pointers, so your Node objects could have been allocated in distant memory areas (for example by calling new Node() several times). The best cache performance is achievable if (see the sketch after this list):

  1. Your Node objects are stored contiguously in memory
  2. Node size doesn't exceed the cache size.
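
A minimal sketch of point 1, assuming Node is small and copyable (the std::vector layout and the sum_values function are illustrative, not from the question's code):

#include <cstdint>
#include <vector>

struct Node {
    std::uint64_t value;                         // placeholder for the real Node members
};

std::uint64_t sum_values(const std::vector<Node>& nodes){
    std::uint64_t acc = 0;
    // The vector stores the nodes back to back in a single allocation, so this
    // loop walks memory sequentially and the hardware prefetcher can keep up,
    // unlike chasing Node* pointers that may land in scattered heap blocks.
    for (const Node& n : nodes)
        acc += n.value;
    return acc;
}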

