
I have just read a blog post here and am trying to do something similar. Here is my code to check what happens in examples 1 and 2:

void doSomething(long numLoop, int cacheSize){
    long k;
    int arr[1000000];   /* ~4 MB of ints, on the stack */
    for (k = 0; k < numLoop; k++) {
        int i;
        /* touch every cacheSize-th element */
        for (i = 0; i < 1000000; i += cacheSize)
            arr[i] = arr[i];
    }
}

As stated in the blog post, the execution times for doSomething(1000,2) and doSomething(1000,1) should be almost the same, but I got 2.1s and 4.3s respectively. Can anyone help me explain this? Thank you.
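(For reference, I time each call with a wrapper along these lines; the harness is illustrative, not my exact code:)

#include <stdio.h>
#include <time.h>

int main(void){
    /* illustrative timing wrapper around one call */
    clock_t start = clock();
    doSomething(1000, 2);
    printf("%.2fs\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}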

Update 1: I have increased my array to 100 times its original size:

#include <stdlib.h>

void doSomething(long numLoop, int cacheSize){
    long k;
    int *buffer = malloc(100000000 * sizeof(int));  /* ~400 MB */
    for (k = 0; k < numLoop; k++) {
        int i;
        for (i = 0; i < 100000000; i += cacheSize)
            buffer[i] = buffer[i];
    }
    free(buffer);
}

Unfortunately, the execution times of doSomething(10,2) and doSomething(10,1) are still very different: 3.02s and 5.65s. Can anyone test this on your machine?

  • Are you using the exact same processor? Different processors have different cache sizes. Commented Sep 27, 2012 at 0:53
  • Try making the array bigger and see what happens. In the link he is using 64MB while you are using 1MB, so you may never get a direct memory access. Commented Sep 27, 2012 at 0:57
  • Hmm, you are correct, but I am expecting to see the difference between the L1 and the L2/L3 caches. Let me check with a larger array. Commented Sep 27, 2012 at 1:18
  • Use valgrind --tool=cachegrind ./your_program and check the Dr and Dw counts; then you will know whether the CPU cache is the main reason for your difference. Commented Sep 27, 2012 at 1:50
  • @navie, see this to learn more about caches: stackoverflow.com/questions/11413855/… Commented Sep 27, 2012 at 2:41

2 Answers


Your array, at 4 MB, is not big enough. The entire array fits in the cache (and is in the cache after the first k loop), so the timing is dominated by instruction execution, not memory access. If you make arr much bigger than the cache size you will start to see the expected effect.

(You will see an additional effect when you make arr bigger than the cache: runtime increases linearly with the size of arr until the working set exceeds the cache, at which point you hit a knee in performance and runtime jumps onto a new, steeper linear scale.)
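If you want to see that knee directly, a size sweep along these lines should show it (a rough sketch, not code from this answer's tests; the size range and the fixed touch count are arbitrary choices):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sweep the working set from 64 KB to 64 MB, doing the same number of
   element touches at every size; runtime should jump once the array
   stops fitting in the last-level cache. */
int main(void){
    size_t n;
    for (n = 1 << 14; n <= 1 << 24; n <<= 1) {
        volatile int *buf = malloc(n * sizeof(int));
        long touched = 0;
        clock_t start = clock();
        while (touched < 100000000) {
            size_t i;
            for (i = 0; i < n && touched < 100000000; i++, touched++)
                buf[i] = buf[i];
        }
        printf("%8zu ints: %.2fs\n", n,
               (double)(clock() - start) / CLOCKS_PER_SEC);
        free((void*)buf);
    }
    return 0;
}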

Edit: I tried your second version with the following changes:

  1. Change to volatile int *buffer to ensure buffer[i] = buffer[i] is not optimized away.
  2. Compile with -O2 to ensure the loop is optimized sufficiently to prevent loop overhead from dominating. (The resulting test program is sketched below.)
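Put together, the version I timed looks roughly like this (the main wrapper that reads the stride from the command line is a reconstruction, not the verbatim test program; compiled with gcc -O2):

#include <stdio.h>
#include <stdlib.h>

void doSomething(long numLoop, int cacheSize){
    long k;
    /* volatile forces the loads and stores to actually happen */
    volatile int *buffer = malloc(100000000 * sizeof(int));
    for (k = 0; k < numLoop; k++) {
        int i;
        for (i = 0; i < 100000000; i += cacheSize)
            buffer[i] = buffer[i];
    }
    free((void*)buffer);
}

int main(int argc, char **argv){
    /* stride comes from the command line: ./dos 2, ./dos 1, ... */
    doSomething(10, argc > 1 ? atoi(argv[1]) : 1);
    return 0;
}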

When I try that I get almost identical times:

kronos /tmp $ time ./dos 2
./dos 2  1.65s user 0.29s system 99% cpu 1.947 total
kronos /tmp $ time ./dos 1
./dos 1  1.68s user 0.25s system 99% cpu 1.926 total

Here you can see the effect of making the stride two full cachelines (with 4-byte ints and 64-byte lines, a stride of 16 covers exactly one cacheline, while a stride of 32 touches only every other line, halving the memory traffic):

kronos /tmp $ time ./dos 16
./dos 16  1.65s user 0.28s system 99% cpu 1.926 total
kronos /tmp $ time ./dos 32
./dos 32  1.06s user 0.30s system 99% cpu 1.356 total



With doSomething(1000, 2) you run the inner loop half as many times as with doSomething(1000, 1).

Your inner loop increments by cacheSize, so a value of 2 gives you only half the iterations of a value of 1 (500,000 element touches per pass instead of 1,000,000 in your first version). That array is also only about 4 MB, small enough to fit in a typical last-level cache.

Actually, I am a bit surprised that a good compiler does not simply optimize this loop away, since buffer[i] = buffer[i] has no observable effect; that might be part of the issue (declaring the buffer volatile, as in the other answer, prevents it).

