Actually I have two questions, the first one is, considering cache, which one of the following code is faster?
int a[10000][10000];
for(int i = 0; i < 10000; i++){
for(int j = 0; j < 10000; j++){
a[i][j]++;
}
}
or
int a[10000][10000];
for(int i = 0; i < 10000; i++){
for(int j = 0; j < 10000; j++){
a[j][i]++;
}
}
I am guessing the first one will be much faster since there are a lot less cache miss. And my question is if you are using OpenMP, what kind of technique will you use to optimise such a nested loop? My strategy is to divide the outer loop into 4 chunks and assign them among 4 cores, is there any better way (more cache friendly) to do it?
Thanks! Bob