0

Actually I have two questions, the first one is, considering cache, which one of the following code is faster?

int a[10000][10000];
for(int i = 0; i < 10000; i++){
    for(int j = 0; j < 10000; j++){
       a[i][j]++;
    }
}

or

int a[10000][10000];
for(int i = 0; i < 10000; i++){
    for(int j = 0; j < 10000; j++){
       a[j][i]++;
    }
}

I am guessing the first one will be much faster since there are a lot less cache miss. And my question is if you are using OpenMP, what kind of technique will you use to optimise such a nested loop? My strategy is to divide the outer loop into 4 chunks and assign them among 4 cores, is there any better way (more cache friendly) to do it?

Thanks! Bob

1
  • You can test both on you platform. But I assume, 1st fastest. Also, question: Is array element must be int? Perhaps, enough short or char? By decrease element size, you will decrease memory traffic. Commented Oct 11, 2013 at 23:52

1 Answer 1

2

As maxihatop pointed out, the first one performs better because it has better cache locality.

Dividing the outer loop into chunks is a good strategy in the case like this, where the complexity of the task inside the loop is constant.

You might want to take a look at #pragma omp for schedule(static). This will evenly divide the iterations contiguously among threads. So your code should look like:

#pragma omp for schedule(static)
for (i = 0; i < 10000; i++) {
    for(j = 0; j < 10000; j++){
        a[i][j]++;
    }

Lawrence Livermore National Laboratory provides a fantastic tutorial of OpenMP. You can find more information there. https://computing.llnl.gov/tutorials/openMP/

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.