1

I got some problems trying to parallelize an algorithm. The intention is to do some modifications to a 100x100 matrix. When I run the algorithm without openMP everything runs smoothly in about 34-35 seconds, when I parallelize on 2 threads (I need it to be with 2 threads only) it gets down to like 22 seconds but the output is wrong and I think it's a synchronization problem that I cannot fix.

Here's the code :

for (p = 0; p < sapt; p++){

    memset(count,0,Nc*sizeof(int));

    for (i = 0; i < N; i ++){
        for (j = 0; j < N; j++){

            for( m = 0; m < Nc; m++)
                dist[m] = N+1;

            omp_set_num_threads(2); 
            #pragma omp parallel for shared(configurationMatrix, dist) private(k,m) schedule(static,chunk)
            for (k = 0; k < N; k++){
                for (m = 0; m < N; m++){        

                    if (i == k && j == m)
                        continue;

                    if (MAX(abs(i-k),abs(j-m)) < dist[configurationMatrix[k][m]])
                        dist[configurationMatrix[k][m]] = MAX(abs(i-k),abs(j-m));       
                }
            }   


        int max = -1;

        for(m = 0; m < Nc; m++){

            if (dist[m] == N+1)
                continue;

            if (dist[m] > max){
                max = dist[m];
                configurationMatrix2[i][j] = m;
                }
            }           
        }
    }

    memcpy(configurationMatrix, configurationMatrix2, N*N*sizeof(int));


    #pragma omp parallel for shared(count, configurationMatrix) private(i,j)
    for (i = 0; i < N; i ++)
        for (j = 0; j < N; j++)
            count[configurationMatrix[i][j]] ++;

    for (i = 0; i < Nc; i ++)
        fprintf(out,"%i ", count[i]);
    fprintf(out, "\n"); 
}

In which : sapt = 100; count -> it's a vector that holds me how many of an each element of the matrix I'm having on each step;

(EX: count[1] = 60 --> I have the element '1' 60 times in my matrix and so on)

dist --> vector that holds me max distances from element i,j of let's say value K to element k,m of same value K.

(EX: dist[1] = 10 --> distance from the element of value 1 to the furthest element of value 1)

Then I write stuff down in an output file, but again, wrong output.

0

1 Answer 1

1

If I understand your code correctly this line

count[configurationMatrix[i][j]] ++;

increments count at the element whose index is at configurationMatrix[i][j]. I don't see that your code takes any steps to ensure that threads are not simultaneously trying to increment the same element of count. It's entirely feasible that two different elements of configurationMatrix provide the same index into count and that those two elements are handled by different threads. Since ++ is not an atomic operation your code has a data race; multiple threads can contend for update access to the same variable and you lose any guarantees of correctness, or determinism, in the result.

I think you may have other examples of the same problem in other parts of your code too. You are silent on the errors you observe in the results of the parallel program compared with the results from the serial program yet those errors are often very useful in diagnosing a problem. For example, if the results of the parallel program are not the same every time you run it, that is very suggestive of a data race somewhere in your code.

How to fix this ? Since you only have 2 threads the easiest fix would be to not parallelise this part of the program. You could wrap the data race inside an OpenMP critical section but that's really just another way of serialising your code. Finally, you could possibly modify your algorithm and data structures to avoid this problem entirely.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the heads up. I figure it out that I cannot parallelize the algorithm the way it is now and I have to think it from another perspective. Cheers!
Maybe my brain is not working today but I don't see a race condition in count[configurationMatrix[i][j]] ++; No thread ever writes to the same value of i and j.
@Zboson: consider what happens when configurationMatrix[i][j]==configurationMatrix[k][m] and i != k, j != m.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.