Parallelizing giving wrong output

Question

I got some problems trying to parallelize an algorithm. The intention is to do some modifications to a 100x100 matrix. When I run the algorithm without openMP everything runs smoothly in about 34-35 seconds, when I parallelize on 2 threads (I need it to be with 2 threads only) it gets down to like 22 seconds but the output is wrong and I think it's a synchronization problem that I cannot fix.

Here's the code :

for (p = 0; p < sapt; p++){

    memset(count,0,Nc*sizeof(int));

    for (i = 0; i < N; i ++){
        for (j = 0; j < N; j++){

            for( m = 0; m < Nc; m++)
                dist[m] = N+1;

            omp_set_num_threads(2); 
            #pragma omp parallel for shared(configurationMatrix, dist) private(k,m) schedule(static,chunk)
            for (k = 0; k < N; k++){
                for (m = 0; m < N; m++){        

                    if (i == k && j == m)
                        continue;

                    if (MAX(abs(i-k),abs(j-m)) < dist[configurationMatrix[k][m]])
                        dist[configurationMatrix[k][m]] = MAX(abs(i-k),abs(j-m));       
                }
            }   


        int max = -1;

        for(m = 0; m < Nc; m++){

            if (dist[m] == N+1)
                continue;

            if (dist[m] > max){
                max = dist[m];
                configurationMatrix2[i][j] = m;
                }
            }           
        }
    }

    memcpy(configurationMatrix, configurationMatrix2, N*N*sizeof(int));


    #pragma omp parallel for shared(count, configurationMatrix) private(i,j)
    for (i = 0; i < N; i ++)
        for (j = 0; j < N; j++)
            count[configurationMatrix[i][j]] ++;

    for (i = 0; i < Nc; i ++)
        fprintf(out,"%i ", count[i]);
    fprintf(out, "\n"); 
}

In which : sapt = 100; count -> it's a vector that holds me how many of an each element of the matrix I'm having on each step;

(EX: count[1] = 60 --> I have the element '1' 60 times in my matrix and so on)

dist --> vector that holds me max distances from element i,j of let's say value K to element k,m of same value K.

(EX: dist[1] = 10 --> distance from the element of value 1 to the furthest element of value 1)

Then I write stuff down in an output file, but again, wrong output.

High Performance Mark · Accepted Answer · 2013-11-11 09:02:04Z

1

If I understand your code correctly this line

count[configurationMatrix[i][j]] ++;

increments count at the element whose index is at configurationMatrix[i][j]. I don't see that your code takes any steps to ensure that threads are not simultaneously trying to increment the same element of count. It's entirely feasible that two different elements of configurationMatrix provide the same index into count and that those two elements are handled by different threads. Since ++ is not an atomic operation your code has a data race; multiple threads can contend for update access to the same variable and you lose any guarantees of correctness, or determinism, in the result.

I think you may have other examples of the same problem in other parts of your code too. You are silent on the errors you observe in the results of the parallel program compared with the results from the serial program yet those errors are often very useful in diagnosing a problem. For example, if the results of the parallel program are not the same every time you run it, that is very suggestive of a data race somewhere in your code.

How to fix this ? Since you only have 2 threads the easiest fix would be to not parallelise this part of the program. You could wrap the data race inside an OpenMP critical section but that's really just another way of serialising your code. Finally, you could possibly modify your algorithm and data structures to avoid this problem entirely.

edited Nov 11, 2013 at 9:02

answered Nov 10, 2013 at 21:04

High Performance Mark

78.7k7 gold badges109 silver badges168 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Pavel Daniel Over a year ago

Thanks for the heads up. I figure it out that I cannot parallelize the algorithm the way it is now and I have to think it from another perspective. Cheers!

Z boson Over a year ago

Maybe my brain is not working today but I don't see a race condition in count[configurationMatrix[i][j]] ++; No thread ever writes to the same value of i and j.

High Performance Mark Over a year ago

@Zboson: consider what happens when configurationMatrix[i][j]==configurationMatrix[k][m] and i != k, j != m.

Collectives™ on Stack Overflow

Parallelizing giving wrong output

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related