I got some problems trying to parallelize an algorithm. The intention is to do some modifications to a 100x100 matrix. When I run the algorithm without openMP everything runs smoothly in about 34-35 seconds, when I parallelize on 2 threads (I need it to be with 2 threads only) it gets down to like 22 seconds but the output is wrong and I think it's a synchronization problem that I cannot fix.
Here's the code :
for (p = 0; p < sapt; p++){
memset(count,0,Nc*sizeof(int));
for (i = 0; i < N; i ++){
for (j = 0; j < N; j++){
for( m = 0; m < Nc; m++)
dist[m] = N+1;
omp_set_num_threads(2);
#pragma omp parallel for shared(configurationMatrix, dist) private(k,m) schedule(static,chunk)
for (k = 0; k < N; k++){
for (m = 0; m < N; m++){
if (i == k && j == m)
continue;
if (MAX(abs(i-k),abs(j-m)) < dist[configurationMatrix[k][m]])
dist[configurationMatrix[k][m]] = MAX(abs(i-k),abs(j-m));
}
}
int max = -1;
for(m = 0; m < Nc; m++){
if (dist[m] == N+1)
continue;
if (dist[m] > max){
max = dist[m];
configurationMatrix2[i][j] = m;
}
}
}
}
memcpy(configurationMatrix, configurationMatrix2, N*N*sizeof(int));
#pragma omp parallel for shared(count, configurationMatrix) private(i,j)
for (i = 0; i < N; i ++)
for (j = 0; j < N; j++)
count[configurationMatrix[i][j]] ++;
for (i = 0; i < Nc; i ++)
fprintf(out,"%i ", count[i]);
fprintf(out, "\n");
}
In which : sapt = 100; count -> it's a vector that holds me how many of an each element of the matrix I'm having on each step;
(EX: count[1] = 60 --> I have the element '1' 60 times in my matrix and so on)
dist --> vector that holds me max distances from element i,j of let's say value K to element k,m of same value K.
(EX: dist[1] = 10 --> distance from the element of value 1 to the furthest element of value 1)
Then I write stuff down in an output file, but again, wrong output.