I have several nested loops and have parallelized the outermost one. apar and mpar are structs whose members are modified inside the loops; the function breakLogic is then called, and the struct it returns is stored in a pre-allocated vector of those structs.
The arrays one, two, ... are declared earlier in the function.
I have tried adding ordered and critical to ensure correctness, but I am still getting incorrect results.
    #pragma omp parallel for ordered private(appFlip, atur, apar, mpar, i, j, k, l, m, n) shared(rawFlip)
    for(i=0; i<oneL; i++)
    {
        // initialize mpar
        #pragma omp critical
        apar.one = one[i];
        for(j=0; j<twoL; j++)
        {
            apar.two = two[j];
            for(k=0; k<threeL; k++)
            {
                apar.three = floor(three[k]*apar.two);
                appFlip = applyParamSin(rawFlip, apar);
                for(l=0; l<fourL; l++)
                {
                    mpar.four = four[l];
                    for(m=0; m<fiveL; m++)
                    {
                        mpar.five = five[m];
                        for(n=0; n<sixL; n++)
                        {
                            mpar.six = add[n];
                            atur = breakLogic(appFlip, mpar, dt);
                            #pragma omp ordered
                            {
                                sinResVec[itr] = atur;
                                itr++;
                            }
                        }
                    }
                }
                r0(appFlip);
            }
        }
    }
Or is this code not conducive to parallelism? Are there any tools for g++ that can profile code for parallel processing and indicate potential issues?
This modified code works but gives no performance improvement.
The itr values can be computed directly from the loop indices instead of being incremented in the innermost loop, so you could get rid of ordered. You then also need to make apar and mpar private, unless some members of those structures are shared between threads. With private variables you can also get rid of the critical constructs. Note that the outermost critical protects the entire loop, so the inner critical constructs are superfluous.
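As a rough illustration of computing itr directly, here is a minimal sketch. It assumes the six loops are rectangular (fixed trip counts) and that the per-iteration work is a pure function of its inputs. Params, breakLogicStub and sweep are hypothetical stand-ins, not the structs or functions from the question. Each iteration derives its own slot in the result vector from the loop indices, so no shared counter is needed, and the per-iteration variables are declared inside the loops, which makes them private without a private clause.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the parameter structs (apar/mpar) in the question.
struct Params { int one, two, three, four, five, six; };

// Hypothetical placeholder for breakLogic: a pure function of its input.
inline long breakLogicStub(const Params& p) {
    return p.one + 10L * p.two + 100L * p.three
         + 1000L * p.four + 10000L * p.five + 100000L * p.six;
}

std::vector<long> sweep(int oneL, int twoL, int threeL,
                        int fourL, int fiveL, int sixL) {
    assert(oneL >= 0 && twoL >= 0 && threeL >= 0 &&
           fourL >= 0 && fiveL >= 0 && sixL >= 0);
    const std::size_t total = static_cast<std::size_t>(oneL) * twoL * threeL
                            * fourL * fiveL * sixL;
    std::vector<long> res(total);  // pre-allocated; every slot is written exactly once

    // Only the outermost loop is split across threads. Without -fopenmp the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (int i = 0; i < oneL; i++) {
        for (int j = 0; j < twoL; j++)
        for (int k = 0; k < threeL; k++)
        for (int l = 0; l < fourL; l++)
        for (int m = 0; m < fiveL; m++)
        for (int n = 0; n < sixL; n++) {
            // Declared inside the loop body, so each thread has its own copy.
            Params p{i, j, k, l, m, n};
            // Flat index equal to the value a serial itr++ would have here.
            std::size_t itr =
                ((((static_cast<std::size_t>(i) * twoL + j) * threeL + k)
                   * fourL + l) * fiveL + m) * sixL + n;
            res[itr] = breakLogicStub(p);
        }
    }
    return res;
}
```

Because distinct index tuples map to distinct values of itr, no two threads write the same element, so neither ordered nor critical is required around the store. With g++ this would be built with -fopenmp.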