I have several nested loops and I parallelized the outermost one. apar and mpar are structs whose values are modified in the loop; breakLogic is then called and returns a struct, which I store in a pre-allocated vector of those structs. one, two, ... were declared earlier in the function.

I have tried adding ordered and critical to ensure correctness, but I am still getting incorrect results.

#pragma omp parallel for ordered private(appFlip, atur, apar, mpar, i, j, k, l, m, n) shared(rawFlip)
for(i=0; i<oneL; i++)
    {
         // initialize mpar
         #pragma omp critical
         apar.one = one[i];
         for(j=0; j<twoL; j++)
         {
             apar.two = two[j];
             for(k=0; k<threeL; k++)
             {
                  apar.three = floor(three[k]*apar.two);
                  appFlip = applyParamSin(rawFlip, apar);
                  for(l=0; l< fourL; l++)
                  {
                      mpar.four = four[l];
                      for(m=0; m<fiveL; m++)
                      {
                          mpar.five = five[m];
                          for(n=0; n<sixL; n++)
                          {
                              mpar.six = add[n];
                              atur = breakLogic(appFlip,  mpar, dt);
                              #pragma omp ordered
                              {
                                  sinResVec[itr] = atur;
                                  itr++;
                              }
                          }
                      }
                  }
                  r0(appFlip);
              }
         }
    }

Or is this code not conducive to parallelism? Are there any tools for g++ that can profile code for parallel processing and indicate potential issues?

This modified code works but gives no performance improvement.

  • Note that there are no conditionals in your code, so the correct itr value can be computed directly from the loop counters instead of being incremented in the innermost loop, and you can get rid of ordered. You then also need to make apar and mpar private, unless some members of those structures are shared between threads. With private variables you can also get rid of the critical constructs. Note that the outermost critical protects the entire loop, so the inner critical constructs are superfluous. Commented Oct 24, 2013 at 12:14
  • Do I need to make l, m, n, o private as well? Commented Oct 24, 2013 at 12:17
  • r0 is for dereferencing appFlip. Commented Oct 24, 2013 at 13:05
  • If you're using g++, then define all your variables where you use them (e.g. for(int i=0; ...)) and don't worry about explicitly declaring everything shared or private. That's only for people still using ANSI/gnu89 C. Just remember that everything defined inside the parallel construct is private and everything defined outside is shared. It will make your code a lot cleaner and, personally, I think easier to understand. And don't compare performance without optimization turned on. Commented Oct 24, 2013 at 14:46
  • What do you mean by "don't compare performance without optimization on"? Isn't the whole point of parallelism performance improvement? Commented Oct 24, 2013 at 14:53

1 Answer

Your original code can be parallelized with a few modifications.

  • Make apar and mpar firstprivate. Each thread needs its own copy of both, initialized from the original values on entry to the parallel for region;

  • Remove all critical and ordered constructs, including the ordered clause on the parallel for directive. They do not work the way you expect;

  • Compute itr directly from i, j, k, l, m, n to remove the dependency between iterations.

itr = ((((i*twoL + j)*threeL + k)*fourL + l)*fiveL + m)*sixL + n;
sinResVec[itr] = atur;
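
As a rough illustration of computing the output position from the loop counters, here is a minimal self-contained sketch. Result, flatIndex and fillAll are hypothetical names invented for this example, and Result merely records the counters in place of whatever breakLogic returns. A six-level nest needs all six counters and the extents of the five inner loops. All loop variables are declared inside the loops, so each thread gets its own copies, and every iteration writes to a distinct slot of the shared vector, so no ordered or critical is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the struct returned by breakLogic:
// it only records which iteration produced it.
struct Result {
    int i, j, k, l, m, n;
};

// Row-major position of (i, j, k, l, m, n): the value that a serial
// itr++ in the innermost loop would have reached at that iteration.
inline std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t k,
                             std::size_t l, std::size_t m, std::size_t n,
                             std::size_t twoL, std::size_t threeL,
                             std::size_t fourL, std::size_t fiveL,
                             std::size_t sixL)
{
    return ((((i * twoL + j) * threeL + k) * fourL + l) * fiveL + m) * sixL + n;
}

// Fills the result vector with the outermost loop parallelized.
// Without -fopenmp the pragma is ignored and the loop runs serially.
std::vector<Result> fillAll(int oneL, int twoL, int threeL,
                            int fourL, int fiveL, int sixL)
{
    std::vector<Result> out(static_cast<std::size_t>(oneL) * twoL * threeL
                            * fourL * fiveL * sixL);

    #pragma omp parallel for
    for (int i = 0; i < oneL; ++i) {
        for (int j = 0; j < twoL; ++j) {
            for (int k = 0; k < threeL; ++k) {
                for (int l = 0; l < fourL; ++l) {
                    for (int m = 0; m < fiveL; ++m) {
                        for (int n = 0; n < sixL; ++n) {
                            // Index depends only on the counters, not on
                            // the order in which threads run.
                            std::size_t idx = flatIndex(i, j, k, l, m, n,
                                                        twoL, threeL, fourL,
                                                        fiveL, sixL);
                            assert(idx < out.size());
                            out[idx] = Result{i, j, k, l, m, n};
                        }
                    }
                }
            }
        }
    }
    return out;
}
```

With g++ this would typically be built with -fopenmp plus an optimization level such as -O2. The result layout is the same whether the loop runs on one thread or many.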

update

See the link below for more details on OpenMP, especially the difference between private and firstprivate.

http://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
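
To make the private/firstprivate distinction concrete, here is a small sketch. Params and scaled are hypothetical names used only for this example and are not part of the original code. With firstprivate(p), each thread's copy of p starts out with the values p had before the parallel region. With private(p), each copy would instead be default-initialized, which for a plain struct like this leaves base indeterminate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical parameter struct, loosely playing the role of apar/mpar.
struct Params {
    int base;
    int step;
};

std::vector<int> scaled(int count)
{
    Params p{10, 0};              // set up before the parallel region
    std::vector<int> out(count);

    // firstprivate: every thread gets its own copy of p, copy-initialized
    // from the value above, so p.base is 10 in every thread.
    #pragma omp parallel for firstprivate(p)
    for (int i = 0; i < count; ++i) {
        p.step = i;               // changes this thread's copy only
        out[i] = p.base + p.step; // distinct element per iteration
    }
    return out;
}
```

A variable that is fully assigned inside the loop before it is read, such as a loop counter, gains nothing from firstprivate, so plain private (or simply declaring it inside the loop) is enough.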

4 Comments

Thanks Eric! Slight error/typo on my part: apar.three is dependent on apar.two. Does this change anything?
And should I keep all of i...n private in the parallel construct?
Another question: you haven't mentioned anything about atur, appFlip, etc. Does making them private or firstprivate make any difference?
Keeping them private, as in your existing code, is fine. You would set them to firstprivate only if you wanted them initialized on entry to the parallel for region. Variables like j are explicitly initialized in the code, so firstprivate is useless for them.
