
The following snippet from a C++ function was originally written as serial code. To parallelize the outer loop with counter jC, I just added the line "#pragma omp parallel for private(jC)". Although this naive approach has helped me many times, I doubt whether it suffices to parallelize the jC-loop, because the execution time seems unchanged with respect to the original code. Does anybody have suggestions to ensure the following code is effectively transformed into correct parallel code?

Thanks in advance, and my apologies if my question is not well posed (it is my first post on this forum).

The code snippet is:

#include "omp.h"

void  addRHS_csource_to_pcellroutine_par(
             double *srcCoeff, double *srcVal, int nPc,
             double *adata, double *bdata, int elsize
             )
{   int elamax = elsize*elsize;
    int jC;
    #pragma omp parallel for private(jC)
    for (int jC=0; jC<nPc; jC++) {
         for (int el=0; el<elamax; el++) {

              adata[el + jC*elamax]    = adata[el + jC*elamax] - srcCoeff[el + jC*elamax];
         }

         for (int el=0; el<elsize; el++) {

              bdata[el + jC*elsize]    = bdata[el + jC*elsize] + srcVal[el + jC*elsize];

         }

    }
}

Additional note: One (probably not the most elegant?) way to work around it consists of changing the code into

void  addRHS_csource_to_pcellroutine_parFunction(int jC, int elamax,
             double *srcCoeff, double *srcVal, int nPc,
             double *adata, double *bdata, int elsize
             )
{   
    for (int el=0; el<elamax; el++) {

         adata[el + jC*elamax]    -= srcCoeff[el + jC*elamax];
    }

    for (int el=0; el<elsize; el++) {

         bdata[el + jC*elsize]  += srcVal[el + jC*elsize];

    }

}

void  addRHS_csource_to_pcellroutine_par(
             double *srcCoeff, double *srcVal, int nPc,
             double *adata, double *bdata, int elsize
             )
{   int elamax = elsize*elsize;  
    #pragma omp parallel for  
    for (int jC=0; jC<nPc; jC++) {
         addRHS_csource_to_pcellroutine_parFunction(jC, elamax, srcCoeff, srcVal, nPc, adata, bdata, elsize);
    }

}
  • You have two different jC variables. Also, I don't see any reason to use private(jC). Commented Nov 26, 2013 at 14:21
  • Indeed, '#pragma omp parallel for' seems more reasonable to use instead of '#pragma omp parallel for private(jC)' ... Commented Nov 26, 2013 at 14:27

1 Answer


As you can read in the specification (page 55), your inner loops are not parallelized; only the outer one is.

int jC;
#pragma omp parallel for private(jC)
for (int jC=0;......

you have defined two variables named jC. What you intended to do is correct, but you should decide on one of the two solutions:

int jC;
#pragma omp parallel for private(jC)
for(jC = 0;....

or

#pragma omp parallel for
for(int jC = 0;....
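For completeness, here is a sketch of the asker's function with the second variant applied (same arithmetic as the original; only the duplicate jC declaration is removed):

```cpp
// Sketch: the question's function with the duplicate jC removed.
// The loop variable is declared in the for statement, so OpenMP treats
// it as private automatically and no private() clause is needed.
void addRHS_csource_to_pcellroutine_par(
        double *srcCoeff, double *srcVal, int nPc,
        double *adata, double *bdata, int elsize)
{
    int elamax = elsize * elsize;
    #pragma omp parallel for
    for (int jC = 0; jC < nPc; jC++) {
        for (int el = 0; el < elamax; el++)
            adata[el + jC * elamax] -= srcCoeff[el + jC * elamax];
        for (int el = 0; el < elsize; el++)
            bdata[el + jC * elsize] += srcVal[el + jC * elsize];
    }
}
```

The iterations of the jC-loop touch disjoint slices of adata and bdata, so no further synchronization is required.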

As for:

I doubt whether it suffices to parallelize the jC-loop, because the execution time seems to be unchanged with respect to the original code.

whether it suffices depends on the number of iterations you have to do (given by nPc) and on how many threads are available (on a quad-core, typically 8 hardware threads). Parallelizing a loop can even make it slower, because the overhead of creating the new threads is pretty high (plus some additional work such as managing them).

So the time you gain by parallelizing the loop has to outweigh the time needed to create the threads.
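One way to see whether the thread-creation overhead dominates for your nPc is to time both versions on your actual problem size. A hypothetical micro-benchmark sketch (std::chrono is used instead of omp_get_wtime so it compiles even without OpenMP enabled, in which case the pragma is simply ignored; the function name timed_update is made up):

```cpp
#include <chrono>
#include <vector>

// Run the adata-style update once, either serially or with the OpenMP
// pragma, and return the elapsed wall time in seconds.
double timed_update(std::vector<double>& adata,
                    const std::vector<double>& srcCoeff, bool parallel)
{
    const long n = static_cast<long>(adata.size());
    auto t0 = std::chrono::steady_clock::now();
    if (parallel) {
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            adata[i] -= srcCoeff[i];
    } else {
        for (long i = 0; i < n; i++)
            adata[i] -= srcCoeff[i];
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

If the parallel pass is not clearly faster at your real nPc and elsize, the loop body is probably too small to amortize the thread startup.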

Hope this answers your questions.

If you still need a faster program, you can think about an algorithm to parallelize the inner loops as well (e.g. by splitting the iteration space and using OpenMP's reduction construct).
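Since the inner loops here are independent element-wise updates (no reduction is actually needed for this particular code), one hypothetical way to split the iteration space is to flatten each nested loop pair into a single parallel loop: the indices el + jC*elamax already sweep 0 .. nPc*elamax-1 exactly once. The function name addRHS_flattened is made up, and this assumes the input and output arrays do not alias:

```cpp
// Sketch: flatten the (jC, el) pairs into one index so that a single
// parallel for distributes all nPc*elamax (resp. nPc*elsize)
// independent updates across the threads.
void addRHS_flattened(double *srcCoeff, double *srcVal, int nPc,
                      double *adata, double *bdata, int elsize)
{
    const long nA = static_cast<long>(nPc) * elsize * elsize;
    const long nB = static_cast<long>(nPc) * elsize;
    #pragma omp parallel for
    for (long i = 0; i < nA; i++)
        adata[i] -= srcCoeff[i];   // i corresponds to el + jC*elamax
    #pragma omp parallel for
    for (long i = 0; i < nB; i++)
        bdata[i] += srcVal[i];     // i corresponds to el + jC*elsize
}
```

This gives each thread more work per parallel region than the original jC-only split when nPc is small but elsize is large.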
