The following snippet from a C++ function was originally written as serial code. To parallelize the outer loop over the counter 'jC', I simply added the line "#pragma omp parallel for private(jC)". This naive approach has worked for me many times, but I doubt that it suffices to parallelize the jC-loop here, because the execution time appears unchanged compared with the original serial code. Does anybody have suggestions for making sure the following code is effectively (and correctly) parallelized?
Thanks in advance, and my apologies if my question is not well posed (this is my first post on this forum).
The code snippet is:
#include "omp.h"
void addRHS_csource_to_pcellroutine_par(
    double *srcCoeff, double *srcVal, int nPc,
    double *adata, double *bdata, int elsize
)
{
    int elamax = elsize*elsize;
    int jC;
    #pragma omp parallel for private(jC)
    for (int jC=0; jC<nPc; jC++) {
        for (int el=0; el<elamax; el++) {
            adata[el + jC*elamax] = adata[el + jC*elamax] - srcCoeff[el + jC*elamax];
        }
        for (int el=0; el<elsize; el++) {
            bdata[el + jC*elsize] = bdata[el + jC*elsize] + srcVal[el + jC*elsize];
        }
    }
}
Additional note: one (probably not the most elegant) workaround is to move the loop body into a separate function and change the code to:
void addRHS_csource_to_pcellroutine_parFunction(int jC, int elamax,
    double *srcCoeff, double *srcVal, int nPc,
    double *adata, double *bdata, int elsize
)
{
    for (int el=0; el<elamax; el++) {
        adata[el + jC*elamax] -= srcCoeff[el + jC*elamax];
    }
    for (int el=0; el<elsize; el++) {
        bdata[el + jC*elsize] += srcVal[el + jC*elsize];
    }
}
void addRHS_csource_to_pcellroutine_par(
    double *srcCoeff, double *srcVal, int nPc,
    double *adata, double *bdata, int elsize
)
{
    int elamax = elsize*elsize;
    #pragma omp parallel for
    for (int jC=0; jC<nPc; jC++) {
        addRHS_csource_to_pcellroutine_parFunction(jC, elamax, srcCoeff, srcVal, nPc, adata, bdata, elsize);
    }
}
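To compare the serial and parallel variants, a minimal timing sketch could look like the following. It measures wall-clock time with std::chrono::steady_clock. A CPU-time measurement such as clock() may add up the time of all threads on some platforms and would then show no improvement even if the loop does run in parallel. The helper name wall_seconds is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: wall-clock seconds taken by a single call of f().
template <typename F>
double wall_seconds(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Possible usage, with the arguments of the routine above:
//   double t = wall_seconds([&] {
//       addRHS_csource_to_pcellroutine_par(srcCoeff, srcVal, nPc,
//                                          adata, bdata, elsize);
//   });
```

Since each element is touched only once with a single addition or subtraction, the loop may well be limited by memory bandwidth rather than by arithmetic, so the measured speedup can stay well below the number of threads.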