I'm having trouble parallelising my Monte Carlo method for calculating pi. Here is the parallelised for-loop:
#pragma omp parallel for private(i,x,y) schedule(static) reduction(+:count)
for (i = 0; i < points; i++) {
    x = rand()/(RAND_MAX+1.0)*2 - 1.0;
    y = rand()/(RAND_MAX+1.0)*2 - 1.0;
    // Check if point lies in circle
    if (x*x + y*y < 1.0) { count++; }
}
The problem is that it underestimates pi if I use schedule(static), and it's slower than the serial implementation if I use schedule(dynamic). What am I doing wrong? I've tried other fixes (like this one: Using OpenMP to calculate the value of PI), but the result is still much slower than the serial implementation.
Thanks in advance
rand() thread-safe?
rand will normally have an internal "seed", which acts as a shared resource: either every call is serialized, or concurrent calls risk incorrect results. If you have it available, I'd try using rand_r or (preferably) drand48_r instead. Alternatively, consider the random number generation classes introduced in C++11. Each instance has its own state, which should avoid serialization, but it can make initialization tricky, since multiple threads producing identical sequences will do little good.
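As a rough illustration of the rand_r suggestion, here is a minimal sketch in which each thread keeps its own seed, so no generator state is shared. The function name estimate_pi, the base_seed parameter, and the way the per-thread seed is derived from the thread number are assumptions made for this example, not something from the question. rand_r is a POSIX function and a fairly weak generator, and nearby seeds may give correlated streams, so treat this as a demonstration of the structure rather than a recommended generator.

```c
#define _POSIX_C_SOURCE 200809L  /* rand_r is POSIX, not ISO C */
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: estimates pi from `points` random samples. */
double estimate_pi(long points, unsigned int base_seed)
{
    long count = 0;

    #pragma omp parallel reduction(+:count)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Private generator state: one seed per thread, derived from the
           thread number (an assumed, simplistic seeding scheme). */
        unsigned int seed = base_seed + 7919u * (unsigned int)(tid + 1);

        #pragma omp for schedule(static)
        for (long i = 0; i < points; i++) {
            double x = rand_r(&seed) / (RAND_MAX + 1.0) * 2 - 1.0;
            double y = rand_r(&seed) / (RAND_MAX + 1.0) * 2 - 1.0;
            /* Check if point lies in circle */
            if (x * x + y * y < 1.0) { count++; }
        }
    }
    return 4.0 * (double)count / (double)points;
}
```

Calling estimate_pi(10000000, 42u) from main and compiling with -fopenmp (GCC or Clang) should let the loop scale across threads, because every rand_r call touches only the calling thread's seed. Without -fopenmp the pragmas are ignored and the function simply runs serially.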