I'm trying to parallelize this C++ code (computing a continuous Fourier transform of points, modeled as Dirac impulses). The code compiles and works correctly, but it only uses one thread. Is there something else I need to do to get multiple threads working? This is on a Mac with 4 cores (8 threads), compiled with GCC 10.
vector<double> GetFourierImage(const Point pts[],
                               const int num_samples,
                               const int res,
                               const double freq_step) {
    vector<double> fourier_img(res*res, 0.0);
    double half_res = 0.5 * res;
    vector<int> rows(res);
    std::iota(rows.begin(), rows.end(), 0);
    std::for_each( // Why doesn't this parallelize?
        std::execution::par_unseq,
        rows.begin(), rows.end(),
        [&](int i) {
            double y = freq_step * (i - half_res);
            for (int j = 0; j < res; j++) {
                double x = freq_step * (j - half_res);
                double fx = 0.0, fy = 0.0;
                for (int pt_idx = 0; pt_idx < num_samples; pt_idx++) {
                    double dot = (x * pts[pt_idx].x) + (y * pts[pt_idx].y);
                    double exp = -2.0 * M_PI * dot;
                    fx += cos(exp);
                    fy += sin(exp);
                }
                fourier_img[i*res + j] = sqrt((fx*fx + fy*fy) / num_samples);
            }
        });
    return fourier_img;
}
std::execution::par_unseq, this is what this sounds like. The symbol is defined but it does ... nothing.

par_unseq tells the compiler that it may use parallelism, but doesn't have to. Googling around, I found a mailing list entry (2017) that talked about a preliminary implementation of par_unseq on top of #pragma omp. If that implementation has been merged in GCC 10, it is likely that you have to enable OpenMP for that to work.

Looking at gcc.gnu.org/gcc-9/changes.html, it looks like you need "Thread Building Blocks" (whatever that is) for std::execution.
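
For the std::execution route, libstdc++ generally needs TBB installed and linked (typically -ltbb) before the policies run in parallel. If that is not an option, here is a rough sketch of a different route to the same goal: putting OpenMP directly on the row loop. This is not what par_unseq does internally, just a swapped-in technique. It assumes the code is built with -fopenmp (without that flag the pragma is ignored and the loop runs serially), and it assumes a simple Point layout with x and y members, since the question does not show the struct. The name GetFourierImageOmp is made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout; the question does not show the definition of Point.
struct Point { double x, y; };

// Same computation as GetFourierImage, with the loop over rows handed to OpenMP.
// Each row i writes a disjoint slice of fourier_img, so no locking is needed.
std::vector<double> GetFourierImageOmp(const Point pts[], int num_samples,
                                       int res, double freq_step) {
    assert(res > 0 && num_samples > 0);
    const double two_pi = 2.0 * std::acos(-1.0);  // avoids relying on M_PI
    const double half_res = 0.5 * res;
    std::vector<double> fourier_img(static_cast<std::size_t>(res) * res, 0.0);

    // Ignored (serial loop) unless compiled with -fopenmp.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < res; i++) {
        const double y = freq_step * (i - half_res);
        for (int j = 0; j < res; j++) {
            const double x = freq_step * (j - half_res);
            double fx = 0.0, fy = 0.0;
            for (int k = 0; k < num_samples; k++) {
                const double phase = -two_pi * (x * pts[k].x + y * pts[k].y);
                fx += std::cos(phase);
                fy += std::sin(phase);
            }
            fourier_img[static_cast<std::size_t>(i) * res + j] =
                std::sqrt((fx * fx + fy * fy) / num_samples);
        }
    }
    return fourier_img;
}
```

Something like g++ -std=c++17 -O2 -fopenmp main.cpp should build it (main.cpp being a placeholder file name). Rows are independent, so a static schedule is a reasonable default here.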