Why is my parallel std::for_each only using 1 thread?

Question

I'm trying to parallelize this C++ code (computing a continuous Fourier transform of points, modeled as Dirac impulses), and this code compiles and works correctly, but it only uses 1 thread. Is there something else I need to do to get multiple threads working? This is on a Mac with 4 cores (8 threads), compiled with GCC 10.

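#include <algorithm>  // std::for_each
#include <cmath>      // cos, sin, sqrt, M_PI
#include <execution>  // std::execution::par_unseq
#include <numeric>    // std::iota
#include <vector>

using std::vector;

// Point is not shown in the question; it is assumed to be a struct with
// double members x and y.
vector<double> GetFourierImage(const Point pts[],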
                               const int num_samples,
                               const int res,
                               const double freq_step) {
  vector<double> fourier_img(res*res, 0.0);
  double half_res = 0.5 * res;

  vector<int> rows(res);
  std::iota(rows.begin(), rows.end(), 0);
  std::for_each(  // Why doesn't this parallelize?
      std::execution::par_unseq,
      rows.begin(), rows.end(),
      [&](int i) {
    double y = freq_step * (i - half_res);
    for (int j = 0; j < res; j++) {
      double x = freq_step * (j - half_res);

      double fx = 0.0, fy = 0.0;
      for (int pt_idx = 0; pt_idx < num_samples; pt_idx++) {
        double dot = (x * pts[pt_idx].x) + (y * pts[pt_idx].y);
        double exp = -2.0 * M_PI * dot;
        fx += cos(exp);
        fy += sin(exp);
      }
      fourier_img[i*res + j] = sqrt((fx*fx + fy*fy) / num_samples);
    }
  });

  return fourier_img;
}

I have not checked, but it is entirely possible that GCC has not yet implemented std::execution::par_unseq; that is what this sounds like. The symbol is defined, but it does... nothing.
– Sam Varshavchik, Commented Jan 24, 2021 at 2:09
par_unseq tells the compiler that it may use parallelism, but doesn't have to. Googling around, I found a mailing-list entry (2017) that talked about a preliminary implementation of par_unseq on top of #pragma omp. If that implementation has been merged into GCC 10, it is likely that you have to enable OpenMP for it to work. Looking at gcc.gnu.org/gcc-9/changes.html, it looks like you need "Thread Building Blocks" (whatever that is) for std::execution.
– HAL9000, Commented Jan 24, 2021 at 2:29
Oh, hmm, okay, thank you both. I installed Thread Building Blocks (TBB) and added -ltbb to my compile command, which should link it, but it still doesn't seem to be using more than one thread...
– Andrew, Commented Jan 24, 2021 at 2:40
How many items are you iterating over? It may not bother launching a new thread if the number is small.
– Galik, Commented Jan 24, 2021 at 3:15
@Andrew So HAL mentioned both OpenMP and TBB, and you only mentioned TBB. Did you not understand, miss it, or...?
– Yakk - Adam Nevraumont, Commented Jan 24, 2021 at 14:20

Ryan H · Accepted Answer · 2021-07-31 08:36:14Z

3

In GCC 9, there was a hard dependency on TBB when using the execution policies; if TBB was not present, the build would fail. That changed in GCC 10 (and is still the case in GCC 11): if TBB is not found at compile time, for_each falls back to a sequential loop. This can be seen at https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.1.0/libstdc++-v3/include/bits/c++config#L679. To fix your issue, try linking to TBB with -ltbb. This resolved the same issue for me on Ubuntu 20.04 using GCC 11.2.
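One way to see which path a given build took is to inspect the configuration macros that the linked c++config header sets. The sketch below assumes libstdc++ from GCC 10 or later compiled with -std=c++17; _PSTL_PAR_BACKEND_TBB and _PSTL_PAR_BACKEND_SERIAL are internal, undocumented macros that may change between releases, and PstlBackendName is a hypothetical helper name used only for illustration.

```cpp
#include <string>  // any libstdc++ header pulls in the c++config macros

// Hypothetical helper: reports which parallel backend libstdc++ selected
// for this translation unit. The macros are libstdc++ internals, so treat
// this as a debugging aid rather than a supported interface.
std::string PstlBackendName() {
#if defined(_PSTL_PAR_BACKEND_TBB)
  return "tbb";      // TBB was detected; link with -ltbb
#elif defined(_PSTL_PAR_BACKEND_SERIAL)
  return "serial";   // TBB was not detected; algorithms run sequentially
#else
  return "unknown";  // not libstdc++, or not compiled as C++17 or later
#endif
}
```

Printing the result of PstlBackendName() from main is enough to tell the two cases apart. If it reports "serial", the compiler most likely could not find the TBB headers, and adding -ltbb alone will not change the behavior.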

edited Jul 31, 2021 at 8:36

answered Jul 31, 2021 at 8:25


mismou · Accepted Answer · 2022-04-24 16:20:51Z

0

I had the same problem on macOS. In my case, adding the path to the TBB header files to the include search path resolved it. For g++-11 and TBB installed with Homebrew, the command was:

g++-11 -O3 -std=c++17 -I/opt/homebrew/include -o main main.cpp -ltbb

The /opt/homebrew/include directory contains a tbb header folder. Without the -I flag, my code compiled but ran single-threaded, as described by @Andrew. In my case, adding the -fopenmp flag was not necessary, but -ltbb is, as pointed out by @Ryan H.
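To confirm that a build really runs in parallel, a small probe can count how many distinct threads execute the loop body. This is a minimal sketch, not code from the question: CountWorkerThreads is a hypothetical helper, and it uses std::execution::par rather than par_unseq because taking a lock inside the callable is only permitted under par.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <mutex>
#include <numeric>
#include <set>
#include <thread>
#include <vector>

// Hypothetical helper: runs a parallel for_each over num_items elements
// and returns how many distinct threads executed the loop body.
std::size_t CountWorkerThreads(int num_items) {
  std::vector<int> items(num_items);
  std::iota(items.begin(), items.end(), 0);

  std::mutex mu;
  std::set<std::thread::id> seen;
  std::for_each(std::execution::par, items.begin(), items.end(), [&](int) {
    // The mutex protects the shared set from concurrent inserts.
    std::lock_guard<std::mutex> lock(mu);
    seen.insert(std::this_thread::get_id());
  });
  return seen.size();
}
```

Calling CountWorkerThreads with a large count (for example 100000) and printing the result gives a quick check. A result of 1 suggests the sequential fallback is in use, although a very small workload may also stay on a single thread even when TBB is active.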

answered Apr 24, 2022 at 16:20
