
I'm trying to parallelize this C++ code, which computes a continuous Fourier transform of a set of points modeled as Dirac impulses. It compiles and works correctly, but it only uses one thread. Is there something else I need to do to get it to use multiple threads? This is on a Mac with 4 cores (8 hardware threads), compiled with GCC 10.

```cpp
#include <algorithm>
#include <cmath>
#include <execution>
#include <numeric>
#include <vector>

using std::vector;

struct Point {  // assumed definition: only .x and .y are used below
  double x, y;
};

vector<double> GetFourierImage(const Point pts[],
                               const int num_samples,
                               const int res,
                               const double freq_step) {
  vector<double> fourier_img(res*res, 0.0);
  double half_res = 0.5 * res;

  vector<int> rows(res);
  std::iota(rows.begin(), rows.end(), 0);
  std::for_each(  // Why doesn't this parallelize?
      std::execution::par_unseq,
      rows.begin(), rows.end(),
      [&](int i) {
    double y = freq_step * (i - half_res);
    for (int j = 0; j < res; j++) {
      double x = freq_step * (j - half_res);

      double fx = 0.0, fy = 0.0;
      for (int pt_idx = 0; pt_idx < num_samples; pt_idx++) {
        double dot = (x * pts[pt_idx].x) + (y * pts[pt_idx].y);
        double phase = -2.0 * M_PI * dot;  // renamed from `exp` to avoid shadowing std::exp
        fx += std::cos(phase);
        fy += std::sin(phase);
      }
      // Each row i writes only its own slice of fourier_img, so there is no data race.
      fourier_img[i*res + j] = std::sqrt((fx*fx + fy*fy) / num_samples);
    }
  });

  return fourier_img;
}
```
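For reference, here is what each pixel holds, read directly off the loop body. This is a sketch that treats each point p_k as a unit Dirac impulse and writes N = num_samples, Δ = freq_step, R = res. Each pixel depends only on its own frequency ω_ij, which is why the rows can be processed independently. The total work is R²·N cos/sin pairs.

```latex
\begin{aligned}
f(\mathbf{r}) &= \sum_{k=1}^{N} \delta(\mathbf{r} - \mathbf{p}_k),
\qquad
F(\boldsymbol{\omega}) = \int_{\mathbb{R}^2} f(\mathbf{r})\, e^{-2\pi i\,\boldsymbol{\omega}\cdot\mathbf{r}}\, d\mathbf{r}
  = \sum_{k=1}^{N} e^{-2\pi i\,\boldsymbol{\omega}\cdot\mathbf{p}_k} \\
\boldsymbol{\omega}_{ij} &= \Delta \left(j - \tfrac{R}{2},\; i - \tfrac{R}{2}\right) \\
f_x &= \sum_{k=1}^{N} \cos\!\left(-2\pi\,\boldsymbol{\omega}_{ij}\cdot\mathbf{p}_k\right) = \operatorname{Re} F(\boldsymbol{\omega}_{ij}),
\qquad
f_y = \sum_{k=1}^{N} \sin\!\left(-2\pi\,\boldsymbol{\omega}_{ij}\cdot\mathbf{p}_k\right) = \operatorname{Im} F(\boldsymbol{\omega}_{ij}) \\
\mathtt{fourier\_img}[iR + j] &= \sqrt{\frac{f_x^2 + f_y^2}{N}} = \frac{\lvert F(\boldsymbol{\omega}_{ij}) \rvert}{\sqrt{N}}
\end{aligned}
```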
  • I haven't checked this for certain, but it's entirely possible that GCC has not yet implemented std::execution::par_unseq; that is what this sounds like. The symbol is defined, but it does... nothing. Commented Jan 24, 2021 at 2:09
  • par_unseq tells the implementation that it may use parallelism, but it doesn't have to. Googling around, I found a mailing-list entry (2017) that discussed a preliminary implementation of par_unseq on top of #pragma omp. If that implementation has been merged into GCC 10, you probably have to enable OpenMP for it to work. Looking at gcc.gnu.org/gcc-9/changes.html, it seems you need "Thread Building Blocks" (whatever that is) for std::execution. Commented Jan 24, 2021 at 2:29
  • Oh, hmm, okay, thank you both. I installed Thread Building Blocks (TBB) and added -ltbb to my compile command, which should link it, but it still doesn't seem to use more than one thread... Commented Jan 24, 2021 at 2:40
  • How many items are you iterating over? It may not bother launching a new thread if the number is small. Commented Jan 24, 2021 at 3:15
  • @Andrew HAL mentioned both OpenMP and TBB, but you only mentioned TBB. Did you misunderstand, miss it, or something else? Commented Jan 24, 2021 at 14:20
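OpenMP comes up in the comments. If the goal is simply to use all cores, one option that sidesteps the std::execution backend question is to parallelize the row loop with an explicit OpenMP pragma. This is a swapped-in technique, not what the question uses. The minimal sketch below rests on some assumptions: Point is a plain struct with double members x and y (the question does not show its definition), and GetFourierImageOmp and kPi are hypothetical names. With GCC it needs -fopenmp. Without that flag the pragma is ignored and the loop runs sequentially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {  // assumed layout: the question only shows .x and .y
  double x, y;
};

constexpr double kPi = 3.14159265358979323846;  // hypothetical name; avoids relying on M_PI

// Same computation as GetFourierImage, but the row loop is parallelized with
// OpenMP instead of std::execution. Build with -fopenmp to get threads.
std::vector<double> GetFourierImageOmp(const Point pts[], int num_samples,
                                       int res, double freq_step) {
  assert(res > 0 && num_samples > 0);
  std::vector<double> fourier_img(static_cast<std::size_t>(res) * res, 0.0);
  const double half_res = 0.5 * res;

  // Row i writes only fourier_img[i*res .. i*res + res - 1], so iterations
  // are independent; everything declared inside the loop is thread-private.
#pragma omp parallel for schedule(static)
  for (int i = 0; i < res; ++i) {
    const double y = freq_step * (i - half_res);
    for (int j = 0; j < res; ++j) {
      const double x = freq_step * (j - half_res);
      double fx = 0.0, fy = 0.0;
      for (int k = 0; k < num_samples; ++k) {
        const double phase = -2.0 * kPi * (x * pts[k].x + y * pts[k].y);
        fx += std::cos(phase);
        fy += std::sin(phase);
      }
      fourier_img[static_cast<std::size_t>(i) * res + j] =
          std::sqrt((fx * fx + fy * fy) / num_samples);
    }
  }
  return fourier_img;
}
```

It is meant as a drop-in replacement for the original call. schedule(static) splits the rows evenly, which suits this loop because every row does the same amount of work. The thread count can be set with the OMP_NUM_THREADS environment variable.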

2 Answers


In GCC 9, using the execution policies had a hard dependency on TBB: if TBB was not present, the build would fail. That changed in GCC 10 (and still applies in GCC 11). If the library is not present, for_each silently falls back to a sequential loop. This can be seen at https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.1.0/libstdc++-v3/include/bits/c++config#L679. To fix your issue, try linking against TBB with -ltbb. This resolved the same issue for me on Ubuntu 20.04 with GCC 11.2.
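The linked c++config makes this choice at compile time. Below is a minimal sketch that reports the outcome. It relies on the internal libstdc++ macro names from that GCC 10 file (_PSTL_PAR_BACKEND_TBB and _PSTL_PAR_BACKEND_SERIAL). These are not a documented interface and may differ in other releases. ParallelBackend is a hypothetical helper name.

```cpp
// Any libstdc++ header pulls in bits/c++config, where (for C++17 and later)
// the parallel-algorithms backend macros are defined.
#include <cstddef>

// Reports which backend libstdc++ selected for std::execution policies in
// this translation unit. Relies on internal, undocumented macro names.
constexpr const char* ParallelBackend() {
#if defined(_PSTL_PAR_BACKEND_TBB)
  return "tbb";      // TBB header found: par / par_unseq can use threads
#elif defined(_PSTL_PAR_BACKEND_SERIAL)
  return "serial";   // no TBB header: par / par_unseq run sequentially
#else
  return "unknown";  // not libstdc++, or compiled below C++17
#endif
}
```

Printing ParallelBackend() from a file compiled with exactly the same flags as the real code shows whether the sequential fallback is in effect. In the GCC 10 code linked above, the selection is based on __has_include(<tbb/tbb.h>). This means the compiler must be able to find the TBB headers, not just link the library.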


I had the same problem on macOS. In my case, adding the TBB header directory to the include search path resolved it. For g++-11 with TBB installed via Homebrew, the command was g++-11 -O3 -std=c++17 -I/opt/homebrew/include -o main main.cpp -ltbb, where /opt/homebrew/include contains a tbb header folder. Without the -I flag, my code compiled but ran single-threaded, as @Andrew described. Adding -fopenmp was not necessary in my case, but -ltbb is, as @Ryan H pointed out.
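Because the missing header leads to a silent single-threaded build rather than an error, a preprocessor guard can make the problem visible. The sketch below assumes a compiler that supports __has_include (GCC 5 and later, and standard in C++17). REQUIRE_TBB_BACKEND, TBB_HEADER_VISIBLE and TbbHeaderVisible are hypothetical, project-local names. /opt/homebrew/include is only the Homebrew path from this answer; other installs will differ.

```cpp
// Detect whether <tbb/tbb.h> is reachable with the current -I flags.
#if defined(__has_include)
#  if __has_include(<tbb/tbb.h>)
#    define TBB_HEADER_VISIBLE 1
#  else
#    define TBB_HEADER_VISIBLE 0
#  endif
#else
#  define TBB_HEADER_VISIBLE 0  // cannot tell; treat as not visible
#endif

// Opt-in hard failure: build with -DREQUIRE_TBB_BACKEND to turn a missing
// header into a compile error instead of a sequential std::for_each.
#if defined(REQUIRE_TBB_BACKEND) && !TBB_HEADER_VISIBLE
#  error "tbb/tbb.h not found: add its include directory, e.g. -I/opt/homebrew/include"
#endif

constexpr bool TbbHeaderVisible() { return TBB_HEADER_VISIBLE != 0; }
```

Adding -DREQUIRE_TBB_BACKEND to the compile command from this answer would have turned the single-threaded build into an explicit error pointing at the include path.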
