I am using the C++17 parallel standard library algorithms with the std::execution::par execution policy. I am on Ubuntu, on a laptop with 4 cores, using the clang 11 compiler and the CMake extension for VS Code for the build (I also checked with a plain single command-line compilation, without CMake).
Based on the following observations, it seems the program only uses 1 thread:
- Run time is the same as with using std::execution::seq (regular, sequential algorithm)
- Using top -H I see only 1 thread with ~100% cpu usage
- Using Ubuntu's system monitor I see one core active during execution (but the active core may change between different calls to sort if I do repeats using a for loop).
Code example:
#include <vector>
#include <iostream>
#include <algorithm>
#include <execution>
#include <chrono>
#include <thread>
#include <cstdlib> // for rand()

int main()
{
    const int N = 10000000;
    std::vector<int> vec(N);
    std::chrono::duration<double> elapsed;
    unsigned int nThreads = std::thread::hardware_concurrency();
    std::cout << "number of available threads: " << nThreads << "\n"; // this prints "4"
    auto tstart = std::chrono::high_resolution_clock::now();
    std::generate(vec.begin(), vec.end(), []() {return rand() % 100;});
    //std::sort(std::execution::seq, vec.begin(), vec.end());
    std::sort(std::execution::par, vec.begin(), vec.end());
    auto tfinish = std::chrono::high_resolution_clock::now();
    elapsed = tfinish - tstart;
    std::cout << "Elapsed time: " << elapsed.count() << std::endl;
    return 0;
}
I thought that maybe the problem was that I didn't tell CMake to link against the pthread library. So I changed CMakeLists.txt:
project(my_proj LANGUAGES C CXX)
find_package (Threads)
target_link_libraries (my_proj ${CMAKE_THREAD_LIBS_INIT})
But it made no difference.
Why doesn't it seem to run in parallel?
-ltbb) help? (Referencing 'note 3' here, and assuming you're using libstdc++ since libc++ wasn't mentioned.)
-ltbb did not make a change (I verified with ldd that it is indeed linked to this library)
libstdc++.so.6