Currently I am using the C++11 async feature to create additional threads that run my computing kernels. The computing kernels are totally independent of each other. I want to know two things:
- Is this computing model suitable for optimisation on a GPU?
- If the answer to question 1 is yes, what is the basic practice for this kind of optimisation?
The pseudocode is as below:
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Launch one asynchronous task per hardware thread; each call to
// computingKernel is independent of the others.
std::vector<std::future<ResultType>> futureVector;
for (unsigned int i = 0; i < std::thread::hardware_concurrency(); i++) {
    auto future = std::async(
        std::launch::async,
        &MyClass::computingKernel,  // member function, so `this` is passed below
        this,
        parameter1,
        parameter2);
    futureVector.push_back(std::move(future));
}
for (std::size_t i = 0; i < futureVector.size(); i++) {
    // Block until each task finishes and collect its result
    futureVector[i].get();
}
Addition:
- Is there a way to port this easily, without changing the whole code? Something like a program mark (an annotation or pragma) that could start the threads on the GPU instead. See the sketch below for the kind of mark I mean.
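To illustrate, OpenMP's target-offload directives seem close to that idea. This is only a minimal sketch, assuming a compiler with OpenMP 4.5+ GPU offload support (e.g. clang or nvc++ with the appropriate flags); the function name runOnGpu, the arrays, and the per-element loop body are placeholders, not my real kernel:

#include <vector>

void runOnGpu(std::vector<float>& results, const std::vector<float>& params) {
    const int n = static_cast<int>(results.size());
    float* r = results.data();
    const float* p = params.data();

    // The pragma is the "program mark": it asks the compiler to offload
    // the loop to the GPU and to copy the arrays to and from device memory.
    #pragma omp target teams distribute parallel for map(to: p[0:n]) map(from: r[0:n])
    for (int i = 0; i < n; i++) {
        r[i] = p[i] * p[i];  // placeholder per-element computation
    }
}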
Answer: A lot here depends on what happens inside computingKernel. Despite what you might imagine, GPUs don't run threads in anything like the way that pseudocode assumes.
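To make that concrete, here is a minimal sketch of the CUDA model, where a kernel is launched over thousands of lightweight threads and each thread handles one data element. The names (computingKernelGpu, d_params, d_results, n) and the per-element computation are assumptions for illustration, not your actual code:

// Each GPU thread computes one element; the hardware schedules
// thousands of these lightweight threads at once.
__global__ void computingKernelGpu(const float* d_params, float* d_results, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d_results[i] = d_params[i] * d_params[i];  // placeholder computation
    }
}

// Host side: launch enough 256-thread blocks to cover all n elements.
// computingKernelGpu<<<(n + 255) / 256, 256>>>(d_params, d_results, n);

The point is that the unit of parallelism is a data element, not a CPU-style task: instead of a handful of heavyweight tasks pinned to hardware threads, you express the work as one tiny function applied across a large grid. Code usually has to be restructured into this form rather than simply re-marked.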