I am trying to add multi-threading to a C++ program. The target is the for loop inside the function below, and the goal is to reduce the execution time, which is currently 3.83 seconds.
I tried adding #pragma omp parallel for reduction(+:sum) to the inner loop (just before the j for-loop), but that was not enough: it still took 1.98 seconds. The aim is to bring the time down to 0.5 seconds.
I did some research on improving the speedup, and some people recommend strip mining for vectorization to get better results. However, I do not know how to implement it yet.
Does anyone know how to do it?
The code is:
void filter(const long n, const long m, float *data, const float threshold, std::vector<long> &result_row_ind) {
    for (long i = 0; i < n; i++) {
        // Sum the m elements of row i
        float sum = 0.0f;
        for (long j = 0; j < m; j++) {
            sum += data[i*m + j];
        }
        // Keep the row index if the row sum exceeds the threshold
        if (sum > threshold)
            result_row_ind.push_back(i);
    }
    std::sort(result_row_ind.begin(),
              result_row_ind.end());
}
Thank you very much
What are n, m and result_row_ind.size() when it takes 3.83 seconds?
First, move the computation of i*m outside of the inner loop. Second, you could parallelize this by segmenting the outer loop and passing each segment to a different thread. You then combine the results when the threads return. What version of C++ is this?
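A minimal sketch of the suggestions above, under some assumptions: the name filter_parallel is hypothetical, the element type of the result vector is assumed to be long (the original declaration does not show it), and OpenMP is used for the threads because the question already uses it. Each thread scans its own segment of the outer loop into a private vector, so push_back never runs concurrently on the shared vector. The row offset i*m is computed once per row, and the inner loop is marked with omp simd so the compiler may vectorize it. Note that a vectorized reduction can add the floats in a different order, so sums may differ slightly in the last bits.

```cpp
#include <algorithm>
#include <vector>

// Sketch only: filter_parallel is a hypothetical name, and std::vector<long>
// is an assumed element type for the result indices.
void filter_parallel(const long n, const long m, const float *data,
                     const float threshold, std::vector<long> &result_row_ind) {
    #pragma omp parallel
    {
        // Per-thread results: no locking is needed while scanning rows.
        std::vector<long> local;

        // Split the outer loop into one contiguous segment per thread.
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n; i++) {
            const float *row = data + i * m;  // i*m computed once per row
            float sum = 0.0f;

            // Ask the compiler to vectorize the row sum.
            #pragma omp simd reduction(+:sum)
            for (long j = 0; j < m; j++) {
                sum += row[j];
            }

            if (sum > threshold)
                local.push_back(i);
        }

        // Merge each thread's indices into the shared vector, one thread at a time.
        #pragma omp critical
        result_row_ind.insert(result_row_ind.end(), local.begin(), local.end());
    }

    // Threads finish in arbitrary order, so sort once at the end.
    std::sort(result_row_ind.begin(), result_row_ind.end());
}
```

With GCC or Clang this needs the -fopenmp flag; without it the pragmas are ignored and the function simply runs serially. Whether it reaches 0.5 seconds depends on n, m, the core count and memory bandwidth, since the loop reads every element exactly once and does very little arithmetic per element.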