I'm trying to parallelize a part of a larger program using the C++ standard library and its execution policies. The original program uses std::accumulate to calculate sums over columns of 2d vectors (vectors of vectors) but since std::accumulate doesn't accept execution policies I'm trying to find a parallelizable alternative.
I tried switching to using std::reduce instead of std::accumulate. I'm quite new to C++, but from what I gathered from the C++ reference they should work quite similarly. However, my code does not compile after making this change. Why doesn't the modified code work and how can I fix it? Is there a better way to parallelize the sum over a column of a 2d vector (vector of vectors) using the C++ standard library? The implementation should work for both CPU and GPU parallelization.
Minimal reproducible example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
int main(int argc, char *argv[])
{
    std::vector<std::vector<double>> vec = {{1,2,3}, {4,5,6}, {7,8,9}};

    // working sequential version of sum over second column of vec
    double res = std::accumulate(vec.begin(), vec.end(), 0.0, [&](auto sum, auto b) { return sum + b[1]; });
    std::cout << res << std::endl; // prints 15 as expected

    // same but with reduce, does not compile
    res = std::reduce(vec.begin(), vec.end(), 0.0, [&](auto sum, auto b) { return sum + b[1]; });
    std::cout << res << std::endl;
}
Trying to compile this, I get the errors
g++-10 -std=c++20 program.cpp
program.cpp:16:90: error: subscripted value is neither array nor pointer
16 | res = std::reduce(vec.begin(), vec.end(), 0.0, [&](auto sum, auto b) { return sum + b[1]; });
|
program.cpp:16:87: error: no match for ‘operator+’ (operand types are ‘std::vector<double>’ and ‘__gnu_cxx::__alloc_traits<std::allocator<double>, double>::value_type’ {aka ‘double’})
16 | res = std::reduce(vec.begin(), vec.end(), 0.0, [&](auto sum, auto b) { return sum + b[1]; });
| ~~~~^~~~
With reduce, the predicate must accept any combination of double and std::vector<double> parameters, in any order: four possible combinations in all. This is what makes it parallelizable, because the implementation is free to combine elements with elements, elements with partial sums, and partial sums with each other, in any grouping. This would be easiest to do using a named class with four overloads of operator(). A lambda would need a chain of if constexpr to achieve the same effect.
However, what you really want is transform_reduce, like this:
res = std::transform_reduce(vec.begin(), vec.end(), 0.0, std::plus{}, [](const auto& v) { return v[1]; });