I am reading a section about the `tbb::parallel_scan` algorithm in the book *Intel Threading Building Blocks*. I understand what the operation does serially, but I don't understand the requirements on the body object. The book's description is incredibly vague: it says that the algorithm can perform two passes over the input data, and it mentions an `assign` operation and a `reverse_join` operation.
I am trying to understand when these operations are applied and how they work.
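For reference, here is the serial result that any schedule (one pass or two) must reproduce: an inclusive prefix sum, `y[i] = z[0] + z[1] + ... + z[i]`. This is a minimal sketch assuming `T = int` and identity `id = 0`, which the book leaves unspecified; the function names `serial_inclusive_scan` and `agrees_with_partial_sum` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumption: T = int and the identity `id` = 0.
// Serial meaning of the scan: y[i] = z[0] + z[1] + ... + z[i].
std::vector<int> serial_inclusive_scan(const std::vector<int>& z) {
    std::vector<int> y(z.size());
    int running = 0;  // starts at the identity, like `sum(id)` in the body
    for (std::size_t i = 0; i < z.size(); ++i) {
        running = running + z[i];
        y[i] = running;
    }
    return y;
}

// Cross-check against the standard library's serial inclusive scan.
bool agrees_with_partial_sum(const std::vector<int>& z) {
    std::vector<int> expected(z.size());
    std::partial_sum(z.begin(), z.end(), expected.begin());
    return serial_inclusive_scan(z) == expected;
}
```

For example, `serial_inclusive_scan({1, 2, 3, 4})` yields `{1, 3, 6, 10}`.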
This is the body object passed to `parallel_scan`:
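```cpp
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_scan.h>

// T and id are left open by the book; assumed here: int and its identity 0.
using T = int;
const T id = 0;

class Body {
    T sum;             // running summary (prefix) held by this body
    T* const y;        // output array
    const T* const z;  // input array
public:
    // Initializers listed in declaration order (sum, y, z).
    Body( T y_[], const T z_[] ) : sum(id), y(y_), z(z_) {}
    T get_sum() const { return sum; }
    template<typename Tag>
    void operator()( const oneapi::tbb::blocked_range<int>& r, Tag ) {
        T temp = sum;
        for( int i=r.begin(); i<r.end(); ++i ) {
            temp = temp + z[i];
            if( Tag::is_final_scan() )
                y[i] = temp;   // only the final scan writes output
        }
        sum = temp;
    }
    Body( Body& b, oneapi::tbb::split ) : sum(id), y(b.y), z(b.z) {}
    void reverse_join( Body& a ) { sum = a.sum + sum; }
    void assign( Body& b ) { sum = b.sum; }
};
```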
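The order inside `reverse_join` (`a.sum + sum`, not `sum + a.sum`) matters whenever the operation is not commutative. Here `a` is the body that covered the range to the *left* of `*this`, so its summary must come first. The sketch below makes that visible with string concatenation, which is associative but not commutative. It does not use TBB; `ConcatBody`, `accumulate` and `combine_two_halves` are hypothetical names, and the body keeps only its summary (no output array) for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Body with T = std::string and `+` meaning
// concatenation. Only the summary state is kept.
struct ConcatBody {
    std::string sum;  // identity: the empty string
    void accumulate(const std::vector<std::string>& z, int begin, int end) {
        for (int i = begin; i < end; ++i) sum = sum + z[i];
    }
    // Same shape as the question's reverse_join: `a` covers the range to the
    // LEFT of *this, so its summary is placed first.
    void reverse_join(const ConcatBody& a) { sum = a.sum + sum; }
};

std::string combine_two_halves(const std::vector<std::string>& z) {
    const int n = static_cast<int>(z.size());
    const int mid = n / 2;
    ConcatBody left, right;        // `right` plays the split-off body
    left.accumulate(z, 0, mid);
    right.accumulate(z, mid, n);
    right.reverse_join(left);      // left summary, then right summary
    return right.sum;
}
```

With `{"a", "b", "c", "d"}` this returns `"abcd"`; swapping the operands in `reverse_join` would give `"cdab"`.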
So for each block, they first compute the sum of all its elements and accumulate it in `sum` (each block starting from the identity). Is this the famous first pass? What happens next? Is `assign` called to pass the result to the adjacent block? When does the second pass happen, and when is `reverse_join` called?
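One way to see how the pieces fit together is to drive the body by hand through a schedule that the specification appears to permit. The sketch below is an assumption-laden illustration, not TBB's actual scheduler: it uses hypothetical stand-ins (`PreScanTag`, `FinalScanTag`, `Split`, `Range`) for `oneapi::tbb::pre_scan_tag`, `final_scan_tag`, `split` and `blocked_range<int>`, splits the input into three pieces P0, P1, P2, and runs serially the steps that a real run could overlap. A real `parallel_scan` may choose a different order, or, when it runs on one thread, may simply do a single final scan.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; no TBB headers are used.
struct PreScanTag   { static bool is_final_scan() { return false; } };
struct FinalScanTag { static bool is_final_scan() { return true; } };
struct Split {};
struct Range {
    int b, e;
    int begin() const { return b; }
    int end() const { return e; }
};

using T = int;
const T id = 0;  // assumed identity for +

// The question's Body, same logic, using the stand-in types.
class Body {
    T sum;
    T* const y;
    const T* const z;
public:
    Body(T y_[], const T z_[]) : sum(id), y(y_), z(z_) {}
    Body(Body& b, Split) : sum(id), y(b.y), z(b.z) {}
    T get_sum() const { return sum; }
    template <typename Tag>
    void operator()(const Range& r, Tag) {
        T temp = sum;
        for (int i = r.begin(); i < r.end(); ++i) {
            temp = temp + z[i];
            if (Tag::is_final_scan()) y[i] = temp;
        }
        sum = temp;
    }
    void reverse_join(Body& a) { sum = a.sum + sum; }
    void assign(Body& b) { sum = b.sum; }
};

// Hypothetical driver: one possible schedule over pieces P0, P1, P2.
std::vector<T> simulate_three_piece_scan(const std::vector<T>& input, T& total) {
    const int n = static_cast<int>(input.size());
    std::vector<T> out(input.size());
    Range p0{0, n / 3}, p1{n / 3, 2 * n / 3}, p2{2 * n / 3, n};

    Body a(out.data(), input.data());  // the user's original body
    Body b(a, Split{});                // split-off body, starts at id

    // Pass 1 (could run concurrently): P0 has nothing to its left, so it can
    // be final-scanned at once; P1 is only pre-scanned (summary, no writes).
    a(p0, FinalScanTag{});             // a.sum = S0, out[P0] written
    b(p1, PreScanTag{});               // b.sum = S1, out untouched

    // b covered the range to the right of a: prefix for P2 is S0 + S1.
    b.reverse_join(a);

    // Pass 2 (could run concurrently): both pieces know their incoming prefix.
    a(p1, FinalScanTag{});             // starts from S0, writes out[P1]
    b(p2, FinalScanTag{});             // starts from S0 + S1, writes out[P2]

    // Copy the rightmost summary back so the user's body holds the total.
    a.assign(b);
    total = a.get_sum();
    return out;
}
```

In this schedule the pre-scan of P1 is what lets the final scans of P1 and P2 proceed in parallel: without `b.sum = S1`, the prefix for P2 would only be known after P1 had been fully final-scanned.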
I want to understand `tbb::parallel_scan` and how to use it. There is no code in the book that shows how the algorithm works; it only says that `parallel_scan` takes a `blocked_range` and a body object like the one I included as inputs. It is unclear to me what the methods one has to provide actually do. The book's material is very similar to this: oneapi-spec.uxlfoundation.org/specifications/oneapi/v1.1-rev-1/…