I have a pair of nested loops that builds a matrix, and I was wondering if anyone knows a way to fuse them so I can use an OpenMP parallel for pragma.
    for( int i = 0; i < nbas*nchannels; ++i){
        int ket_r = i / nchannels;
        int ket_l = i % nchannels;
        for( int j = 0; j < nbas*nchannels; ++j){
            int bra_r = j / nchannels;
            int bra_l = j % nchannels;
            ...stuff_calculated...
            matrix[ i*nbas*nchannels + j ] = stuff_calculated;
        }
    }
Each for loop walks along a flattened dimension that I need to split further (to get variables like ket_r and ket_l).
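To make the index splitting concrete, here is a small sketch of the mapping. The helper names split_index and join_index are hypothetical and only for illustration; my real code does the division and modulo inline.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not in my real code): split a flattened index
// into its (r, l) parts, the same way ket_r / ket_l are computed above.
std::pair<int, int> split_index(int idx, int nchannels) {
    assert(nchannels > 0);
    return { idx / nchannels, idx % nchannels };
}

// Hypothetical inverse: rebuild the flattened index from its (r, l) parts.
int join_index(int r, int l, int nchannels) {
    return r * nchannels + l;
}
```

For example, with nchannels = 3 the flattened index 7 splits into r = 2 and l = 1, and joining those gives 7 again.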
The loops look independent to me, so I thought this would be as simple as
    #pragma omp parallel for collapse(2)
    for( int i = 0; i < nbas*nchannels; ++i){
        for( int j = 0; j < nbas*nchannels; ++j){
            int ket_r = i / nchannels;
            int ket_l = i % nchannels;
            int bra_r = j / nchannels;
            int bra_l = j % nchannels;
            ...stuff_calculated...
            matrix[ i*nbas*nchannels + j ] = stuff_calculated;
        }
    }
But I get different answers from the serial version. If this approach looks correct, I will investigate further along this route.
I cannot provide the full code because it belongs to my research group, so I apologize for not posting a complete working example.
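Since I can't post the real calculation, here is a minimal self-contained sketch of what I am attempting. The function element is a made-up placeholder for stuff_calculated, and build_matrix with its parallel flag is a hypothetical wrapper I added only so the serial and collapsed versions can be compared; neither is my actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for the real calculation (an assumption, not my actual code):
// any pure function of the four split indices.
double element(int ket_r, int ket_l, int bra_r, int bra_l) {
    return 1000.0 * ket_r + 100.0 * ket_l + 10.0 * bra_r + bra_l;
}

// Hypothetical wrapper: builds the matrix with or without threads.
// Compile with -fopenmp (GCC/Clang); without it the pragma is ignored
// and both settings run serially.
std::vector<double> build_matrix(int nbas, int nchannels, bool parallel) {
    assert(nbas > 0 && nchannels > 0);
    const int n = nbas * nchannels;
    std::vector<double> matrix(static_cast<std::size_t>(n) * n);

    // collapse(2) needs the two loops perfectly nested, so every
    // per-iteration variable is declared inside the inner loop body,
    // which also makes it private to the executing thread.
    #pragma omp parallel for collapse(2) if(parallel)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const int ket_r = i / nchannels;
            const int ket_l = i % nchannels;
            const int bra_r = j / nchannels;
            const int bra_l = j % nchannels;
            // Each (i, j) pair writes to its own element, so no two
            // iterations touch the same memory location.
            matrix[static_cast<std::size_t>(i) * n + j] =
                element(ket_r, ket_l, bra_r, bra_l);
        }
    }
    return matrix;
}
```

Comparing build_matrix(nbas, nchannels, false) against build_matrix(nbas, nchannels, true) element by element is the check I have in mind. In this sketch every temporary lives inside the inner loop; in my real code the elided part may use variables declared outside the loops, and those would be shared between threads by default.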