I am trying to parallelize the nested for loop below using allgather:
for (int i = 0; i < N1; i++) {
    for (int j = 0; j < N0; j++)
        HS_1[i] += IN[j] * W0[j][i];
}
Here N1 is 1000 and N0 is 764.
I have four processes and I just want to parallelize the outer loop. Is there a way to do it?
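
One possible way to split the outer loop is sketched below. It is a sketch under assumptions, not a definitive answer: each rank takes a contiguous block of N1 / nprocs values of i, computes its part of HS_1 into a local buffer, and MPI_Allgather then concatenates the blocks so that every rank ends up with the full HS_1. The assumptions are that the arrays hold double, that HS_1 starts at zero (so the += becomes a plain assignment), that every rank already holds all of IN and W0, and that N1 divides evenly by the number of processes (1000 / 4 = 250 here; an uneven split would need MPI_Allgatherv instead). The names compute_chunk, hs1_allgather and the USE_MPI macro are hypothetical. Without USE_MPI the code falls back to a serial loop over four pretend ranks so the index arithmetic can be checked without an MPI installation.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef USE_MPI
#include <mpi.h>
#endif

enum { N0 = 764, N1 = 1000 };  /* sizes taken from the question */

/* Compute HS_1[i] for the `chunk` outer-loop indices owned by `rank`
 * and store them in local[0 .. chunk-1]. */
static void compute_chunk(int rank, int chunk, const double *IN,
                          double W0[][N1], double *local)
{
    assert(rank >= 0 && chunk > 0 && (rank + 1) * chunk <= N1);
    int begin = rank * chunk;
    for (int i = begin; i < begin + chunk; i++) {
        double sum = 0.0;  /* assumption: HS_1 starts at zero */
        for (int j = 0; j < N0; j++)
            sum += IN[j] * W0[j][i];
        local[i - begin] = sum;
    }
}

/* Hypothetical helper: fills the whole of HS_1 on every rank.
 * Returns 0 on success, -1 on failure.
 * With USE_MPI defined, MPI_Init must already have been called. */
int hs1_allgather(const double *IN, double W0[][N1], double *HS_1)
{
#ifdef USE_MPI
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (N1 % nprocs != 0)
        return -1;  /* uneven split: would need MPI_Allgatherv */
    int chunk = N1 / nprocs;

    double *local = malloc((size_t)chunk * sizeof *local);
    if (local == NULL)
        return -1;
    compute_chunk(rank, chunk, IN, W0, local);

    /* Every rank sends `chunk` doubles and receives `chunk` doubles
     * from each rank, placed in rank order in HS_1. */
    MPI_Allgather(local, chunk, MPI_DOUBLE,
                  HS_1, chunk, MPI_DOUBLE, MPI_COMM_WORLD);
    free(local);
    return 0;
#else
    /* Serial stand-in: visit four pretend ranks in turn and write each
     * block where MPI_Allgather would have placed it. */
    int nprocs = 4;
    int chunk = N1 / nprocs;
    for (int rank = 0; rank < nprocs; rank++)
        compute_chunk(rank, chunk, IN, W0, HS_1 + rank * chunk);
    return 0;
#endif
}
```

With an MPI installation this would typically be built with something like mpicc -DUSE_MPI and launched on four processes, calling hs1_allgather(IN, W0, HS_1) between MPI_Init and MPI_Finalize. Only the outer loop is divided; the inner loop over j runs unchanged on every rank.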