I'm trying to find a solution for this: my CUDA kernel uses several device functions, and some of them need to return an array.
This is what I tried:
__device__ float *MatProd2dWxC(float *a2d, float *b2d, int mGl, int nGl)
{
    int aRows = mGl; int aCols = nGl;
    int bRows = nGl; int bCols = 1;
    float *result;
    //result.resize(mGl*aRows);
    for (int i = 0; i < aRows; ++i)             // each row of a
        for (int j = 0; j < bCols; ++j)         // each col of b
            for (int k = 0; k < aCols; ++k)
                result[i*mGl + j] += a2d[i*mGl + k] * b2d[k*mGl + j];
    return result;
}
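I haven't compiled this, because I understand that returning a pointer from the function is not a good idea here: result is never allocated, so it points at nothing. What is the right way to do this? One idea is to pass in an additional temporary array and change the function to void, but then I would have to do that many times in the kernel code, so I'm looking for a more elegant solution.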
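For reference, here is a minimal sketch of the "temp array plus void function" idea, along with one possible way to make it less repetitive. Everything named here (matVecProd, Vec, matVecProdFixed) is hypothetical and not from my real code. It assumes row-major storage, that b is a single column, and that the sizes are known at compile time for the by-value variant. The macro guard is only there so the same text builds with a plain C++ compiler as well as nvcc.

```cpp
// Lets this sketch build with nvcc and with a plain host C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: out = A * b, where A is rows x cols (row-major)
// and b is a column vector of length cols. The caller owns `out`
// (length rows), so nothing is allocated or returned from the function.
__host__ __device__ inline void matVecProd(const float *a, const float *b,
                                           float *out, int rows, int cols)
{
    for (int i = 0; i < rows; ++i) {
        float sum = 0.0f;                   // accumulate from zero
        for (int k = 0; k < cols; ++k)
            sum += a[i * cols + k] * b[k];  // row i of A times b
        out[i] = sum;
    }
}

// Hypothetical fixed-size wrapper: a struct holding an array can be
// returned by value, which a raw array cannot.
template <int N>
struct Vec {
    float v[N];
};

// Assumes ROWS and COLS are compile-time constants and small enough
// to live in registers or local memory.
template <int ROWS, int COLS>
__host__ __device__ inline Vec<ROWS> matVecProdFixed(const float *a,
                                                     const float *b)
{
    Vec<ROWS> r;
    matVecProd(a, b, r.v, ROWS, COLS);
    return r;
}
```

Inside a kernel, the first form would be called as matVecProd(a, b, tmp, rows, cols) with a per-thread float tmp[...] array, and the second as Vec<3> r = matVecProdFixed<3, 2>(a, b). I'm not sure whether the by-value version costs anything extra on the device, which is part of what I'm asking.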