You can pass this structure to the kernel by copy.
Remember that the total size of the parameters passed to a kernel must not exceed 256 bytes on pre-Fermi cards or 4 KB on Fermi.
So you have to use cudaMalloc to allocate device memory, then set the pointers in your structure to that device memory. Finally, you pass the structure to the kernel by copy.
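The three steps above (allocate, store the device pointers in the struct, pass by copy) could look roughly like the sketch below. The names KernelArgs and scale are made up for illustration, the 4 KB check simply mirrors the Fermi limit quoted above, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameter struct: it only holds device pointers and a size,
// so copying it to the kernel is cheap.
struct KernelArgs {
    double *A;  // device pointer (input)
    double *C;  // device pointer (output)
    int     n;  // number of elements
};

// Assumption: 4 KB parameter limit, as quoted for Fermi in the text above.
static_assert(sizeof(KernelArgs) <= 4096, "kernel parameters too large");

__global__ void scale(KernelArgs args) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < args.n) args.C[i] = 2.0 * args.A[i];
}

int main() {
    const int n = 256;
    double hostA[n], hostC[n];
    for (int i = 0; i < n; ++i) hostA[i] = i;

    KernelArgs args;
    args.n = n;
    // Steps 1 and 2: allocate device memory and store the pointers in the struct.
    cudaMalloc((void**)&args.A, n * sizeof(double));
    cudaMalloc((void**)&args.C, n * sizeof(double));
    cudaMemcpy(args.A, hostA, n * sizeof(double), cudaMemcpyHostToDevice);

    // Step 3: the struct itself is passed by value.
    scale<<<(n + 127) / 128, 128>>>(args);
    cudaMemcpy(hostC, args.C, n * sizeof(double), cudaMemcpyDeviceToHost);

    printf("C[3] = %f\n", hostC[3]);
    cudaFree(args.A);
    cudaFree(args.C);
    return 0;
}
```

Note that the host never dereferences args.A or args.C; only the kernel does.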
I strongly recommend avoiding this kind of struct in C++ code.
Instead of
struct kernel_data {
    double *A;
    double *B;
    double *C;
    const int *A_dims;
    const int *B_dims;
    int C_dims[2];
};
you should do something like this:
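class DeviceData{
public:
    DeviceData(...){ /* do cudaMalloc here */ }
    ~DeviceData(){ /* do cudaFree here */ }  // a destructor takes no parameters
    // Non-copyable: a copy would cudaFree the same pointer twice.
    DeviceData(const DeviceData&) = delete;
    DeviceData& operator=(const DeviceData&) = delete;
private:
    friend class DeviceDataWrapper;  // the wrapper reads _A and _dims
    double *_A;
    int _dims;
};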
This class owns the data that lives on the device, and because its destructor frees that memory, it is exception safe.
Then you can implement a lightweight wrapper that can be passed to a kernel by value:
// Reads DeviceData's private members, so DeviceData must declare this class a friend.
class DeviceDataWrapper{
public:
    __host__ DeviceDataWrapper(DeviceData& device):
        _A(device._A),
        _dims(device._dims)
    {}
    __forceinline__ __device__ double* data(){return _A;}
    __forceinline__ __device__ int dims()const{return _dims;}
private:
    double *_A;
    int _dims;
};
and then call the kernel like this:
__global__ void myKernel(DeviceDataWrapper a, DeviceDataWrapper b, DeviceData2Wrapper c){
    // do something like a.data()[0] = 1;
}
DeviceData A,B;
DeviceData2 C;
// A, B and C are converted implicitly to their wrappers at the call.
myKernel<<< >>>(A,B,C);
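Putting the pieces together, a minimal self-contained sketch of the owner-plus-wrapper idea might look like this. It is only an illustration: the int constructor argument, copyToHost, the fill kernel, the launch configuration and the member names data_ and dims_ are all my own assumptions (I renamed the members because identifiers like _A, an underscore followed by a capital letter, are reserved in C++).

```cuda
#include <cstdio>
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>

// Owner: allocates in the constructor, frees in the destructor (host side only).
class DeviceData {
public:
    explicit DeviceData(int n) : data_(nullptr), dims_(n) {
        if (cudaMalloc((void**)&data_, n * sizeof(double)) != cudaSuccess)
            throw std::runtime_error("cudaMalloc failed");
    }
    ~DeviceData() { cudaFree(data_); }
    DeviceData(const DeviceData&) = delete;             // avoid double cudaFree
    DeviceData& operator=(const DeviceData&) = delete;

    // Hypothetical helper: copy the device buffer back to host memory.
    void copyToHost(double* dst) const {
        cudaMemcpy(dst, data_, dims_ * sizeof(double), cudaMemcpyDeviceToHost);
    }
private:
    friend class DeviceDataWrapper;  // lets the wrapper read the raw pointer
    double *data_;
    int dims_;
};

// Non-owning view: just a pointer and a size, cheap to pass by value.
class DeviceDataWrapper {
public:
    __host__ DeviceDataWrapper(DeviceData& d) : data_(d.data_), dims_(d.dims_) {}
    __device__ double* data() { return data_; }
    __device__ int dims() const { return dims_; }
private:
    double *data_;
    int dims_;
};

__global__ void fill(DeviceDataWrapper a, double value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < a.dims()) a.data()[i] = value;
}

int main() {
    const int n = 1000;
    DeviceData A(n);            // device memory is released when A goes out of scope
    DeviceDataWrapper view(A);  // the wrapper never frees anything

    fill<<<(n + 255) / 256, 256>>>(view, 1.0);

    std::vector<double> host(n);
    A.copyToHost(host.data());
    printf("host[0] = %f, host[%d] = %f\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```

The point of the split is that only the wrapper, which has a trivial copy and no destructor, ever crosses the host/device boundary, while the owning class stays on the host and takes care of cleanup.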