Passing parameters of a kernel function as a C++ struct?

Question

I want to pass the parameters of a kernel function as a struct like this:

struct kernel_data {
    double *A;
    double *B;
    double *C;
    const int *A_dims;
    const int *B_dims;
    int C_dims[2];
};

For this I need to initialize device pointers with cudaMalloc(), but how can I initialize such a struct, including the parameters I want to pass, with that function? Or do I have to pass them separately?

talonmies · Accepted Answer · 2013-02-03 18:18:36Z

5

Just pass the structure by value to the kernel as you would any other argument:

struct kernel_data args;

cudaMalloc(&(args.A), sizeof(double)*.....);
cudaMalloc(&(args.B), sizeof(double)*.....);
cudaMalloc(&(args.C), sizeof(double)*.....);
cudaMalloc(&(args.A_dims), sizeof(int)*.....);
cudaMalloc(&(args.B_dims), sizeof(int)*.....);

kernel<<<....>>>(args);

There is a theoretical limit on the argument list size, anywhere from 256 bytes to 4 KB depending on the hardware you use. If you ever exceed it, copy the args structure to a device allocation and pass a pointer to it, or copy it to constant memory.
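A rough sketch of those two fallbacks, in case the struct ever does grow too large to pass by value. The kernel names, the `c_args` symbol, the `ok` helper and `run_indirect` are all made up for illustration, and only the `C` and `C_dims` members are used to keep it short:

```cuda
#include <vector>
#include <cuda_runtime.h>

struct kernel_data {
    double *A;
    double *B;
    double *C;
    const int *A_dims;
    const int *B_dims;
    int C_dims[2];
};

// Assumed name: a copy of the argument struct living in constant memory.
__constant__ kernel_data c_args;

// Variant 1: the kernel receives a pointer to a struct in global memory.
__global__ void fill_via_pointer(const kernel_data *args, double value)
{
    int n = args->C_dims[0] * args->C_dims[1];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) args->C[i] = value;
}

// Variant 2: the kernel reads the struct from constant memory.
__global__ void fill_via_constant(double value)
{
    int n = c_args.C_dims[0] * c_args.C_dims[1];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c_args.C[i] = value;
}

static bool ok(cudaError_t e) { return e == cudaSuccess; }

// Hypothetical host driver; expects rows > 0 and cols > 0.
bool run_indirect(int rows, int cols)
{
    const int n = rows * cols;
    const int threads = 256, blocks = (n + threads - 1) / threads;
    const size_t bytes = sizeof(double) * n;
    std::vector<double> h(n, 0.0);

    kernel_data args = {};            // host copy holding device pointers
    args.C_dims[0] = rows;
    args.C_dims[1] = cols;
    kernel_data *d_args = nullptr;    // device copy of the struct itself

    // Allocate C first so the struct copies below carry a valid pointer.
    bool good =
        ok(cudaMalloc(&args.C, bytes)) &&
        ok(cudaMalloc(&d_args, sizeof(kernel_data))) &&
        ok(cudaMemcpy(d_args, &args, sizeof(kernel_data), cudaMemcpyHostToDevice)) &&
        ok(cudaMemcpyToSymbol(c_args, &args, sizeof(kernel_data)));

    if (good) {
        fill_via_pointer<<<blocks, threads>>>(d_args, 1.0);
        good = ok(cudaGetLastError()) &&
               ok(cudaMemcpy(h.data(), args.C, bytes, cudaMemcpyDeviceToHost));
        for (int i = 0; good && i < n; ++i) good = (h[i] == 1.0);
    }
    if (good) {
        fill_via_constant<<<blocks, threads>>>(2.0);
        good = ok(cudaGetLastError()) &&
               ok(cudaMemcpy(h.data(), args.C, bytes, cudaMemcpyDeviceToHost));
        for (int i = 0; good && i < n; ++i) good = (h[i] == 2.0);
    }

    cudaFree(d_args);
    cudaFree(args.C);
    return good;
}
```

Either way the struct is filled in on the host first and then copied as a whole, so the device pointers inside it have to be valid before the copy is made.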

To initialise the arrays from the host, just use standard cudaMemcpy calls:

cudaMemcpy(args.A, hostA, sizeof(double)*....., cudaMemcpyHostToDevice);

etc.
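Putting the steps above together, a minimal end-to-end sketch might look like the following. The element-wise add kernel, the `ok` helper, `run_by_value` and the sizes are assumptions made for the example, not part of the question. The dims arrays are allocated through plain `int *` temporaries so that cudaMemcpy can write to them before they are stored in the `const int *` members:

```cuda
#include <vector>
#include <cuda_runtime.h>

struct kernel_data {
    double *A;
    double *B;
    double *C;
    const int *A_dims;
    const int *B_dims;
    int C_dims[2];
};

// Hypothetical kernel: element-wise C = A + B over C_dims[0] * C_dims[1] entries.
__global__ void add_kernel(kernel_data args)
{
    int n = args.C_dims[0] * args.C_dims[1];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) args.C[i] = args.A[i] + args.B[i];
}

static bool ok(cudaError_t e) { return e == cudaSuccess; }

// Hypothetical host driver; expects rows > 0 and cols > 0.
bool run_by_value(int rows, int cols)
{
    const int n = rows * cols;
    const size_t bytes = sizeof(double) * n;
    std::vector<double> hA(n, 1.0), hB(n, 2.0), hC(n, 0.0);
    const int dims[2] = {rows, cols};

    kernel_data args = {};
    int *dA_dims = nullptr, *dB_dims = nullptr;

    bool good =
        ok(cudaMalloc(&args.A, bytes)) &&
        ok(cudaMalloc(&args.B, bytes)) &&
        ok(cudaMalloc(&args.C, bytes)) &&
        ok(cudaMalloc(&dA_dims, sizeof(dims))) &&
        ok(cudaMalloc(&dB_dims, sizeof(dims))) &&
        ok(cudaMemcpy(args.A, hA.data(), bytes, cudaMemcpyHostToDevice)) &&
        ok(cudaMemcpy(args.B, hB.data(), bytes, cudaMemcpyHostToDevice)) &&
        ok(cudaMemcpy(dA_dims, dims, sizeof(dims), cudaMemcpyHostToDevice)) &&
        ok(cudaMemcpy(dB_dims, dims, sizeof(dims), cudaMemcpyHostToDevice));

    args.A_dims = dA_dims;
    args.B_dims = dB_dims;
    args.C_dims[0] = rows;   // plain members travel inside the struct copy
    args.C_dims[1] = cols;

    if (good) {
        const int threads = 256;
        add_kernel<<<(n + threads - 1) / threads, threads>>>(args);  // by value
        good = ok(cudaGetLastError()) &&
               ok(cudaMemcpy(hC.data(), args.C, bytes, cudaMemcpyDeviceToHost));
    }

    cudaFree(args.A); cudaFree(args.B); cudaFree(args.C);
    cudaFree(dA_dims); cudaFree(dB_dims);

    for (int i = 0; good && i < n; ++i) good = (hC[i] == 3.0);
    return good;
}
```

Note that `C_dims` needs no cudaMalloc at all: it is an array member, so its two values are copied to the device along with the rest of the struct at launch.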

edited Feb 3, 2013 at 18:18

answered Feb 3, 2013 at 17:52

talonmies


Nicola Pezzotti · Accepted Answer · 2013-02-03 21:30:27Z

2

You can pass this structure by copy to the kernel. Remember that the total size of the parameters passed to a kernel must not exceed 256 bytes on pre-Fermi cards or 4 KB on Fermi.
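If you are worried about that limit, one option is to let the compiler check it. This is only a sketch: it assumes the more conservative 256-byte figure, and `kMaxKernelArgBytes` is a made-up name, not a CUDA constant.

```cuda
#include <cstddef>

struct kernel_data {
    double *A;
    double *B;
    double *C;
    const int *A_dims;
    const int *B_dims;
    int C_dims[2];
};

// Assumed value: the smaller of the two limits quoted above.
constexpr std::size_t kMaxKernelArgBytes = 256;

// Fails the build if the struct outgrows the by-value limit.
static_assert(sizeof(kernel_data) <= kMaxKernelArgBytes,
              "kernel_data is too large to pass by value; pass a device pointer instead");
```

On a typical 64-bit build the struct is 48 bytes (five 8-byte pointers plus two 4-byte ints), so the check passes with plenty of room.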

So you have to use cudaMalloc to allocate device memory, then set the pointers in your structure to that device memory. Finally, pass your structure by copy to the kernel.

I strongly recommend that you avoid using this struct in C++ code. Instead of

struct kernel_data {
    double *A;
    double *B;
    double *C;
    const int *A_dims;
    const int *B_dims;
    int C_dims[2];
};

you should do something like this

class DeviceData{
public:
    DeviceData(...){ /* Do cudaMalloc here */ }
    ~DeviceData(){ /* Do cudaFree here */ }
private:
    friend class DeviceDataWrapper; // the wrapper below reads _A and _dims
    double *_A;
    int _dims;
};

This class holds the data available on the device and is exception safe. Then you can implement a wrapper that you can pass to a kernel

class DeviceDataWrapper{
public:
    __host__ DeviceDataWrapper(DeviceData& device):
        _A(device._A),
        _dims(device._dims)
        {}

    __forceinline__ __device__ double* data(){return _A;}
    __forceinline__ __device__ int dims()const{return _dims;}

private:
    double *_A;
    int _dims;  
};

and then call a kernel in this way

__global__ void myKernel(DeviceDataWrapper a, DeviceDataWrapper b, DeviceData2Wrapper c){
 //do something like a.data()[0] = 1;
}

DeviceData A,B;
DeviceData2 C;
myKernel<<< >>>(A,B,C);
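For completeness, here is a compilable sketch of the same idea with a single buffer. It is an illustration under assumptions, not the only way to do it: the members are renamed `a_` and `dims_` (identifiers starting with an underscore followed by a capital letter are reserved in C++), copying of the owning class is disabled to avoid a double cudaFree, and `copy_to_host` and `run_wrapper` are hypothetical helpers added so the result can be checked.

```cuda
#include <stdexcept>
#include <vector>
#include <cuda_runtime.h>

class DeviceDataWrapper;

// Host-side owner of a device buffer holding `dims` doubles.
class DeviceData {
public:
    explicit DeviceData(int dims) : a_(nullptr), dims_(dims) {
        if (cudaMalloc(&a_, sizeof(double) * dims) != cudaSuccess)
            throw std::runtime_error("cudaMalloc failed");
    }
    ~DeviceData() { cudaFree(a_); }
    DeviceData(const DeviceData&) = delete;             // avoid double cudaFree
    DeviceData& operator=(const DeviceData&) = delete;

    // Hypothetical helper: copy the buffer back into host memory.
    bool copy_to_host(double *dst) const {
        return cudaMemcpy(dst, a_, sizeof(double) * dims_,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    }
private:
    friend class DeviceDataWrapper;
    double *a_;
    int dims_;
};

// Non-owning view: just a pointer and a size, cheap to pass by value.
class DeviceDataWrapper {
public:
    __host__ DeviceDataWrapper(DeviceData& device)
        : a_(device.a_), dims_(device.dims_) {}
    __forceinline__ __device__ double* data() { return a_; }
    __forceinline__ __device__ int dims() const { return dims_; }
private:
    double *a_;
    int dims_;
};

__global__ void myKernel(DeviceDataWrapper a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < a.dims()) a.data()[i] = 1.0;
}

// Hypothetical host driver; expects n > 0.
bool run_wrapper(int n)
{
    DeviceData A(n);              // cudaMalloc happens here
    DeviceDataWrapper view(A);    // copies only the pointer and the size
    myKernel<<<(n + 255) / 256, 256>>>(view);

    std::vector<double> h(n, 0.0);
    bool good = cudaGetLastError() == cudaSuccess && A.copy_to_host(h.data());
    for (int i = 0; good && i < n; ++i) good = (h[i] == 1.0);
    return good;                  // ~DeviceData calls cudaFree on the way out
}
```

The wrapper must not outlive the DeviceData it was built from, since it only borrows the pointer.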

edited Feb 3, 2013 at 21:30

answered Feb 3, 2013 at 15:37

Nicola Pezzotti


2 Comments

talonmies Over a year ago

A copy of that structure will be much, much less than 256 bytes in size (and that limit only applies to compute capability 1.x hardware anyway).

Nicola Pezzotti Over a year ago

It was only a reminder; if you build hierarchical structures, that limit could be reached quite often.
