
I have a newbie question about how CUDA kernels work.

I have the following code (which uses the function cuPrintf, taken from here):

#include "cuPrintf.cu"

__global__ void testKernel(int param){
    cuPrintf("Param value: %d\n", param);
}

int main(void){

    // initialize cuPrintf
    cudaPrintfInit();

    int a = 456;    

    testKernel<<<4,1>>>(a);

    // display the device's greeting
    cudaPrintfDisplay();

    // clean up after cuPrintf
    cudaPrintfEnd();
}

The output of the execution is:

Param value: 456
Param value: 456
Param value: 456
Param value: 456

I can't understand how the kernel can read the correct value of the parameter I pass; isn't it allocated in host memory? Can the GPU read from host memory?

Thanks,

Andrea

  • Function parameters are implicitly marshalled and copied to the device by the NVCC compiler. Commented Feb 26, 2018 at 3:37

5 Answers


According to section E.2.5.2, Function Parameters, of the CUDA C Programming Guide:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.
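To illustrate the size limit above, here is a minimal sketch (names and values are hypothetical, and it assumes a device of compute capability 2.0 or higher so that device-side `printf` is available): a struct passed by value is copied into the parameter space in constant memory, so it must fit within the 4 KB limit.

```cuda
#include <cstdio>

struct Params {
    int   offset;
    float scale;
    float table[16];   // the whole struct is copied into constant memory
};

__global__ void kernel(Params p) {      // sizeof(Params) is far below 4096 bytes: OK
    printf("offset=%d scale=%f\n", p.offset, p.scale);
}

int main() {
    Params p = {};
    p.offset = 7;
    p.scale  = 2.5f;
    kernel<<<1, 1>>>(p);                // struct is marshalled to the device automatically
    cudaDeviceSynchronize();            // wait for the kernel and flush printf output
    return 0;
}
```

A struct larger than the limit would fail to compile or launch; in that case you would pass a device pointer to the data instead.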



The declaration void testKernel(int param) says that param is passed by value, not by reference. In other words, the stack contains a copy of a's value, not a pointer to a. CUDA copies the stack to the kernel running on the GPU.
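A minimal sketch of this point (it uses device-side `printf`, available on compute capability 2.0+, instead of cuPrintf): the value is captured at launch time, so the kernel sees a copy and never reads the host variable.

```cuda
#include <cstdio>

__global__ void show(int param) {
    // param is a per-launch copy; the kernel never touches the host variable
    printf("thread %d sees %d\n", threadIdx.x, param);
}

int main() {
    int a = 456;
    show<<<1, 4>>>(a);       // the value 456 is copied to the device at launch
    a = 0;                   // changing the host variable afterwards has no effect
    cudaDeviceSynchronize(); // wait for the kernel and flush printf output
    return 0;
}
```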

3 Comments

So, if I understood correctly, during kernel execution param will be in GPU memory. Thus, if I read it many times, I will not be accessing any "host memory" and hurting performance, right?
The documentation isn't really clear on where kernel arguments are stored. You can assume it's in fast memory though: registers, shared, or constant. As kirbuchi said in his answer, the CUDA Programming Guide (section B.13 in the 3.0 Guide) says "The arguments to the execution configuration are evaluated before the actual function arguments and like the function arguments, are currently passed via shared memory to the device".
Perfect. Thanks again for all your answers.

According to the CUDA Programming Guide (Appendix B.16) the arguments are passed via shared memory to the device.

The arguments to the execution configuration are evaluated before the actual function arguments and like the function arguments, are currently passed via shared memory to the device.

4 Comments

Note that the quote refers to the execution configuration arguments (4 and 1), not the function arguments (a == 456), which is what the OP asked about.
@Steve You're right, but it compares them to the function arguments which it says are also passed via shared memory. Should've emphasized that part :)
The method for passing kernel parameters varies with architecture. Compute capability 1.* devices put the values in shared memory. Compute capability >= 2.0 put the values in constant memory.
That doesn't really make sense, because you can modify each argument. Say `arg` is an int argument; you may write `arg = threadIdx.x;`, which wouldn't make sense if you were modifying a single value shared by all threads in a block.
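The concern in the comment above can be checked directly. In this sketch (variable names are illustrative; device-side `printf` assumes compute capability 2.0+), assigning to a kernel parameter compiles and behaves per-thread, because the compiler gives each thread its own writable copy (in a register or local memory) before any store to it:

```cuda
#include <cstdio>

__global__ void mutate(int arg) {
    arg = threadIdx.x;   // legal: the write goes to a per-thread copy,
                         // not to the shared/constant parameter storage
    printf("thread %d: arg=%d\n", threadIdx.x, arg);
}

int main() {
    mutate<<<1, 4>>>(456);
    cudaDeviceSynchronize(); // wait for the kernel and flush printf output
    return 0;
}
```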

The parameters are passed to the kernel when you invoke it; otherwise, how else would you communicate with the GPU? It is the same idea as setting a uniform in a shader.



In the runtime API, parameters for __global__ functions are implicitly marshalled and copied from the host to the device.

The NVCC compiler generates code that hides the marshalling from you. You can find the parameter sizes and limitations in the CUDA Programming Guide.
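One caveat to the implicit marshalling: pointers are marshalled by value too, so passing a host pointer gives the kernel a useless host address. Only scalar and struct parameters travel automatically; pointed-to data must be moved explicitly. A sketch (names are illustrative):

```cuda
#include <cstdio>

__global__ void fill(int *dst, int value) {
    dst[threadIdx.x] = value;   // dst must already be a device pointer
}

int main() {
    int host[4];
    int *dev = nullptr;

    // The scalar 'value' is marshalled automatically at launch;
    // the array contents must be moved with explicit copies.
    cudaMalloc(&dev, sizeof(host));
    fill<<<1, 4>>>(dev, 456);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < 4; ++i)
        printf("%d\n", host[i]);
    return 0;
}
```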

