
What does Nvidia mean by this?

Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;
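To make the quoted sentence concrete, here is a minimal sketch, assuming a CUDA C++ toolchain. The names scaleKernel, tileBytes and launchScale are hypothetical. The unsized extern __shared__ array gets its storage from the third launch argument (Ns), and that many bytes are reserved for every block.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: stages each element through dynamic shared memory.
__global__ void scaleKernel(const float* in, float* out, int n)
{
    extern __shared__ float tile[];   // size comes from Ns at launch time
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = in[i];    // one float per thread in the block
    }
    __syncthreads();                  // reached by every thread in the block
    if (i < n) {
        out[i] = 2.0f * tile[threadIdx.x];
    }
}

// Bytes of dynamic shared memory ONE block needs: one float per thread.
size_t tileBytes(int blockSize)
{
    return static_cast<size_t>(blockSize) * sizeof(float);
}

// Launches scaleKernel; d_in and d_out must be device pointers.
cudaError_t launchScale(const float* d_in, float* d_out, int n, int blockSize)
{
    int gridSize = (n + blockSize - 1) / blockSize;
    // Third argument (Ns) is the per-block size of the extern array.
    scaleKernel<<<gridSize, blockSize, tileBytes(blockSize)>>>(d_in, d_out, n);
    return cudaGetLastError();
}
```

Note that Ns is computed from the block size, not from the number of blocks: each block gets its own copy of tile.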

The size of shared memory on my GPU is 48kB. For example, I want to run a kernel with 4 parallel blocks, each of which uses 12kB of shared memory.

In order to do that, should I launch the kernel this way:

kernel<<< gridSize, blockSize, 12 * 1024 >>>();

or should the third argument be 48 * 1024?

  • The third argument specifies how much dynamically allocated shared memory is used by the kernel, so the third argument should be 12 * 1024. But if you want to run four kernels in parallel, I expect you will be using streams, so there should also be a fourth argument in your kernel calls. Commented Feb 25, 2015 at 20:19
  • Thanks! By the way, I meant not '4 kernels at the same time' but '1 kernel on 4 thread blocks at the same time'. I guess I still need to use kernel<<< gridSize, blockSize, 12 * 1024 * 1024 >>>(); Commented Feb 26, 2015 at 1:15
  • I assume you mean 12*1024, not 12*1024*1024. Commented Feb 27, 2015 at 20:54

1 Answer


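Ns is a size in bytes. If you want to reserve 12kB of shared memory, you would pass 12*1024 (12*1024*1024 would be 12MB).

Keep in mind that the Ns value is PER BLOCK: it is the amount of dynamically allocated shared memory that each block executing on the device gets, not a total for the whole grid. So for 4 blocks that each use 12kB, the third argument is 12*1024, not 48*1024. Only if you start from a total budget that you want to split among the blocks would you compute something along the lines of total_bytes/number_of_blocks.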
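The per-block rule can be sketched for the question's case (4 blocks, 12kB each), assuming a CUDA C++ toolchain. The names perBlockKernel, fitsOneBlock, launchFourBlocks and the k* constants are hypothetical. The 48kB total is only a figure you might compare against the hardware; it is never the value passed as Ns.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: every block sees its own private Ns-byte buffer.
__global__ void perBlockKernel(int* out)
{
    extern __shared__ unsigned char buf[];   // Ns bytes for THIS block only
    if (threadIdx.x == 0) {
        buf[0] = static_cast<unsigned char>(blockIdx.x);
        out[blockIdx.x] = buf[0];
    }
}

constexpr int    kNumBlocks     = 4;
constexpr size_t kSmemPerBlock  = 12 * 1024;                  // Ns: per block
constexpr size_t kSmemAllBlocks = kNumBlocks * kSmemPerBlock; // 48kB, not passed

// True if ns bytes fit the per-block limit reported for `device`.
// Assumption: ignores any statically declared __shared__ memory.
bool fitsOneBlock(size_t ns, int device)
{
    int limit = 0;
    if (cudaDeviceGetAttribute(&limit, cudaDevAttrMaxSharedMemoryPerBlock,
                               device) != cudaSuccess) {
        return false;
    }
    return ns <= static_cast<size_t>(limit);
}

// d_out must be a device array of at least kNumBlocks ints.
cudaError_t launchFourBlocks(int* d_out)
{
    // Third argument is the per-block 12kB, not the 48kB grid total.
    perBlockKernel<<<kNumBlocks, 128, kSmemPerBlock>>>(d_out);
    return cudaGetLastError();
}
```

Whether all 4 blocks are actually resident at the same time is decided by the hardware scheduler, not by the launch arguments.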

Kernel launching with concurrency: if, as mentioned in a comment, you are using streams, there is a fourth argument in the kernel launch, which is the CUDA stream.

If you want to launch a kernel on another stream without any dynamically allocated shared memory, it will look like:

kernel_name<<<128, 128, 0, mystream>>>(...);

But concurrency is a whole different issue.
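A minimal sketch of the four-argument launch form, assuming a CUDA C++ toolchain; fillKernel, gridFor and fillOnStream are hypothetical names. The third argument is 0 because the kernel declares no extern __shared__ array, and the fourth is the stream the launch is queued on.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only to illustrate the launch syntax.
__global__ void fillKernel(int* out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Number of blocks needed to cover n elements (ceiling division).
int gridFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Launches fillKernel on its own stream: <<<grid, block, Ns = 0, stream>>>.
// d_out must be a device pointer with room for n ints.
cudaError_t fillOnStream(int* d_out, int n, int value)
{
    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    const int blockSize = 128;
    fillKernel<<<gridFor(n, blockSize), blockSize, 0, stream>>>(d_out, n, value);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    return err;
}
```

Putting work in separate streams only makes overlap possible; whether launches actually run concurrently still depends on the device and on free resources.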


1 Comment

Yep, you're right; the 12 * 1024 * 1024 was kind of a misprint. Thank you for your answer.
