What does Nvidia mean by this?
Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory; this dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;
The shared memory size on my GPU is 48 kB. For example, I want to run a kernel with 4 parallel blocks, each of which uses 12 kB of shared memory.
In order to do that, should I launch the kernel this way
kernel<<< gridSize, blockSize, 12 * 1024 >>>();
or should the third argument be 48 * 1024?
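12 * 1024. The third argument is the amount of dynamically allocated shared memory per block, not the total for the whole grid. But if you want to run four kernels in parallel, I expect you will be using streams, so there should also be a fourth argument in your kernel calls.
kernel<<< gridSize, blockSize, 12 * 1024 * 1024 >>>();
That should be 12*1024, not 12*1024*1024.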
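To make the per-block meaning of the third launch argument concrete, here is a minimal sketch using the numbers from the question (4 blocks, 12 kB each). The kernel body, the names kNumBlocks, kThreadsPerBlock, kSharedBytesPerBlock, and the output buffer are assumptions made for illustration, not part of the original code. The stream is passed only to show where the optional fourth argument goes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes taken from the question: 4 blocks, 12 kB of shared memory each.
constexpr int    kNumBlocks           = 4;
constexpr int    kThreadsPerBlock     = 256;
constexpr size_t kSharedBytesPerBlock = 12 * 1024;

__global__ void kernel(int *out)
{
    // Dynamically sized shared array; its size in bytes is the third launch argument.
    extern __shared__ int buffer[];

    // Number of ints that fit in this block's dynamic shared memory.
    const int n = static_cast<int>(kSharedBytesPerBlock / sizeof(int));

    // Each block fills its own private copy of the buffer.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        buffer[i] = blockIdx.x;
    __syncthreads();

    // One thread per block reports the last element back to global memory.
    if (threadIdx.x == 0)
        out[blockIdx.x] = buffer[n - 1];
}

int main()
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, kNumBlocks * sizeof(int));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Third argument: bytes of dynamic shared memory PER BLOCK (12 kB, not 48 kB).
    // Fourth argument: the optional stream the kernel is issued to.
    kernel<<<kNumBlocks, kThreadsPerBlock, kSharedBytesPerBlock, stream>>>(d_out);

    int h_out[kNumBlocks];
    cudaMemcpyAsync(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    // Report a launch or execution error, if any occurred.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int b = 0; b < kNumBlocks; ++b)
        std::printf("block %d wrote %d\n", b, h_out[b]);

    cudaStreamDestroy(stream);
    cudaFree(d_out);
    return 0;
}
```

Passing 48 * 1024 here would request 48 kB for every block rather than 12 kB each. Whether all four blocks are actually resident at the same time depends on the device and its other resource limits, so the sketch does not rely on that.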