
I need to know something about CUDA shared memory. Let's say I launch 50 blocks with 10 threads per block on a G80 card. Each SM of a G80 can handle 8 blocks simultaneously. Assume that, after doing some calculations, the shared memory is fully occupied.

What will be the values in shared memory when the next 8 new blocks arrive? Will the previous values still reside there? Or will the previous values be copied to global memory and the shared memory refreshed for the next 8 blocks?
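For concreteness, here is a minimal sketch of the launch configuration described above (50 blocks of 10 threads, each block using a per-block __shared__ array). The kernel name and what it computes are made up for illustration only:

    __global__ void fill_shared(float *out)
    {
        __shared__ float buf[10];                       // one array per block, 10 floats
        buf[threadIdx.x] = blockIdx.x * 10.0f + threadIdx.x;
        __syncthreads();                                // make the writes visible to the whole block
        out[blockIdx.x * blockDim.x + threadIdx.x] = buf[threadIdx.x];
    }

    int main()
    {
        float *d_out;
        cudaMalloc(&d_out, 50 * 10 * sizeof(float));
        fill_shared<<<50, 10>>>(d_out);                 // 50 blocks, 10 threads per block
        cudaDeviceSynchronize();
        cudaFree(d_out);
        return 0;
    }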

  • Can you explain a little better? I'm not sure I'm following you. Commented Feb 17, 2011 at 18:00
  • My interpretation/simplification: each block requires all of the shared memory of one processor, i.e. each processor can host only one block at a time. Enough blocks are launched that the processors will host two blocks each in order to complete the kernel execution. Now, looking at one processor: after it executes the first block, the shared memory has been used and may have meaningful values in it. Will those values still be there for the second block, and will they be at the same addresses as they were for the previous block? Commented Feb 18, 2011 at 0:18
  • Yes, that is the answer I want. Commented Feb 18, 2011 at 12:19

2 Answers


The reference states the following about the type qualifiers (the scope and lifetime of each class of variable):

  1. Variables in registers are per-thread and last only for the kernel's execution.
  2. Variables placed in local memory (physically off-chip, in global memory) are per-thread and last only for the kernel's execution.
  3. __device__ __shared__ variables live in shared memory, are per-block, and last only for the kernel's execution.
  4. __device__ variables live in global memory, are per-grid, and stay until the application exits.
  5. __device__ __constant__ variables live in constant memory, are per-grid, and stay until the application exits.

Thus, from this reference, the answer to your question is that shared memory is refreshed for the next 8 blocks: __shared__ variables last only as long as the block that owns them, so the new blocks cannot rely on any values left by earlier blocks, and nothing is copied back to global memory automatically.
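To illustrate that lifetime rule, here is a minimal sketch (the kernel and variable names are hypothetical): any per-block result that must survive the block has to be written to global memory explicitly before the block finishes, because the __shared__ array is simply gone once the block retires.

    __global__ void block_sum(const float *in, float *block_results)
    {
        __shared__ float partial[10];               // lives only for this block's execution

        // Each thread loads one element into shared memory.
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        partial[threadIdx.x] = in[idx];
        __syncthreads();

        // Thread 0 reduces the block's values and stores the result in global
        // memory; this explicit store is the only way the data outlives the block.
        if (threadIdx.x == 0) {
            float sum = 0.0f;
            for (int i = 0; i < blockDim.x; ++i)
                sum += partial[i];
            block_results[blockIdx.x] = sum;
        }
    }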


For kernel blocks, the execution order and the SM assignment are effectively arbitrary: you cannot control which SM runs a given block or when. In that sense, even if the old values or their addresses were preserved, it would be hard to keep track of them, and I doubt there is even a way to do that. Communication between blocks is done via off-chip memory. The latency of off-chip memory is the performance killer, which is what makes GPU programming tricky. On Fermi cards, blocks share some L2 cache, but one can't alter the behavior of these caches.
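A common pattern that follows from this, sketched here under the assumption that one pass's per-block results are needed by a later pass (kernel and buffer names are made up), is to stage the per-block data through global memory and read it back in a second launch:

    // Pass 1: each block computes a per-block value and stores it in global memory.
    __global__ void pass1(float *block_out)
    {
        __shared__ float scratch[10];
        scratch[threadIdx.x] = (float)threadIdx.x;
        __syncthreads();
        if (threadIdx.x == 0)
            block_out[blockIdx.x] = scratch[0] + scratch[blockDim.x - 1];
    }

    // Pass 2: a later launch reads the per-block results back from global memory;
    // nothing from pass 1's shared memory is visible here.
    __global__ void pass2(const float *block_out, int num_blocks, float *result)
    {
        if (blockIdx.x == 0 && threadIdx.x == 0) {
            float total = 0.0f;
            for (int i = 0; i < num_blocks; ++i)
                total += block_out[i];
            *result = total;
        }
    }

    // Host side (matching the 50-block scenario):
    //   pass1<<<50, 10>>>(d_block_out);
    //   pass2<<<1, 1>>>(d_block_out, 50, d_result);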
