I am running a fitness function over 1024 matrices. All matrices are the same size, and each one gets its own block. Each block has n*n threads (where n is the dimension of the matrix) and needs n*n elements of shared memory so that I can do a simple sum reduction. However, the dimension n is not fixed: it can be changed manually before the program runs (i.e. it is always a power of 2, so the summation stays simple, but the exact value varies). The problem is that a statically declared shared memory array must be sized with a compile-time constant, but I also need to pass the value from the host to the kernel. Where do I declare the dimension n so that it is visible to the CPU (for passing to the kernel) and can also be used to declare the size of the shared memory (within the kernel)?
My code is structured like this.
In main.cu I call the kernel:
const int num_states = 1024;
const int dimension = 4;
fitness<<<num_states, dimension * dimension>>>(device_array_of_states, dimension, num_states, device_fitness_return);
And then in kernel.cu I have:
__global__ void fitness(
    int *numbers,
    int dimension,
    int num_states,
    int *fitness_return) {
    __shared__ int fitness[16];  // needs to be dimension * dimension
    // code
}
Here numbers is an array representing the 1024 matrices, dimension is the row and column length, num_states is 1024, and fitness_return is an array of length 1024 that holds the fitness value for each matrix. In the kernel, the shared memory size is hard-coded as the square of dimension (16, since dimension is 4 in this example).
Where and how can I declare dimension so that it can be used both to allocate the shared memory and to call the kernel, so that I only have to update dimension in one place? Thanks for your help.
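For illustration, one approach that may fit this situation is dynamically sized shared memory: the kernel declares the array as `extern __shared__` with no size, and the host supplies the byte count as the third launch parameter, so dimension only needs to be set in one place on the host. The sketch below is a minimal version under stated assumptions: the kernel name `fitness_sum`, the array name `scratch`, and the host helper `run_fitness_sum` are hypothetical, and a plain sum of each matrix's entries stands in for the real fitness calculation, which is not shown in the question. It also assumes dimension * dimension does not exceed the per-block thread limit (1024 on current GPUs, so dimension up to 32).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the real fitness kernel: each block sums the
// dimension*dimension entries of one matrix. Assumes blockDim.x equals
// dimension*dimension and that this value is a power of two.
__global__ void fitness_sum(const int *numbers, int dimension,
                            int num_states, int *fitness_return) {
    // No size here: the size in bytes is the third launch parameter.
    extern __shared__ int scratch[];

    int cells = dimension * dimension;
    int tid = threadIdx.x;
    int state = blockIdx.x;
    if (state >= num_states) return;  // uniform per block, so no thread is left waiting at a barrier

    // Each thread loads one matrix entry into shared memory.
    scratch[tid] = numbers[state * cells + tid];
    __syncthreads();

    // Tree reduction; halving the stride relies on cells being a power of two.
    for (int stride = cells / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            scratch[tid] += scratch[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        fitness_return[state] = scratch[0];
    }
}

// Hypothetical host helper. dimension is defined once by the caller and
// drives the thread count, the shared memory size, and the kernel argument.
// Error checking of the CUDA calls is omitted to keep the sketch short.
std::vector<int> run_fitness_sum(const std::vector<int> &host_numbers,
                                 int dimension, int num_states) {
    int cells = dimension * dimension;
    size_t shared_bytes = cells * sizeof(int);

    int *d_numbers = nullptr;
    int *d_fitness = nullptr;
    cudaMalloc(&d_numbers, host_numbers.size() * sizeof(int));
    cudaMalloc(&d_fitness, num_states * sizeof(int));
    cudaMemcpy(d_numbers, host_numbers.data(),
               host_numbers.size() * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    fitness_sum<<<num_states, cells, shared_bytes>>>(
        d_numbers, dimension, num_states, d_fitness);

    std::vector<int> result(num_states);
    cudaMemcpy(result.data(), d_fitness, num_states * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_numbers);
    cudaFree(d_fitness);
    return result;
}
```

With this layout, changing dimension on the host (for example from 4 to 8) changes the launch configuration and the shared memory size together, and the kernel source stays untouched. If the size really must be a compile-time constant, a template parameter on the kernel is another option, at the cost of one instantiation per supported dimension.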