I'm using CUDA 5.0 and a GTX 670 on Ubuntu 12.10 with gcc 4.6, and I have written a class called Grid:
https://github.com/benadler/octocopter/blob/master/basestation/grid.cu
https://github.com/benadler/octocopter/blob/master/basestation/grid.cuh
The Grid class is used in two other classes:
- pointcloudcuda
- particlesystem
Now I'd like to call Grid's (non-static) methods in the kernels of both pointcloudcuda and particlesystem, even though each of them uses a different grid (separate Grid objects with different values). So, for every class that uses Grid, I see two choices:
1) I simply do
Grid hostGrid(...);
Grid* gridOnDeviceGlobal;
cudaMalloc((void**)&gridOnDeviceGlobal, sizeof(Grid));
cudaMemcpy(gridOnDeviceGlobal, &hostGrid, sizeof(Grid), cudaMemcpyHostToDevice);
cloudKernel<<< numBlocks, numThreads >>>(someData, gridOnDeviceGlobal);
This is simple, but the kernels have to read the grid values from global memory, which might be slow.
2) As the Grid values rarely change, I put a
__constant__ Grid myGridForPointCloudCuda;
into pointcloudcuda.cu, together with two functions:
void copyParametersToGpu(Grid *hostGrid)
{
cudaMemcpyToSymbolAsync(myGridForPointCloudCuda, hostGrid, sizeof(Grid));
}
void getDevicePointerOfGridForPointCloudCuda(Grid** ptr)
{
cudaGetSymbolAddress((void**)ptr, myGridForPointCloudCuda);
}
Now, in pointcloudcuda.cpp, I can
Grid hostGrid(...);
copyParametersToGpu(&hostGrid);
Grid* gridOnDeviceConstant;
getDevicePointerOfGridForPointCloudCuda(&gridOnDeviceConstant);
cloudKernel<<< numBlocks, numThreads >>>(someData, gridOnDeviceConstant);
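For reference, both options can be written out as a minimal self-contained sketch. It assumes a hypothetical, stripped-down Grid (the real class is in grid.cuh linked above) and hypothetical helpers launchCloudKernel, runOption1 and runOption2; the kernel's parameters are also made up for illustration. The same cloudKernel, which only receives a const Grid*, is used for both options, so from the kernel's point of view the two cases look identical. The symbol copy is done synchronously here for simplicity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real Grid class in grid.cuh: plain data
// plus a non-static __host__ __device__ method. It has no user-defined
// constructor so that it can be declared __constant__.
struct Grid {
    float originX;  // hypothetical: world-space start of the grid along x
    float cellSize; // hypothetical: edge length of one cell
    int numCells;   // hypothetical: number of cells along x

    // Maps a coordinate to its cell index, clamped to [0, numCells - 1].
    __host__ __device__ int getCellIndex(float x) const
    {
        float t = (x - originX) / cellSize;
        if (t < 0.0f) return 0;
        int c = (int)t;
        return c >= numCells ? numCells - 1 : c;
    }
};

// Option 2 storage: one __constant__ Grid per .cu file.
__constant__ Grid myGridForPointCloudCuda;

void copyParametersToGpu(const Grid* hostGrid)
{
    // Synchronous variant of the cudaMemcpyToSymbolAsync call.
    cudaMemcpyToSymbol(myGridForPointCloudCuda, hostGrid, sizeof(Grid));
}

void getDevicePointerOfGridForPointCloudCuda(Grid** ptr)
{
    cudaGetSymbolAddress((void**)ptr, myGridForPointCloudCuda);
}

// Used by both options: the kernel only sees a plain Grid pointer.
__global__ void cloudKernel(const float* points, int* cellOfPoint, int n,
                            const Grid* grid)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) cellOfPoint[i] = grid->getCellIndex(points[i]);
}

// Hypothetical helper: uploads the points, launches cloudKernel with the
// given device Grid pointer and downloads the resulting cell indices.
static cudaError_t launchCloudKernel(const Grid* deviceGrid,
                                     const float* hostPoints,
                                     int* hostCells, int n)
{
    assert(n > 0);
    float* dPoints = 0;
    int* dCells = 0;
    cudaMalloc((void**)&dPoints, n * sizeof(float));
    cudaMalloc((void**)&dCells, n * sizeof(int));
    cudaMemcpy(dPoints, hostPoints, n * sizeof(float), cudaMemcpyHostToDevice);

    const int numThreads = 128;
    const int numBlocks = (n + numThreads - 1) / numThreads;
    cloudKernel<<<numBlocks, numThreads>>>(dPoints, dCells, n, deviceGrid);

    // Blocking copy: waits for the kernel and may also report launch errors.
    cudaError_t err = cudaMemcpy(hostCells, dCells, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dCells);
    cudaFree(dPoints);
    return err;
}

// Option 1: Grid copied into cudaMalloc'ed global memory.
cudaError_t runOption1(const Grid& hostGrid, const float* hostPoints,
                       int* hostCells, int n)
{
    Grid* gridOnDeviceGlobal = 0;
    cudaMalloc((void**)&gridOnDeviceGlobal, sizeof(Grid));
    cudaMemcpy(gridOnDeviceGlobal, &hostGrid, sizeof(Grid),
               cudaMemcpyHostToDevice);
    cudaError_t err = launchCloudKernel(gridOnDeviceGlobal, hostPoints,
                                        hostCells, n);
    cudaFree(gridOnDeviceGlobal);
    return err;
}

// Option 2: Grid copied into the __constant__ symbol; the kernel gets its address.
cudaError_t runOption2(const Grid& hostGrid, const float* hostPoints,
                       int* hostCells, int n)
{
    copyParametersToGpu(&hostGrid);
    Grid* gridOnDeviceConstant = 0;
    getDevicePointerOfGridForPointCloudCuda(&gridOnDeviceConstant);
    return launchCloudKernel(gridOnDeviceConstant, hostPoints, hostCells, n);
}
```

With a grid starting at 0 with cell size 1 and 4 cells, the points 0.5, 2.5 and 9.0 should land in cells 0, 2 and 3 (the last one clamped) under either option.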
The advantage of 2), in my view, would be faster access to constant memory in the kernels. Elsewhere, though, I read that this won't work: the compiler doesn't know at compile time whether the Grid pointer passed to the kernel points to global or constant memory, so it has to emit ordinary (slower) memory loads instead of loads through the constant cache.
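To make that distinction concrete, here is a minimal sketch of the other form: a kernel that reads the Grid by naming the __constant__ symbol itself instead of receiving a pointer. It again assumes a hypothetical, stripped-down Grid and a hypothetical helper runDirect. In this form the memory space is fixed at compile time; whether the pointer-based kernel really falls back to ordinary loads could be checked by comparing the SASS of both kernels, e.g. with `cuobjdump -sass`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real Grid class (no user-defined
// constructor, so it can be declared __constant__).
struct Grid {
    float originX;  // hypothetical: world-space start of the grid along x
    float cellSize; // hypothetical: edge length of one cell
    int numCells;   // hypothetical: number of cells along x

    // Maps a coordinate to its cell index, clamped to [0, numCells - 1].
    __host__ __device__ int getCellIndex(float x) const
    {
        float t = (x - originX) / cellSize;
        if (t < 0.0f) return 0;
        int c = (int)t;
        return c >= numCells ? numCells - 1 : c;
    }
};

__constant__ Grid myGridForPointCloudCuda;

// The kernel names the __constant__ symbol directly instead of taking a
// Grid*, so the compiler knows which memory space the Grid lives in.
__global__ void cloudKernelDirect(const float* points, int* cellOfPoint, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) cellOfPoint[i] = myGridForPointCloudCuda.getCellIndex(points[i]);
}

// Hypothetical helper: sets the constant Grid, runs the kernel, returns
// the status of the final (blocking) copy.
cudaError_t runDirect(const Grid& hostGrid, const float* hostPoints,
                      int* hostCells, int n)
{
    assert(n > 0);
    cudaMemcpyToSymbol(myGridForPointCloudCuda, &hostGrid, sizeof(Grid));

    float* dPoints = 0;
    int* dCells = 0;
    cudaMalloc((void**)&dPoints, n * sizeof(float));
    cudaMalloc((void**)&dCells, n * sizeof(int));
    cudaMemcpy(dPoints, hostPoints, n * sizeof(float), cudaMemcpyHostToDevice);

    const int numThreads = 128;
    const int numBlocks = (n + numThreads - 1) / numThreads;
    cloudKernelDirect<<<numBlocks, numThreads>>>(dPoints, dCells, n);

    // Blocking copy: waits for the kernel and may also report launch errors.
    cudaError_t err = cudaMemcpy(hostCells, dCells, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dCells);
    cudaFree(dPoints);
    return err;
}
```

Since each .cu file can declare its own __constant__ Grid (one for pointcloudcuda, a separate one for particlesystem), this form still allows a different grid per class; the trade-off is that the kernel is tied to one symbol and the grid can no longer be passed in as a parameter.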
Will 2) be faster than 1) on a GeForce GTX 670?
Is there a better way to do what I want? I just need to pass different Grid instances to the kernels. Before I started using multiple Grid instances, a __constant__ variable was a comfortable AND fast choice.
Thanks!