Passing a pointer that points to device constant memory to kernels, instead of using it directly

Question

I'm using CUDA 5.0 and a GTX 670 on Ubuntu 12.10 with GCC 4.6, and I have written a class called Grid:

https://github.com/benadler/octocopter/blob/master/basestation/grid.cu

https://github.com/benadler/octocopter/blob/master/basestation/grid.cuh

The Grid-class is being used in two other classes:

pointcloudcuda
particlesystem

Now I'd like to use Grid's (non-static) methods in kernels of both pointcloudcuda and particlesystem, even though they will be using different grids (different grid-objects with different values). Thus, for all the classes where I use Grid, I have two choices:

1) I simply do

Grid hostGrid(...);
Grid* gridOnDeviceGlobal;
cudaMalloc((void**)&gridOnDeviceGlobal, sizeof(Grid));
cudaMemcpy(gridOnDeviceGlobal, &hostGrid, sizeof(Grid), cudaMemcpyHostToDevice);
cloudKernel<<< numBlocks, numThreads >>>(someDate, gridOnDeviceGlobal);

This is simple, but the kernels will have to read the grid-values from global memory. This might be slow.

2) As the Grid-values rarely change, I put a

__constant__ Grid myGridForPointCloudCuda;

into pointcloudcuda.cu, together with two functions

void copyParametersToGpu(Grid *hostGrid)
{
    cudaMemcpyToSymbolAsync(myGridForPointCloudCuda, hostGrid, sizeof(Grid));
}

void getDevicePointerOfGridForPointCloudCuda(Grid** ptr)
{
    cudaGetSymbolAddress((void**)ptr, myGridForPointCloudCuda);
}

Now, in pointcloudcuda.cpp, I can

Grid hostGrid(...);
copyParametersToGpu(&hostGrid);
Grid* gridOnDeviceConstant;
getDevicePointerOfGridForPointCloudCuda(&gridOnDeviceConstant);
cloudKernel<<< numBlocks, numThreads >>>(someDate, gridOnDeviceConstant);

The advantage of 2), in my mind, would be the faster access to constant memory in the kernels. In other places, though, I read that this won't work, because the compiler compiling the CUDA kernels doesn't know at compile-time whether the grid-pointer being passed points to global or constant memory, and thus has to use slower memory-fetch-instructions.

Will 2) be faster than 1) on a GeForce GTX 670?

Is there a better way to do what I want? I just need to pass different Grid-instances to the kernels. And before I started using multiple Grid-instances, a constant variable was a comfortable AND fast choice.

Thanks!

RoBiK · Accepted Answer · 2013-02-28 21:13:36Z

If you have multiple Grid instances, then simply allocate an array of Grid objects in constant memory, copy the Grid instances into that array, and pass the kernel an index into the array instead of a pointer to a particular Grid instance. Inside the kernel, use the index to access that Grid instance.
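
A minimal sketch of that idea, not the actual octocopter code: the `Grid` struct here is a simplified, hypothetical stand-in for the real class, and the names `constantGrids`, `kMaxGrids`, `copyGridToConstant`, `cellCountKernel` and `queryCellCount` are made up for illustration. It assumes `Grid` can be copied bytewise and that the number of grids is known at compile time.

```cuda
#include <cuda_runtime.h>

// Hypothetical, simplified stand-in for the real Grid class.
struct Grid
{
    int3  cells;     // number of cells along x, y and z
    float cellSize;  // edge length of one cell

    __host__ __device__ int cellCount() const
    {
        return cells.x * cells.y * cells.z;
    }
};

// Assumed upper bound on the number of Grid instances in use.
const int kMaxGrids = 2;

// One array in constant memory holds all Grid instances.
__constant__ Grid constantGrids[kMaxGrids];

// Copies a host-side Grid into the given slot of the constant array.
cudaError_t copyGridToConstant(const Grid& hostGrid, int slot)
{
    if (slot < 0 || slot >= kMaxGrids) return cudaErrorInvalidValue;
    return cudaMemcpyToSymbol(constantGrids, &hostGrid, sizeof(Grid),
                              slot * sizeof(Grid), cudaMemcpyHostToDevice);
}

// The kernel receives an index instead of a Grid pointer. Because the
// symbol is named directly, the compiler knows it lives in constant memory.
__global__ void cellCountKernel(int* out, int gridIndex)
{
    const Grid& grid = constantGrids[gridIndex];
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = grid.cellCount();
}

// Host helper: launches the kernel for one slot and returns its result,
// or -1 if any CUDA call fails.
int queryCellCount(int gridIndex)
{
    int* deviceOut = 0;
    int  hostOut   = -1;
    if (cudaMalloc((void**)&deviceOut, sizeof(int)) != cudaSuccess) return -1;
    cellCountKernel<<<1, 32>>>(deviceOut, gridIndex);
    if (cudaMemcpy(&hostOut, deviceOut, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        hostOut = -1;
    cudaFree(deviceOut);
    return hostOut;
}
```

With this layout, pointcloudcuda and particlesystem would each copy their Grid into their own slot and pass that slot number to their kernels. Since every thread in a launch receives the same index, all threads of a warp read the same constant address, which is the access pattern the constant cache handles best.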
