I'm using CUDA 5.0 and a GTX 670 on Ubuntu 12.10 with gcc 4.6, and I have written a class called Grid:

https://github.com/benadler/octocopter/blob/master/basestation/grid.cu

https://github.com/benadler/octocopter/blob/master/basestation/grid.cuh

The Grid class is used in two other classes:

  • pointcloudcuda
  • particlesystem

Now I'd like to use Grid's (non-static) methods in kernels of both pointcloudcuda and particlesystem, even though they will be using different grids (different grid-objects with different values). Thus, for all the classes where I use Grid, I have two choices:

1) I simply do

Grid hostGrid(...);
Grid* gridOnDeviceGlobal;
cudaMalloc((void**)&gridOnDeviceGlobal, sizeof(Grid));
cudaMemcpy(gridOnDeviceGlobal, &hostGrid, sizeof(Grid), cudaMemcpyHostToDevice);
cloudKernel<<< numBlocks, numThreads >>>(someDate, gridOnDeviceGlobal);

This is simple, but the kernels will have to read the grid-values from global memory. This might be slow.

2) As the Grid-values rarely change, I put a

__constant__ Grid myGridForPointCloudCuda;

into pointcloudcuda.cu, together with two functions

void copyParametersToGpu(Grid *hostGrid)
{
    cudaMemcpyToSymbolAsync(myGridForPointCloudCuda, hostGrid, sizeof(Grid));
}

void getDevicePointerOfGridForPointCloudCuda(Grid** ptr)
{
    cudaGetSymbolAddress((void**)ptr, myGridForPointCloudCuda);
}

Now, in pointcloudcuda.cpp, I can

Grid hostGrid(...);
copyParametersToGpu(&hostGrid);
Grid* gridOnDeviceConstant;
getDevicePointerOfGridForPointCloudCuda(&gridOnDeviceConstant);
cloudKernel<<< numBlocks, numThreads >>>(someDate, gridOnDeviceConstant);

The advantage of 2), in my mind, would be the faster access to constant memory in the kernels. In other places, though, I read that this won't work, because the compiler compiling the CUDA kernels doesn't know at compile-time whether the grid-pointer being passed points to global or constant memory, and thus has to use slower memory-fetch-instructions.

Will 2) be faster than 1) on a GeForce GTX 670?

Is there a better way to do what I want? I just need to pass different Grid-instances to the kernels. And before I started using multiple Grid-instances, a constant variable was a comfortable AND fast choice.

Thanks!

1 Answer

If you have multiple Grid instances, then simply allocate an array of Grid in constant memory, copy the Grid instances into that array, and pass the kernel an index into the array instead of a pointer to a particular Grid instance. Inside the kernel, use the index to access that instance.
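
A minimal sketch of this idea follows. It is not taken from the linked repository: DemoGrid is a hypothetical stand-in for the real Grid class (assumed here to hold only plain data, so that it can live in constant memory), and MAX_GRIDS, constantGrids, copyGridToGpu, cellCountKernel and runDemo are names made up for illustration. Because the kernel refers to the __constant__ array by name, the compiler can see that the access goes to constant memory, which is the concern raised about passing a pointer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real Grid class: plain data members only.
struct DemoGrid {
    float3 worldMin;
    float3 worldMax;
    int3   cells;

    // Example of a non-static method that kernels can call.
    __host__ __device__ int cellCount() const { return cells.x * cells.y * cells.z; }
};

// Assumption: one slot per class that owns a grid (point cloud, particle system).
const int MAX_GRIDS = 2;
__constant__ DemoGrid constantGrids[MAX_GRIDS];

// Copies one host-side grid into the given slot of the constant array.
cudaError_t copyGridToGpu(const DemoGrid& hostGrid, int slot)
{
    return cudaMemcpyToSymbol(constantGrids, &hostGrid, sizeof(DemoGrid),
                              slot * sizeof(DemoGrid), cudaMemcpyHostToDevice);
}

// The kernel receives an index instead of a Grid pointer.
__global__ void cellCountKernel(int* out, int gridIndex)
{
    // gridIndex is the same for every thread, so all threads read one constant address.
    out[threadIdx.x] = constantGrids[gridIndex].cellCount();
}

// Launches the kernel for one slot; returns the device result, or -1 on a CUDA error.
int cellCountOnDevice(int slot)
{
    int* deviceOut = 0;
    if (cudaMalloc((void**)&deviceOut, sizeof(int)) != cudaSuccess) return -1;
    cellCountKernel<<<1, 1>>>(deviceOut, slot);
    int hostOut = -1;
    cudaError_t err = cudaMemcpy(&hostOut, deviceOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(deviceOut);
    return err == cudaSuccess ? hostOut : -1;
}

// Returns 0 if both slots give the same cell count on the device as on the host.
int runDemo()
{
    DemoGrid cloudGrid    = { make_float3(0.f, 0.f, 0.f), make_float3(1.f, 1.f, 1.f), make_int3(4, 4, 4) };
    DemoGrid particleGrid = { make_float3(0.f, 0.f, 0.f), make_float3(2.f, 2.f, 2.f), make_int3(2, 3, 5) };

    if (copyGridToGpu(cloudGrid, 0) != cudaSuccess) return 1;
    if (copyGridToGpu(particleGrid, 1) != cudaSuccess) return 1;

    const int cloudCells    = cellCountOnDevice(0);
    const int particleCells = cellCountOnDevice(1);
    printf("slot 0: %d cells, slot 1: %d cells\n", cloudCells, particleCells);

    return (cloudCells == cloudGrid.cellCount() &&
            particleCells == particleGrid.cellCount()) ? 0 : 1;
}

int main()
{
    return runDemo();
}
```

In this sketch pointcloudcuda would always launch its kernels with index 0 and particlesystem with index 1. The array has to be visible in the translation unit that defines the kernels, so with one .cu file per class each file would either hold its own array or the build would need relocatable device code; which of these fits the project is left open here.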