Is there a corresponding method for allocating and copying 3D arrays, like cudaMallocPitch() and cudaMemcpy2D() for 2D?
Thanks!
You can also use cudaMallocPitch() to allocate linear device memory for 3D data by passing height * depth as the height argument. Accesses in your kernel stay coalesced as long as your index arithmetic uses the pitch returned by cudaMallocPitch() rather than the raw row width.
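To make the index arithmetic concrete, here is a minimal sketch of the cudaMallocPitch() approach. The kernel name scaleVolume, the helper rowOffsetBytes, the CHECK macro, and the volume dimensions are all made up for illustration. The sketch assumes rows are stacked slice after slice, so row (y, z) sits at row index z * height + y, and it reuses cudaMemcpy2D() for the copies by treating the volume as a tall 2D array.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro for this sketch.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err_));                             \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Byte offset of row (y, z) when slices are stacked along the "height"
// of a pitched 2D allocation: row index = z * height + y.
__host__ __device__ inline size_t rowOffsetBytes(size_t pitch, int height,
                                                 int y, int z)
{
    return (static_cast<size_t>(z) * height + y) * pitch;
}

// Multiplies every element of the volume by factor.
__global__ void scaleVolume(float* base, size_t pitch,
                            int width, int height, int depth, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= width || y >= height || z >= depth) return;

    // Step to the row in bytes (pitch is in bytes), then index by element.
    char* row = reinterpret_cast<char*>(base)
              + rowOffsetBytes(pitch, height, y, z);
    reinterpret_cast<float*>(row)[x] *= factor;
}

int main()
{
    const int width = 100, height = 8, depth = 4;  // arbitrary example sizes
    const size_t rowBytes = width * sizeof(float);
    std::vector<float> host(static_cast<size_t>(width) * height * depth, 1.0f);

    // Allocate height * depth pitched rows of rowBytes each.
    float* dev = nullptr;
    size_t pitch = 0;
    CHECK(cudaMallocPitch(reinterpret_cast<void**>(&dev), &pitch,
                          rowBytes, static_cast<size_t>(height) * depth));

    // The host buffer is tightly packed, so its pitch is rowBytes.
    CHECK(cudaMemcpy2D(dev, pitch, host.data(), rowBytes, rowBytes,
                       static_cast<size_t>(height) * depth,
                       cudaMemcpyHostToDevice));

    dim3 block(32, 4, 2);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y,
              (depth + block.z - 1) / block.z);
    scaleVolume<<<grid, block>>>(dev, pitch, width, height, depth, 2.0f);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy2D(host.data(), rowBytes, dev, pitch, rowBytes,
                       static_cast<size_t>(height) * depth,
                       cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev));

    printf("first = %f, last = %f\n", host.front(), host.back());
    return 0;
}
```

Consecutive threads in x touch consecutive floats within one row, and every row starts at a multiple of the pitch, which is what keeps the accesses coalesced.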
Similarly, you can use cudaMalloc3D() to allocate logical 1D, 2D, or 3D memory objects on the device. It fills in a cudaPitchedPtr (its pitchedDevPtr argument) that you can pass to your kernel, and cudaMemcpy3D() is the matching copy routine.
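For the cudaMalloc3D() route, a rough sketch might look like the following. The kernel addOne, the helper voxelRow, the CHECK macro, and the sizes are hypothetical names chosen for this example. Note that the extent width is given in bytes because no CUDA array is involved, and the host-side cudaPitchedPtr is built with make_cudaPitchedPtr() on the assumption that the host buffer is tightly packed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking macro for this sketch.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err_));                             \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Returns a pointer to the start of row (y, z) inside a pitched volume.
// One slice occupies pitch * height bytes.
__host__ __device__ inline float* voxelRow(cudaPitchedPtr vol, int height,
                                           int y, int z)
{
    char* base = static_cast<char*>(vol.ptr);
    size_t slicePitch = vol.pitch * height;
    return reinterpret_cast<float*>(base + z * slicePitch + y * vol.pitch);
}

// Adds 1 to every element of the volume.
__global__ void addOne(cudaPitchedPtr vol, int width, int height, int depth)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= width || y >= height || z >= depth) return;
    voxelRow(vol, height, y, z)[x] += 1.0f;
}

int main()
{
    const int width = 100, height = 8, depth = 4;  // arbitrary example sizes
    std::vector<float> host(static_cast<size_t>(width) * height * depth, 1.0f);

    // Width in bytes, height and depth in rows and slices.
    cudaExtent extent = make_cudaExtent(width * sizeof(float), height, depth);
    cudaPitchedPtr devVol;
    CHECK(cudaMalloc3D(&devVol, extent));

    // Describe the tightly packed host buffer as a pitched pointer.
    cudaPitchedPtr hostVol = make_cudaPitchedPtr(
        host.data(), width * sizeof(float), width, height);

    cudaMemcpy3DParms toDev = {};   // unused fields must be zero
    toDev.srcPtr = hostVol;
    toDev.dstPtr = devVol;
    toDev.extent = extent;
    toDev.kind = cudaMemcpyHostToDevice;
    CHECK(cudaMemcpy3D(&toDev));

    dim3 block(32, 4, 2);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y,
              (depth + block.z - 1) / block.z);
    addOne<<<grid, block>>>(devVol, width, height, depth);
    CHECK(cudaGetLastError());

    cudaMemcpy3DParms toHost = {};
    toHost.srcPtr = devVol;
    toHost.dstPtr = hostVol;
    toHost.extent = extent;
    toHost.kind = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&toHost));
    CHECK(cudaFree(devVol.ptr));

    printf("first = %f, last = %f\n", host.front(), host.back());
    return 0;
}
```

Compared with the cudaMallocPitch() version, the bookkeeping moves into cudaExtent and cudaPitchedPtr, but the kernel-side arithmetic is the same: step by slices, then by rows, in bytes, and only then index by element.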