I'm trying to use Python to control CUDA through ctypes. To illustrate my problem, I use Python to pass pointers to C functions that allocate CUDA memory, copy a numpy array into that CUDA memory, and copy the CUDA memory back into a new numpy array. It doesn't seem to work, even though my basic ctypes setup works. I think the issue is with what gets returned from the cudaMalloc wrapper (alloc_gpu_mem) back to Python.
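By "basic ctypes setup" I mean the top of my script, where I load the shared library and check that a trivial exported function can be called. The library and test-function names below are just placeholders for mine:

import numpy as np
from ctypes import *

dll = CDLL('./gpu_funcs.so')      # placeholder name for my compiled CUDA shared library
test_add = dll.test_add           # placeholder for a trivial test function that adds two ints
test_add.argtypes = [c_int, c_int]
test_add.restype = c_int
print test_add(2, 3)              # prints 5, so loading and calling the library works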
Here's the Python code:
pycu_alloc = dll.alloc_gpu_mem
pycu_alloc.argtypes = [c_size_t]
pycu_alloc.restypes = [c_void_p]

host2gpu = dll.host2gpu
host2gpu.argtypes = [c_void_p, c_void_p, c_size_t]

gpu2host = dll.gpu2host
gpu2host.argtypes = [c_void_p, c_void_p, c_size_t]

a = np.random.randn(1024).astype('float32')
c = np.zeros(1024).astype('float32')

c_a = c_void_p(a.ctypes.data)
c_c = c_void_p(c.ctypes.data)

da = pycu_alloc(1024)
c_da = c_void_p(da)

host2gpu(c_a, c_da, 1024)
gpu2host(c_c, c_da, 1024)

print a
print c
And the C code:
extern "C" {
float * alloc_gpu_mem( size_t N)
{
float *d;
int size = N *sizeof(float);
int err;
err = cudaMalloc(&d, size);
printf("cuda malloc: %d\n", err);
return d;
}}
extern "C" {
void host2gpu(float * a, void * da, size_t N)
{
int size = N * sizeof(float);
int err;
err = cudaMemcpy(da, a, size, cudaMemcpyHostToDevice);
printf("load mem: %d\n", err);
}}
extern "C"{
void gpu2host(float *c, void *d_c, size_t N)
{
int err;
int size = N*sizeof(float);
err = cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
printf("cpy mem back %d\n", err);
}}
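For reference, I build the shared library roughly like this (file names are placeholders), and the build itself succeeds:

nvcc --shared -Xcompiler -fPIC -o gpu_funcs.so gpu_funcs.cu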
The code should copy the random vector a into CUDA memory and then copy that CUDA memory back into the empty vector c. When I print c, though, it is just 0s.
I've wrestled with different combinations of float * and void *, particularly in how alloc_gpu_mem returns its pointer, but I don't know what to do.
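For example, one of the variants of alloc_gpu_mem I tried returns void * instead of float * (just to show the kind of thing I've been changing), and it behaves the same way for me:

extern "C" {
void * alloc_gpu_mem(size_t N)
{
    void *d;                       // same as above, but typed void * throughout
    int err;
    err = cudaMalloc(&d, N * sizeof(float));
    printf("cuda malloc: %d\n", err);
    return d;
}}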
As for the err return values: cudaMalloc returns 0, but both cudaMemcpy calls return 11, which I believe is cudaErrorInvalidValue.
What is Python doing wrong with the pointer? Help?