I'm trying to use Python to control CUDA through ctypes. To illustrate my problem, I use Python to pass pointers to C functions that allocate CUDA memory, copy a numpy array into that CUDA memory, and copy the CUDA memory back into a new numpy array. It doesn't seem to work, even though my basic ctypes setup works. I think the issue is with what gets returned from the cudaMalloc wrapper (alloc_gpu_mem) back to Python.
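By "basic ctypes setup" I mean the top of my script, where I load the shared library and check that a trivial exported function can be called. The library and test-function names below are just placeholders for mine:

import numpy as np
from ctypes import *

dll = CDLL('./gpu_funcs.so')      # placeholder name for my compiled CUDA shared library
test_add = dll.test_add           # placeholder for a trivial test function that adds two ints
test_add.argtypes = [c_int, c_int]
test_add.restype = c_int
print test_add(2, 3)              # prints 5, so loading and calling the library works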
Here's the Python code:
pycu_alloc = dll.alloc_gpu_mem
pycu_alloc.argtypes = [c_size_t]
pycu_alloc.restypes = [c_void_p]

host2gpu = dll.host2gpu
host2gpu.argtypes = [c_void_p, c_void_p, c_size_t]

gpu2host = dll.gpu2host
gpu2host.argtypes = [c_void_p, c_void_p, c_size_t]

a = np.random.randn(1024).astype('float32')
c = np.zeros(1024).astype('float32')

c_a = c_void_p(a.ctypes.data)
c_c = c_void_p(c.ctypes.data)

da = pycu_alloc(1024)
c_da = c_void_p(da)

host2gpu(c_a, c_da, 1024)
gpu2host(c_c, c_da, 1024)

print a
print c
And the C code:
extern "C" {
float * alloc_gpu_mem( size_t N)
{
float *d;
int size = N *sizeof(float);
int err;
err = cudaMalloc(&d, size);
printf("cuda malloc: %d\n", err);
return d;
}}
extern "C" {
void host2gpu(float * a, void * da, size_t N)
{
int size = N * sizeof(float);
int err;
err = cudaMemcpy(da, a, size, cudaMemcpyHostToDevice);
printf("load mem: %d\n", err);
}}
extern "C"{
void gpu2host(float *c, void *d_c, size_t N)
{
int err;
int size = N*sizeof(float);
err = cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
printf("cpy mem back %d\n", err);
}}
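For reference, I build the shared library roughly like this (file names are placeholders), and the build itself succeeds:

nvcc --shared -Xcompiler -fPIC -o gpu_funcs.so gpu_funcs.cu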
The code should copy the random vector a into CUDA memory and then copy that CUDA memory back into the empty vector c. When I print c, though, it is just 0s.
I've wrestled with different combinations of float * and void *, particularly in how alloc_gpu_mem returns its pointer, but I don't know what to do.
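For example, one of the variants of alloc_gpu_mem I tried returns void * instead of float * (just to show the kind of thing I've been changing), and it behaves the same way for me:

extern "C" {
void * alloc_gpu_mem(size_t N)
{
    void *d;                       // same as above, but typed void * throughout
    int err;
    err = cudaMalloc(&d, N * sizeof(float));
    printf("cuda malloc: %d\n", err);
    return d;
}}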
As for the err return values: cudaMalloc returns 0, but both cudaMemcpy calls return 11, which I believe is cudaErrorInvalidValue.
What is Python doing wrong with the pointer? Help?