
As I understand it, when I pass a struct or class to a kernel, the copy-constructor is called on it host-side, and the copied object is then sent to the device with memcpy. Here is an example:

#include <iostream>

class Foo {
  public:
    Foo(const Foo&) { std::cout << "Called before kernel execution"; }
};

__global__ void kernel(Foo foo) { }

Can I somehow prevent the copy-constructor from being called, and make CUDA memcpy the object to device-memory directly? Passing foo by reference wouldn't work, since it would mix up device and host memory.

  • Perhaps you should give a more complete example of what you are trying to do (i.e. showing the setup of the foo object you are trying to pass to the kernel). If you have such an object set up as you wish on the host, you should be able to copy it to the device using cudaMemcpy, without invoking any object methods or constructors. And pass-by-reference cannot be used anyway in a cuda kernel call, so perhaps you mean pass-by-pointer. I'm suggesting pass-by-pointer, and it's unclear (to me) why that would not work. Commented Jan 13, 2016 at 15:37
  • If the compiler invokes a copy constructor when passing a parameter to the triple chevrons, I doubt there's a way to prevent it. To work around it, you could launch kernel by marshaling the parameters yourself through cudaSetupArgument and cudaLaunch. These APIs might be deprecated, however. Commented Jan 13, 2016 at 20:29
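
One non-deprecated way to do the manual argument marshaling described in the comment above is the runtime call cudaLaunchKernel, which took over from the cudaSetupArgument/cudaLaunch pair. A rough, untested sketch follows (the class and kernel are only illustrative): because the arguments are handed over as raw pointers through a C-style API, the host-side copy constructor should not run, although the object's bytes are still copied to the device like any other kernel parameter.

#include <cstdio>

class Foo {
  public:
    __host__ __device__ Foo() {}
    __host__ __device__ Foo(const Foo&) { printf("copy constructor called\n"); }
};

__global__ void kernel(Foo foo) { printf("in kernel\n"); }

int main(){
  Foo myfoo;
  // cudaLaunchKernel copies the argument bytes straight from &myfoo into the
  // launch parameter buffer, so no C++ copy constructor is invoked on the host.
  void *args[] = { &myfoo };
  cudaLaunchKernel((const void *)kernel, dim3(1), dim3(1), args, 0, 0);
  cudaDeviceSynchronize();
  return 0;
}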

2 Answers


Take a look at managed memory, for instance these slides from a 2013 conference. Essentially, if you write your class to extend the CUDA Managed class, and use the correct memory allocations, you can pass-by-reference and CUDA will take care of the memory management.
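
A rough sketch of that pattern (the Managed base class below is hand-written for illustration, not something that ships with the toolkit, and the kernel here takes a pointer): its operator new/delete allocate the object with cudaMallocManaged, so the same pointer is valid on both host and device and the object never has to be copied over explicitly.

#include <cstdio>
#include <cstddef>

// Illustrative base class: objects created with `new` land in unified (managed) memory.
class Managed {
  public:
    void *operator new(std::size_t len) {
      void *ptr;
      cudaMallocManaged(&ptr, len);
      cudaDeviceSynchronize();
      return ptr;
    }
    void operator delete(void *ptr) {
      cudaDeviceSynchronize();
      cudaFree(ptr);
    }
};

class Foo : public Managed {
  public:
    __host__ __device__ void chirp() { printf("Hello from managed memory!\n"); }
};

__global__ void kernel(Foo *foo) { foo->chirp(); }

int main(){
  Foo *foo = new Foo;    // allocated through Managed::operator new -> cudaMallocManaged
  kernel<<<1,1>>>(foo);  // no cudaMemcpy needed; the pointer is valid on the device too
  cudaDeviceSynchronize();
  delete foo;
  return 0;
}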


1 Comment

In my experience the slight performance drop is worth it, but I mostly deal with complicated objects that are only copied over to the device once during the program. If you had simple objects that were being passed around frequently, your results may vary.

My suggestion would be to pass foo by pointer:

$ cat t1041.cu
#include <stdio.h>

class Foo {
  public:
    __host__ __device__
    void chirp() { printf("Hello!\n"); }
    __host__ __device__
    Foo(const Foo&) {printf("Called copy constructor!\n");}
    __host__ __device__
    Foo(){};
};


__global__ void kernel(Foo *foo) {

  foo->chirp();   // only the pointer was passed; no copy constructor runs
}

int main(){

  Foo myfoo, *d_foo;
  cudaMalloc(&d_foo, sizeof(Foo));
  // raw byte copy of the host object to the device; no constructor is invoked
  cudaMemcpy(d_foo, &myfoo, sizeof(Foo), cudaMemcpyHostToDevice);
  kernel<<<1,1>>>(d_foo);
  cudaDeviceSynchronize();
}

$ nvcc -o t1041 t1041.cu
$ ./t1041
Hello!
$

You could also use managed memory, on platforms that support it, as suggested by @icurays1.

