
As I understand it, when I pass a struct or class to a kernel, the copy-constructor is called on it host-side, and the copied object is then sent to the device with memcpy. Here is an example:

#include <iostream>

class Foo {
  public:
    Foo(const Foo&) { std::cout << "Called before kernel execution"; }
};

__global__ void kernel(Foo foo) { }

Can I somehow prevent the copy-constructor from being called, and make CUDA memcpy the object to device-memory directly? Passing foo by reference wouldn't work, since it would mix up device and host memory.

  • Perhaps you should give a more complete example of what you are trying to do (i.e. showing the setup of the foo object you are trying to pass to the kernel). If you have such an object set up as you wish on the host, you should be able to copy it to the device using cudaMemcpy, without invoking any object methods or constructors. And pass-by-reference cannot be used anyway in a cuda kernel call, so perhaps you mean pass-by-pointer. I'm suggesting pass-by-pointer, and it's unclear (to me) why that would not work. Commented Jan 13, 2016 at 15:37
  • If the compiler invokes a copy constructor when passing a parameter to the triple chevrons, I doubt there's a way to prevent it. To work around it, you could launch kernel by marshaling the parameters yourself through cudaSetupArgument and cudaLaunch. These APIs might be deprecated, however. Commented Jan 13, 2016 at 20:29
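
One non-deprecated way to do the manual argument marshaling described in the comment above is the runtime call cudaLaunchKernel, which took over from the cudaSetupArgument/cudaLaunch pair. A rough, untested sketch follows (the class and kernel are only illustrative): because the arguments are handed over as raw pointers through a C-style API, the host-side copy constructor should not run, although the object's bytes are still copied to the device like any other kernel parameter.

#include <cstdio>

class Foo {
  public:
    __host__ __device__ Foo() {}
    __host__ __device__ Foo(const Foo&) { printf("copy constructor called\n"); }
};

__global__ void kernel(Foo foo) { printf("in kernel\n"); }

int main(){
  Foo myfoo;
  // cudaLaunchKernel copies the argument bytes straight from &myfoo into the
  // launch parameter buffer, so no C++ copy constructor is invoked on the host.
  void *args[] = { &myfoo };
  cudaLaunchKernel((const void *)kernel, dim3(1), dim3(1), args, 0, 0);
  cudaDeviceSynchronize();
  return 0;
}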

2 Answers


Take a look at managed memory, for instance these slides from a 2013 conference. Essentially, if you write your class to extend the CUDA Managed class, and use the correct memory allocations, you can pass-by-reference and CUDA will take care of the memory management.
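
A rough sketch of that pattern (the Managed base class below is hand-written for illustration, not something that ships with the toolkit, and the kernel here takes a pointer): its operator new/delete allocate the object with cudaMallocManaged, so the same pointer is valid on both host and device and the object never has to be copied over explicitly.

#include <cstdio>
#include <cstddef>

// Illustrative base class: objects created with `new` land in unified (managed) memory.
class Managed {
  public:
    void *operator new(std::size_t len) {
      void *ptr;
      cudaMallocManaged(&ptr, len);
      cudaDeviceSynchronize();
      return ptr;
    }
    void operator delete(void *ptr) {
      cudaDeviceSynchronize();
      cudaFree(ptr);
    }
};

class Foo : public Managed {
  public:
    __host__ __device__ void chirp() { printf("Hello from managed memory!\n"); }
};

__global__ void kernel(Foo *foo) { foo->chirp(); }

int main(){
  Foo *foo = new Foo;    // allocated through Managed::operator new -> cudaMallocManaged
  kernel<<<1,1>>>(foo);  // no cudaMemcpy needed; the pointer is valid on the device too
  cudaDeviceSynchronize();
  delete foo;
  return 0;
}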


1 Comment

In my experience the slight performance drop is worth it, but I mostly deal with complicated objects that are only copied over to the device once during the program. If you had simple objects that were being passed around frequently, your results may vary.

My suggestion would be to pass foo by pointer:

$ cat t1041.cu
#include <stdio.h>

class Foo {
  public:
    __host__ __device__
    void chirp() { printf("Hello!\n"); }
    __host__ __device__
    Foo(const Foo&) {printf("Called copy constructor!\n");}
    __host__ __device__
    Foo(){};
};


__global__ void kernel(Foo *foo) {

  foo->chirp();   // only the pointer was passed; no copy constructor runs
}

int main(){

  Foo myfoo, *d_foo;
  cudaMalloc(&d_foo, sizeof(Foo));
  // raw byte copy of the host object to the device; no constructor is invoked
  cudaMemcpy(d_foo, &myfoo, sizeof(Foo), cudaMemcpyHostToDevice);
  kernel<<<1,1>>>(d_foo);
  cudaDeviceSynchronize();
}

$ nvcc -o t1041 t1041.cu
$ ./t1041
Hello!
$

You could also use managed memory, on platforms that support it, as suggested by @icurays1.

