As I understand it, when I pass a struct or class to a kernel, the copy-constructor is called on it host-side, and the copied object is then sent to the device with memcpy. Here is an example:
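#include <iostream>
class Foo {
public:
    Foo() = default;  // needed because declaring a copy-constructor suppresses the implicit default one
    Foo(const Foo&) { std::cout << "Called before kernel execution"; }
};
__global__ void kernel(Foo foo) { }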
Can I somehow prevent the copy-constructor from being called, and make CUDA memcpy the object to device-memory directly? Passing foo by reference wouldn't work, since it would mix up device and host memory.
... foo object you are trying to pass to the kernel). If you have such an object set up as you wish on the host, you should be able to copy it to the device using cudaMemcpy, without invoking any object methods or constructors. And pass-by-reference cannot be used anyway in a CUDA kernel call, so perhaps you mean pass-by-pointer. I'm suggesting you use pass-by-pointer, and it's unclear (to me) why that would not work.

... kernel by marshaling the parameters yourself through cudaSetupArgument and cudaLaunch. These APIs might be deprecated, however.
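The pass-by-pointer suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: Foo is given a hypothetical int member named value so there is something to read back, the CHECK macro and the out parameter are helpers invented for this example, and a CUDA-capable device is assumed to be present. Note that a class with a user-provided copy-constructor is not trivially copyable in the C++ sense, so copying its bytes with cudaMemcpy relies on the object having a plain data layout, as it does here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Stand-in for the Foo from the question, with an assumed data member.
struct Foo {
    int value;
    explicit Foo(int v) : value(v) {}
    // Would run on the host if Foo were passed to the kernel by value.
    Foo(const Foo& other) : value(other.value) {
        std::printf("copy-constructor called\n");
    }
};

// The kernel receives a device pointer, so no Foo is copy-constructed.
__global__ void kernel(const Foo* foo, int* out) {
    *out = foo->value;
}

int main() {
    Foo h_foo(42);  // object set up on the host

    Foo* d_foo = nullptr;
    int* d_out = nullptr;
    CHECK(cudaMalloc(&d_foo, sizeof(Foo)));
    CHECK(cudaMalloc(&d_out, sizeof(int)));

    // Raw byte copy to device memory; no constructor is invoked.
    CHECK(cudaMemcpy(d_foo, &h_foo, sizeof(Foo), cudaMemcpyHostToDevice));

    kernel<<<1, 1>>>(d_foo, d_out);
    CHECK(cudaGetLastError());

    int h_out = 0;
    // This blocking copy also waits for the kernel to finish.
    CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("value read on device: %d\n", h_out);

    CHECK(cudaFree(d_foo));
    CHECK(cudaFree(d_out));
    return 0;
}
```

Because the kernel only ever sees a pointer into device memory, the "copy-constructor called" message should never be printed. The trade-off is that the device allocation, the copy, and the cleanup are now managed by hand rather than by the kernel launch.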