I'd like to share some constants between CPU and GPU so that the same code (wrapped in `__host__ __device__` functions) can run on either. That includes some compile-time parameters that are most naturally written as arrays, and I'd like them to be `constexpr` so that the compiler can (ideally) elide the arrays during compilation.
When I try
```cuda
#include <stdio.h>

using fe_ftype = double;

__device__ constexpr fe_ftype vars[2] = {100.0, 300.0};
//__constant__ constexpr fe_ftype vars[2] = {100.0, 300.0};
const fe_ftype l = 3.0;

__global__ void foo() {
    printf("%lf %lf %lf\n", vars[0], vars[1], l);
}

int main(void) {
    foo<<<1,1>>>();
    cudaDeviceSynchronize();
    printf("%lf %lf %lf\n", vars[0], vars[1], l);
}
```
(see https://godbolt.org/z/19bYj34f8)
It seems to work: compilation succeeds and I get the same output from both the GPU and the CPU side, with the constants stored as read-only data on the CPU side and moved directly into registers on the GPU side. However, the compiler complains with `warning #20091-D: a __device__ variable "vars" cannot be directly read in a host function`. If it were not a constant, I would be reading some random uninitialized memory on the host, but with the `constexpr` annotation the host side seems to handle it correctly. It also seems to work identically if I use `__constant__` instead of `__device__`.
Can I rely on this behaviour? Does the warning generally not apply to constexpr variables? Extra condition: I can only use C++17, so no consteval.
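For comparison, here is a minimal sketch of the "`constexpr` without `__device__`" route that stays within C++17. It follows the rule for `constexpr` variables in the CUDA C++ Programming Guide: a non-scalar host `constexpr` variable cannot be referenced directly in device code, but its scalar elements may be read inside a `constexpr __host__ __device__` function as long as each device-side call is a constant expression. Binding the result to a `constexpr` local forces constant evaluation, which stands in for a C++20 `consteval` wrapper. The accessor name `var` is hypothetical, and the indices used on the device must be compile-time constants; a runtime index there should be rejected by the compiler.

```cuda
#include <cstdio>

using fe_ftype = double;

// Plain host constexpr array: no __device__/__constant__ annotation,
// so host reads are ordinary reads of a host variable.
constexpr fe_ftype vars[2] = {100.0, 300.0};

// Hypothetical accessor callable from both sides. On the device, it may
// only be used where the call is a constant expression.
__host__ __device__ constexpr fe_ftype var(int i) { return vars[i]; }

__global__ void foo() {
    // constexpr locals force compile-time evaluation (C++17 substitute
    // for consteval); the values can then be folded into the kernel.
    constexpr fe_ftype v0 = var(0);
    constexpr fe_ftype v1 = var(1);
    printf("%lf %lf\n", v0, v1);
}

int main() {
    foo<<<1, 1>>>();
    cudaDeviceSynchronize();
    // Host side: a normal call, no constant-expression requirement.
    printf("%lf %lf\n", var(0), var(1));
}
```

The trade-off is that device code can no longer index the array with a runtime value; if that is needed, a `__constant__` copy (without relying on host reads of it) is the usual alternative.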
`__constant__` for a higher-dimensional array.

`constexpr` w/o `__device__` might be the best solution, as discussed below these questions. But special use cases could favor different combinations.

`constexpr` w/o `__device__` requires using a `consteval` wrapper, which requires C++20, but I'm stuck on C++17. With C++17 I get a compile-time error since the variable isn't visible on the device. The solution with `cuda::std::array` looks nice, though I'll have to wrap it for non-CUDA compilation.

`consteval` in that context b/c it made sense, but it should work with `constexpr` instead, assuming the argument is `constexpr`. That is why the `cuda::std::array` works: its `operator[]` is `constexpr`. 2. `cuda::std::array` is available for host code as well. You might just have to tell your host compiler where to find CCCL.
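A sketch of the `cuda::std::array` variant discussed in the comments, assuming (as stated there) that device code may use the array because its `operator[]` is `constexpr` and the indexing happens in a constant expression. The array carries no execution-space annotation; the device side binds each element to a `constexpr` local so that evaluation is forced at compile time under C++17. This relies on the CCCL header `<cuda/std/array>`; for a non-nvcc host build, the CCCL include path has to be passed to the host compiler, as the last comment notes.

```cuda
#include <cstdio>
#include <cuda/std/array>

using fe_ftype = double;

// Ordinary constexpr object, usable on the host without any warning about
// reading a __device__ variable.
constexpr cuda::std::array<fe_ftype, 2> vars{100.0, 300.0};

__global__ void foo() {
    // Assumption from the comments: operator[] is constexpr, so these
    // element reads are constant expressions and are allowed in device code.
    constexpr fe_ftype v0 = vars[0];
    constexpr fe_ftype v1 = vars[1];
    printf("%lf %lf\n", v0, v1);
}

int main() {
    foo<<<1, 1>>>();
    cudaDeviceSynchronize();
    printf("%lf %lf\n", vars[0], vars[1]);
}
```

As with a plain array, only compile-time indices are expected to work on the device; the convenience over a hand-written accessor is that `operator[]` already plays that role.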