Indeed, as @Johan says in a comment, this is a bug.
The bug has two aspects, actually:
A __device__-side global variable exists on every device on which the code with that variable is used (or more exactly, in every context into which the variable's module is loaded; see §18.2 of the Programming Guide). In other words - when a device runs your code, a is well-defined; but on the host side - there may be many a's, so - which a's address could you take, at all?
Even supposing there were just one device (and one context), and the choice of a on the host was somehow made reasonably (e.g. the primary context on the current device) - you should still not be getting different addresses: CUDA, since version 6.0, uses a single unified memory space for all addresses on GPUs and on the host. That means that the address of an entity (like a variable or function) is the same for device-side code and host-side code - regardless of whether it's visible from both host and device.
I tried try your code with a non-constexpr variable, and of a simpler type (an int x); but I changed it to print the value on all of my (two) CUDA-capable GPUs. The resulting output is:
device 0: int x at address 0x7f826ba00000 has value 123
device 1: int x at address 0x7f826b000000 has value 123
host: int x at address 0x560032e54568 has value 0
host: int y at address 0x560032e544b0 has value 456
so, two variables, on two devices; and yet we get a third address; and it looks very much like the address of another global host-side variable that I also defined.
Here's the program I used:
#include <cstdio>
__device__ int x = 123;
int y = 456;
__host__ __device__ void display_x(int const* dev_idx, int const& x) {
if (dev_idx) { std::printf("device %d: ", *dev_idx); }
else { std::printf("host: "); }
std::printf("int x at address %p has value %d\n", &x, x);
}
// Type your code here, or load an example.
__global__ void display_x_kernel(int dev_idx) {
display_x(&dev_idx, x);
}
int main() {
int num_devices = 0;
cudaGetDeviceCount(&num_devices);
for(int dev_idx = 0; dev_idx < num_devices; dev_idx++) {
cudaSetDevice(dev_idx);
display_x_kernel<<<1, 1>>>(dev_idx);
cudaDeviceSynchronize();
}
display_x(nullptr, x);
std::printf("host: int y at address %p has value %d\n", &y, y);
}
I'll also take this opportunity to suggest that you avoid __device__ global variables altogether - except for rare case I can't think of right now off the top of my head. (Non-constant) globals are to be avoided generally, and device globals are kind of the lame brothers of host-side-code globals - considering that your program will have to copy the values of these globals, at run-time, to the GPU, anyway. And if you want constexpr values which "disappear", due to only being used at compile-time - well, I would suggest making them local to a consteval function. Maybe it won't make a difference, but frankly - I don't want to have to prove to myself that NVCC or NVRTC or clang will actually get rid of them.
Thanks goes to @paleonix for setting me straight regarding the most salient aspect of this bug.
__device__. When I reverse the scenario I get a compilation failure, which certainly gives me pause. (If we change theiparameter from pass-by-reference to pass-by-value, we can avoid that compilation failure - again - this is documented.)iinmy_print(which you can't because it'sconstexprandconst), there's no need formy_printto haveconst int& i.int iwill be faster because it's a simple scalar (and work, apparently).__device__variables leaking into the__host__code looks like a bug to me, the host should not see device code and any use of device methods/vars should be a compilation error. Who would you make it undefined behavior? Is the C++ community not trying to get rid of undefined behavior?constexprand other ways to define compile time constants is thatconstexpr"variables" do have an address and that creating a reference is taking that address (which can point to memory that is not accessible from either the host or the device). Ideally the compiler will inline the function and directly use the value of the constexpr variable. But if it can't do that, something will go wrong.