Consider the following code, which uses a std::atomic to atomically load a 64-bit object.
#include <atomic>
#include <cstdint>  // for int32_t
struct A {
    int32_t x, y;
};
A f(std::atomic<A>& a) {
    return a.load(std::memory_order_relaxed);
}
With GCC, good things happen, and the following code is generated. (https://godbolt.org/z/zS53ZF)
f(std::atomic<A>&):
    mov rax, QWORD PTR [rdi]
    ret
This is exactly what I'd expect, since I see no reason why a 64-bit struct couldn't be treated like any other 64-bit word in this situation.
With Clang, however, the story is different. Clang generates the following. (https://godbolt.org/z/d6uqrP)
f(std::atomic<A>&): # @f(std::atomic<A>&)
    push rax
    mov rsi, rdi
    mov rdx, rsp
    mov edi, 8
    xor ecx, ecx
    call __atomic_load
    mov rax, qword ptr [rsp]
    pop rcx
    ret
    mov rdi, rax
    call __clang_call_terminate
__clang_call_terminate: # @__clang_call_terminate
    push rax
    call __cxa_begin_catch
    call std::terminate()
This is problematic for me for several reasons:
- Most obviously, there are far more instructions, so I'd expect the code to be less efficient.
- Less obviously, the generated code also includes a call to the library function __atomic_load, which means that my binary needs to be linked with libatomic. This means I need different lists of libraries to link depending on whether users of my code use GCC or Clang. The library function might also use a lock, which would hurt performance.
The important question on my mind right now is whether there is a way to get Clang to also convert the load into a single instruction. We are using this as part of a library that we plan to distribute to others, so we cannot rely on a particular compiler being used.

The solution suggested to me so far is to use type punning and store the struct inside a union alongside a 64-bit int, since Clang does correctly load 64-bit ints atomically in one instruction. I am skeptical of this solution, however, since although it appears to work on all major compilers, I have read that it is in fact undefined behaviour. Such code is also not particularly friendly for others to read and understand if they are not familiar with the trick.
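For reference, here is a minimal sketch of how I understand the suggested union workaround. The names APun, load_punned, and store_punned are hypothetical and made up for illustration, and I am assuming the atomic object itself would be a std::atomic<std::uint64_t> rather than a std::atomic<A>. Reading the union member that was not most recently written is exactly the part I have read is undefined behaviour in C++, so treat this as a description of the trick rather than a recommendation.

```cpp
#include <atomic>
#include <cstdint>

struct A {
    std::int32_t x, y;
};

// Hypothetical helper: overlays the struct with a 64-bit integer.
union APun {
    A value;
    std::uint64_t bits;
};

static_assert(sizeof(A) == sizeof(std::uint64_t),
              "the trick only makes sense if A is exactly 64 bits");

// Load the 64-bit word atomically, then reinterpret it as an A.
A load_punned(const std::atomic<std::uint64_t>& a) {
    APun p;
    p.bits = a.load(std::memory_order_relaxed);
    // Reads the inactive union member: appears to work on major compilers,
    // but is reportedly undefined behaviour in ISO C++.
    return p.value;
}

// Reinterpret an A as a 64-bit word, then store it atomically.
void store_punned(std::atomic<std::uint64_t>& a, A v) {
    APun p;
    p.value = v;
    // Same caveat: reads the inactive union member.
    a.store(p.bits, std::memory_order_relaxed);
}
```

A caller would hold a std::atomic<std::uint64_t> and go through these two helpers for every access, which is part of why I find the approach hard to read.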
To summarize, is there a way to atomically load a 64-bit struct that:
- Works in both Clang and GCC, and preferably most other popular compilers,
- Generates a single instruction when compiled,
- Is not undefined behaviour,
- Is reader friendly?