
Assuming the architecture supports lock-free std::atomic operations on 8-byte scalars, why don't standard libraries provide similar specializations for structs that are under 8 bytes?

A simple implementation of such a std::atomic specialization could just serialize/deserialize the struct (with std::memcpy) into the equivalent std::uintx_t, where x is the width of the struct in bits, rounded up to the nearest power of 2. This would be well defined because std::atomic requires these structs to be trivially copyable.
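To make the idea concrete, here is a minimal sketch of such a wrapper. PackedAtomic is a hypothetical name, not a standard-library type, and the sketch assumes T fits in 4 bytes and that std::atomic<std::uint32_t> is lock-free on the target. It leaves out compare-exchange, which would also need the unused bytes to stay in a known state.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Something { char a, b, c; };  // 3 bytes, as in the question

// Hypothetical wrapper: keeps T inside a 32-bit atomic word and copies
// its bytes in and out with std::memcpy.
template <class T>
class PackedAtomic {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    static_assert(sizeof(T) <= sizeof(std::uint32_t), "this sketch only handles T up to 4 bytes");

    std::atomic<std::uint32_t> word_{0};

    static std::uint32_t pack(const T& v) {
        std::uint32_t w = 0;              // bytes not covered by T stay zero
        std::memcpy(&w, &v, sizeof(T));
        return w;
    }
    static T unpack(std::uint32_t w) {
        T v;
        std::memcpy(&v, &w, sizeof(T));
        return v;
    }

public:
    PackedAtomic() = default;
    explicit PackedAtomic(const T& v) : word_(pack(v)) {}

    void store(const T& v, std::memory_order mo = std::memory_order_seq_cst) {
        word_.store(pack(v), mo);
    }
    T load(std::memory_order mo = std::memory_order_seq_cst) const {
        return unpack(word_.load(mo));
    }
    T exchange(const T& v, std::memory_order mo = std::memory_order_seq_cst) {
        return unpack(word_.exchange(pack(v), mo));
    }
};

// Usage sketch: round-trip a value through the wrapper.
inline void packed_atomic_demo() {
    PackedAtomic<Something> slot(Something{'x', 'y', 'z'});
    Something old = slot.exchange(Something{'1', '2', '3'});
    assert(old.a == 'x' && old.c == 'z');
    assert(slot.load().b == '2');
}
```

Note that the wrapper occupies 4 bytes rather than 3, which is the size/layout trade-off the question is asking about.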

E.g. https://godbolt.org/z/sxSeId: here Something is only 3 bytes, but the implementation calls __atomic_load and __atomic_exchange, both of which use a lock table.

  • gcc gets it right if you make the struct 4 bytes (but not 3), see godbolt.org/z/d1OCmG. clang doesn't. Commented Apr 28, 2019 at 22:22
  • @PaulSanders Interesting, I wonder why 3 bytes doesn't work. Commented Apr 28, 2019 at 22:24
  • There is no x86 instruction that loads/stores 3 bytes, let alone atomically. Commented Apr 28, 2019 at 22:44
  • @rustyx Ah, sorry, but you could always pad the size up to the next power of 2 though, right? §[atomics.types.generic]p3 allows this: "The representation of an atomic specialization need not have the same size as its corresponding argument type." I guess there are portability problems with that though? Commented Apr 28, 2019 at 22:58
  • @Curious: When I said "force the alignment", I meant with alignas(4). Commented Apr 28, 2019 at 23:39

1 Answer


GCC's libstdc++ atomic<T> unfortunately doesn't alignas / pad up to a power-of-2 size which would make lock-free operation possible. std::atomic<Something> arr[10] has sizeof(arr) = 30, so they're packed head-to-tail without padding. (Godbolt).

But Clang with -stdlib=libc++ does pad. sizeof(atomic<3 char struct>) == 4, and std::atomic<T>::is_always_lock_free is true.

Clang with libstdc++ (the default on GNU/Linux) works the same as GCC with libstdc++, unsurprisingly since it's the same library code.
(As mentioned in comments under the question, Clang 14 and earlier with libstdc++ don't inline __atomic_load and __atomic_exchange even for a struct of four chars, unless you use alignas(4). But Clang does still treat std::atomic<4 char struct> as lock-free with alignof(T) == 4, so I think this is "just" an efficiency problem (although a serious one), not a correctness problem. Godbolt. Clang 15 and later have the same codegen as GCC, inlining mov and xchg.)


For C _Atomic types, it's the compiler internals making the choice, rather than C++ library header code. GCC doesn't pad, Clang does to make it lock-free. So they're not ABI compatible with each other for _Atomic on small odd-sized structs! (Godbolt).

MSVC's standard library also doesn't pad odd-sized objects to make them lock-free. (The std::atomic object will contain a spinlock, so it will be larger and aligned by at least 4 anyway, instead of indexing a table of spinlocks like GCC and Clang do for non-lock-free objects.)


Use alignas() with a power-of-2 size on the first member

Use struct Something { alignas(4) char a; char b,c; };
(Not alignas(4) char a,b,c; because that would make each char padded to 4 bytes so they could each be aligned.)
This also works in C mode for _Atomic, since it makes sizeof(struct Something) == 4. (sizeof(T) is always a multiple of alignof(T) because C arrays never have padding between objects. And a struct is at least as aligned as its most-aligned member.)
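A small sketch of the difference, with Packed3 and Aligned4 as illustrative names. The sizeof/alignof values for the plain structs hold on mainstream ABIs; the assertion on sizeof(std::atomic<Aligned4>) is an assumption about libstdc++, libc++ and MSVC on common targets, not something the standard guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct Packed3  { char a, b, c; };                  // the question's layout
struct Aligned4 { alignas(4) char a; char b, c; };  // the suggested fix

// Without alignas: 3 bytes, byte alignment, so an object can start anywhere.
static_assert(sizeof(Packed3) == 3 && alignof(Packed3) == 1, "packed layout");

// With alignas(4) on the first member: one byte of tail padding is added,
// because sizeof(T) must be a multiple of alignof(T).
static_assert(sizeof(Aligned4) == 4 && alignof(Aligned4) == 4, "padded layout");
static_assert(sizeof(Aligned4[10]) == 40, "array elements stay 4-byte aligned");

// Assumption: the atomic wrapper keeps the 4-byte size on mainstream targets.
static_assert(sizeof(std::atomic<Aligned4>) == 4, "same size as a 32-bit word");

// exchange() on a naturally aligned 4-byte object can be a single instruction.
inline Aligned4 swap_in(std::atomic<Aligned4>& slot, Aligned4 next) {
    return slot.exchange(next);
}

// Usage sketch.
inline void aligned4_demo() {
    std::atomic<Aligned4> slot(Aligned4{'a', 'b', 'c'});
    Aligned4 old = swap_in(slot, Aligned4{'x', 'y', 'z'});
    assert(old.a == 'a' && old.c == 'c');
    assert(slot.load().b == 'y');
}
```

The padding byte is harmless for load, store and exchange; compare_exchange on a type with padding is a separate topic that this sketch does not address.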

Objects with a non-power-of-2 size might span a cache-line boundary so using a wider 4-byte load is not always possible.

Plus pure stores would always have to use a CAS (e.g. lock cmpxchg) to avoid inventing writes to a byte outside the object: obviously you can't use two separate mov stores (2-byte + 1-byte) because that wouldn't be atomic, unless you do that inside a TSX transaction with a retry loop.
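As a rough model of that CAS-based store, the sketch below treats the 3-byte object as the low 24 bits of an aligned 4-byte word and keeps the remaining 8 bits (standing in for a neighbouring object's byte) unchanged. store_low24 is a hypothetical helper, and the aligned containing word is an assumption made for the illustration; a packed 3-byte object doesn't get that guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: replace the low 24 bits of `word` with the low 24 bits
// of `value`, never changing the top 8 bits, even if another thread updates
// them concurrently.
inline void store_low24(std::atomic<std::uint32_t>& word, std::uint32_t value) {
    constexpr std::uint32_t mask = 0x00FFFFFFu;
    std::uint32_t expected = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Rebuild the word from the freshly observed top byte and the new payload.
        desired = (expected & ~mask) | (value & mask);
        // On failure, compare_exchange_weak reloads `expected` and we retry.
    } while (!word.compare_exchange_weak(expected, desired,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed));
}

// Usage sketch.
inline void store_low24_demo() {
    std::atomic<std::uint32_t> word{0xAB000000u};
    store_low24(word, 0x12345678u);
    assert(word.load() == 0xAB345678u);  // top byte 0xAB is preserved
}
```

A plain 4-byte store in the same situation would overwrite the top byte with whatever value was read earlier, which is the "invented write" the paragraph above warns about.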


x86 load/store are only guaranteed atomic for memory accesses that don't cross an 8-byte boundary. (On some vendors / uarches, a cache line boundary. Or for possibly-uncacheable loads/stores, basically natural alignment is what you need.) See "Why is integer assignment on a naturally aligned variable atomic on x86?"

Your struct Something { char a, b, c; }; has no alignment requirement so there's no C++ rule that prevents a Something object from spanning 2 cache lines. That would make a plain-mov load/store of it definitely non-atomic.

libstdc++ chooses to implement atomic<T> with the same layout / object-representation as T (regardless of being lock-free or not). (IDK if that was an explicit goal, or just a consequence of not checking for the possibility of aligning small objects). Therefore atomic<Something> is a 3-byte object. An array of atomic<Something> thus necessarily has some of those objects spanning cache line boundaries, and can't have padding outside the object because that's not how arrays work in C. sizeof() = 3 tells you the array layout. This makes lock-free atomic<Something> impossible. (Unless you load/store with lock cmpxchg to be atomic even on cache-line splits, which would produce a huge performance penalty in the cases where that did happen. Better to make developers fix their struct. And of course many non-x86 ISAs only support aligned atomic loads/stores/RMWs.)
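The array-layout argument can be checked with simple arithmetic. The sketch below assumes 64-byte cache lines and an array whose first element starts on a line boundary; straddles and count_straddlers are illustrative helper names.

```cpp
#include <cassert>
#include <cstddef>

// True if an object of `size` bytes at byte offset `offset` touches two
// different `line`-byte blocks.
constexpr bool straddles(std::size_t offset, std::size_t size, std::size_t line = 64) {
    return offset / line != (offset + size - 1) / line;
}

// Count how many elements of a tightly packed array cross a line boundary.
inline std::size_t count_straddlers(std::size_t n, std::size_t elem_size,
                                    std::size_t line = 64) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (straddles(i * elem_size, elem_size, line)) ++count;
    }
    return count;
}

// Usage sketch: 64 elements of 3 bytes versus 64 elements of 4 bytes.
inline void straddle_demo() {
    // Element 21 covers bytes 63..65 and element 42 covers bytes 126..128.
    assert(count_straddlers(64, 3) == 2);
    // Power-of-2 sized, naturally aligned elements never cross a line.
    assert(count_straddlers(64, 4) == 0);
}
```

With 3-byte elements, some element crosses every line boundary that isn't a multiple of 3 bytes from the start of the array, so any sufficiently long array of them contains split objects.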

The atomic<T> class can have a higher alignment requirement than T, for example atomic<int64_t> has alignof(atomic_int64_t) == 8, unlike alignof(int64_t) == 4 on many 32-bit platforms (including the i386 System V ABI).
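A quick way to see this on a given target is to compare the two alignments directly. The sketch below only reports the values; the actual numbers depend on the ABI (for example 4 and 8 when built with -m32 on i386 System V with a current GCC, which is an expectation rather than something the code asserts). AlignReport and int64_alignment are illustrative names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Alignment of the plain type and of its atomic wrapper on this target.
struct AlignReport {
    std::size_t plain_align;
    std::size_t atomic_align;
};

inline AlignReport int64_alignment() {
    return { alignof(std::int64_t), alignof(std::atomic<std::int64_t>) };
}

// The wrapper is never expected to be less aligned than the type it wraps.
static_assert(alignof(std::atomic<std::int64_t>) >= alignof(std::int64_t),
              "atomic<T> may raise the alignment but should not lower it");
```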

libc++ doesn't try to keep the layout and size of atomic<T> the same as T. It aligns the object up to the next power-of-2 size if that's still small enough to be lock-free on the current target. This seems to me like a better implementation choice, but of course it isn't ABI-compatible, so libstdc++ can't change until the next ABI version.


Fun fact: GCC's C11 _Atomic support was slightly broken on 32-bit platforms with 64-bit lock-free atomics for many years after the equivalent libstdc++ bug was fixed: _Atomic int64_t could be misaligned inside structs, leading to tearing. (Fixed in GCC 11.1, with a 32-bit ABI change for _Atomic objects in structs: https://godbolt.org/z/1x3nEe3P3). Clang never had this problem.

But g++'s C++11 std::atomic uses a template class in a header, and that bug was fixed there a while ago (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65147), ensuring that atomic<T> has natural alignment (up to some power-of-2 size) even if T has alignment < size. Thus there's no way such objects can span any boundary wider than they are.


1 Comment

Oh the cacheline splits actually make a ton of sense, thanks!
