GCC's libstdc++ atomic<T> unfortunately doesn't alignas / pad up to a power-of-2 size which would make lock-free operation possible. std::atomic<Something> arr[10] has sizeof(arr) = 30, so they're packed head-to-tail without padding. (Godbolt).
But Clang with -stdlib=libc++ does pad. sizeof(atomic<3 char struct>) == 4, and std::atomic<T>::is_always_lock_free is true.
Clang with libstdc++ (the default on GNU/Linux) works the same as GCC with libstdc++, unsurprisingly since it's the same library code.
As mentioned in comments under the question, Clang 14 and earlier with libstdc++ don't inline __atomic_load and __atomic_exchange even for a struct of four chars, unless you use alignas(4), but Clang does still treat std::atomic<4 char struct> as lock-free with alignof(T) == 4. So I think this is "just" an efficiency problem (although a serious one), not correctness. (Godbolt. Clang 15 and later have the same codegen as GCC, inlining mov and xchg.)
For C _Atomic types, it's the compiler internals making the choice, rather than C++ library header code. GCC doesn't pad, Clang does to make it lock-free. So they're not ABI compatible with each other for _Atomic on small odd-sized structs! (Godbolt).
MSVC's standard library also doesn't pad odd-sized objects to make them lock-free. (The std::atomic object will contain a spinlock, so it will be larger and aligned by at least 4 anyway, instead of indexing a table of spinlocks like GCC and Clang do for non-lock-free objects.)
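To see which choice your own toolchain makes, a minimal sketch along these lines exposes the relevant numbers as compile-time constants. The constant names (plain_size, atomic_size, atomic_align, always_lock_free) are made up for illustration, and it assumes C++17 for is_always_lock_free.

```cpp
#include <atomic>
#include <cstddef>

// The 3-byte struct under discussion: no padding, alignof == 1.
struct Something { char a, b, c; };

// Hypothetical names, just for inspecting what the library in use decided.
constexpr std::size_t plain_size   = sizeof(Something);
constexpr std::size_t atomic_size  = sizeof(std::atomic<Something>);
constexpr std::size_t atomic_align = alignof(std::atomic<Something>);
constexpr bool always_lock_free    = std::atomic<Something>::is_always_lock_free;

// These hold regardless of library: the wrapper can't be smaller than T.
static_assert(plain_size == 3, "three chars pack into three bytes");
static_assert(atomic_size >= plain_size, "atomic<T> must at least hold a T");

// Per the description above, expect atomic_size == 3 and always_lock_free == false
// with libstdc++, and atomic_size == 4 and always_lock_free == true with libc++.
// Those values are deliberately not asserted here since they depend on the library.
```

Printing the four constants from main (or adding your own static_assert on them) under g++, clang++ and clang++ -stdlib=libc++ should show the difference directly.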
Use alignas() with a power-of-2 size on the first member
Use struct Something { alignas(4) char a; char b,c; };
(Not alignas(4) char a,b,c; because that would make each char padded to 4 bytes so they could each be aligned.)
This also works in C mode for _Atomic, since it makes sizeof(struct Something) == 4. (sizeof(T) is always a multiple of alignof(T) because C arrays never have padding between objects. And a struct is at least as aligned as its most-aligned member.)
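As a compile-time check of the layout described above, here is a small sketch. EachAligned is a hypothetical name for the variant that the text warns against; the lock-free constant assumes C++17 and is left unasserted because it is target-dependent.

```cpp
#include <atomic>
#include <cstddef>

// Recommended form: only the first member carries alignas, so the struct
// gets alignof == 4 and one byte of tail padding after c.
struct Something {
    alignas(4) char a;
    char b, c;
};

// Hypothetical counter-example: alignas applies to every declarator,
// so a, b and c each start on their own 4-byte boundary.
struct EachAligned {
    alignas(4) char a, b, c;
};

static_assert(alignof(Something) == 4, "struct is as aligned as its most-aligned member");
static_assert(sizeof(Something) == 4, "sizeof is rounded up to a multiple of alignof");
static_assert(sizeof(EachAligned) == 12, "three members, each padded to 4 bytes");
static_assert(sizeof(std::atomic<Something>) == 4, "nothing left for the library to pad");

// Expected to be true on mainstream targets such as x86-64, but not asserted.
constexpr bool something_lock_free = std::atomic<Something>::is_always_lock_free;
```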
Objects with a non-power-of-2 size might span a cache-line boundary so using a wider 4-byte load is not always possible.
Plus pure stores would always have to use a CAS (e.g. lock cmpxchg) to avoid inventing writes to a byte outside the object: obviously you can't use two separate mov stores (2-byte + 1-byte) because that wouldn't be atomic, unless you do that inside a TSX transaction with a retry loop.
x86 load/store are only guaranteed atomic for memory accesses that don't cross an 8-byte boundary. (On some vendors / uarches, a cache line boundary. Or for possibly-uncacheable loads/stores, basically natural alignment is what you need). Why is integer assignment on a naturally aligned variable atomic on x86?
Your struct Something { char a, b, c; }; has no alignment requirement so there's no C++ rule that prevents a Something object from spanning 2 cache lines. That would make a plain-mov load/store of it definitely non-atomic.
libstdc++ chooses to implement atomic<T> with the same layout / object-representation as T (regardless of being lock-free or not). (IDK if that was an explicit goal, or just a consequence of not checking for the possibility of aligning small objects).
Therefore atomic<Something> is a 3-byte object. An array of atomic<Something> thus necessarily has some of those objects spanning cache line boundaries, and can't have padding outside the object because that's not how arrays work in C. sizeof() = 3 tells you the array layout. This makes lock-free atomic<Something> impossible. (Unless you load/store with lock cmpxchg to be atomic even on cache-line splits, which would produce a huge performance penalty in the cases where that did happen. Better to make developers fix their struct. And of course many non-x86 ISAs only support aligned atomic loads/stores/RMWs.)
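The array argument is just arithmetic on offsets, which can be sketched as follows. The helper names (crosses, first_crossing) and the 64-byte line size are assumptions for illustration, and the array base is assumed to start exactly on a boundary; constexpr loops need C++14 or later.

```cpp
#include <cstddef>

constexpr std::size_t kLine = 64;  // assumed cache-line size

// True if the bytes [off, off + size) fall into two different
// boundary-sized blocks, i.e. the object straddles a boundary.
constexpr bool crosses(std::size_t off, std::size_t size,
                       std::size_t boundary = kLine) {
    return off / boundary != (off + size - 1) / boundary;
}

// Index of the first element that straddles a boundary in an array whose
// elements are `stride` bytes each, or `count` if none of them does.
constexpr std::size_t first_crossing(std::size_t stride, std::size_t count,
                                     std::size_t boundary = kLine) {
    for (std::size_t i = 0; i < count; ++i)
        if (crosses(i * stride, stride, boundary)) return i;
    return count;
}

// 3-byte elements: element 21 occupies bytes 63..65 and splits a 64-byte line.
static_assert(first_crossing(3, 100) == 21, "3-byte stride hits a line split");
// 4-byte elements at a 4-byte stride never straddle a line.
static_assert(first_crossing(4, 100) == 100, "power-of-2 stride never splits");
// Against the 8-byte boundary, a 3-byte stride fails as early as element 2 (bytes 6..8).
static_assert(first_crossing(3, 100, 8) == 2, "8-byte boundary is crossed sooner");
```

So with sizeof == 3, any array longer than a couple of dozen elements contains an object that a plain load or store can't access atomically, while a 4-byte stride avoids the problem entirely.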
The atomic<T> class can have a higher alignment requirement than T, for example atomic<int64_t> has alignof(atomic_int64_t) == 8, unlike alignof(int64_t) == 4 on many 32-bit platforms (including the i386 System V ABI).
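A quick way to check this on a given target is to compare the two alignments at compile time. This is only a sketch; the constant names are hypothetical, and the specific values (4 vs. 8) are what the text above describes for i386 System V, not something the code asserts.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical names for the two alignments being compared.
constexpr std::size_t i64_align        = alignof(std::int64_t);
constexpr std::size_t atomic_i64_align = alignof(std::atomic<std::int64_t>);

// The wrapper may be more strictly aligned than the plain type, never less.
static_assert(atomic_i64_align >= i64_align,
              "atomic<int64_t> is at least as aligned as int64_t");
// On mainstream implementations the size is unchanged; only alignment differs.
static_assert(sizeof(std::atomic<std::int64_t>) == sizeof(std::int64_t),
              "no extra storage for a lock-free 8-byte atomic");
```

Building with -m32 on x86 should show i64_align == 4 and atomic_i64_align == 8, while on x86-64 both are 8.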
libc++ doesn't try to keep the layout and size of atomic<T> the same as T. They align it up to the next power-of-2 size if that's still small enough to be lock-free on the current target. This seems to me like a better implementation choice, but of course isn't ABI-compatible so libstdc++ can't change until the next ABI version.
Fun fact: GCC's C11 _Atomic support was slightly broken on 32-bit platforms with 64-bit lock-free atomics for many years after the equivalent libstdc++ bug was fixed: _Atomic int64_t could be misaligned inside structs, leading to tearing. (Fixed in GCC 11.1, with a 32-bit ABI change for _Atomic objects in structs: https://godbolt.org/z/1x3nEe3P3). Clang never had this problem.
But g++'s C++11 std::atomic uses a template class in a header that fixed that bug a while ago (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65147); ensuring that atomic<T> has natural alignment (up to some power of 2 size) even if T has alignment < size. Thus there's no way they can span any boundary wider than they are.
§[atomics.types.generic]p3 allows this - The representation of an atomic specialization need not have the same size as its corresponding argument type. I guess there are portability problems with that though?