CAS and LL/SC are both general building blocks; any atomic RMW operation can be synthesized from either them. You need one or the other for lock-free std::atomic<int>. Some ISAs have both (notably ARMv8.1), most have one or the other; most older RISCs like PowerPC having just LL/SC. x86-64 has only CAS. Direct support for specific operations like .fetch_add can avoid retry loops for those operations, but that's not really qualitatively different. See
std::atomic_flag is required to be lock-free, and can be used to implement locking, which allows non-lock-free std::atomic<T>. So the true minimum required to implement C++11 at all is just the primitive things early ISAs had which were sufficient for locking but aren't building blocks for arbitrary lock-free atomic RMW operations the way LL/SC and CAS are.
For it, you only need a minimum of TAS (test-and-set) or something which can be used as a TAS, like exchange. And atomic byte or word stores for .clear(), and control of memory ordering via ISA guarantees, fence instructions, or ordering semantics baked into atomic RMW / load / store instructions. (And in C++20, atomic byte or word loads for .test(). Before that, ISAs didn't need to support seq_cst pure loads.)
See Why does C/C++ not have a atomic flag test_and_clear? for more discussion of what some old ISAs have and how they work as building blocks, like m68k actually having an instruction called tas. (cas only in 68020 and later.) And several ISAs having an atomic RMW swap/exchange instruction.
For ordering support, it's always fine to be stronger than the standard requires, even if that means promoting all acquire and release ops to seq_cst. But it's not fine if an ISA is unable to recover sequential consistency. CPU architects know this, and provide fence instructions if their ISA allows runtime memory reordering in the first place.
(Unless you care a lot about retro hardware, just use std::atomic<bool> like a normal person.)
<thread>
No special instruction, but a timer interrupt of some sort and an ISA with interrupt handling is pretty necessary for pre-emptive multi-tasking. Without that, a compiler would have to insert yield() calls at a lot of points to make loops non-infinite and ensure that std::atomic loads see stores in finite time, among other things.
<mutex>
Mutual exclusion without an atomic RMW is possible with just sequentially consistent load/store (Dekker's or Peterson's algorithms, or a few others), but those require an array of flags or counters with an entry for each thread. Not really compatible with being able to start new std::threads at any time according to program logic, after std::mutex objects already exist and are already in use.
std::atomic_flag is sufficient for mutual exclusion with a simple spinlock. I'm not sure if that's sufficient for a proper implementation of something like Linux futex OS-assisted sleep-while-value-unchanged like C++20 std::atomic .wait() and .notify_one() / .notify_all(). (The OS needs to atomically check-and-sleep to avoid putting a core to sleep or context-switching after a notify has already arrived or after a value-change has become visible. Or at least not to lose notifications.)
std::atomic_flagis required to be lock-free, and it intentionally only offers limited operations: test_and_set being the important one (atomic RMW). Any CPU with a TAS or SWAP can implement an atomic flag even if they don't have general building blocks like CAS or LL/SC, assuming byte or word load/store are also atomic for the pure-store (and the C++20-added pure-load). Is that what you wanted to know? Or you wanted to know about realistic systems that support lock-freestd::atomic<int>?amocasas a recent extension. But without that, GCC and Clang do indeed compile CAS into LL/SC vs. atomic add intoamoadd, so they don't consider it baseline by default, unlikeamoadd; you need-march=rv64g_zacasto use it. godbolt.org/z/MsqeY78vW But yes, either LL/SC or CAS are sufficient building blocks for all ofstd::atomic<int>; before ARMv8.1, it only had LL/SC, no single-instruction atomics. (But ARMv8.1 introduced a CAS along with atomic add and so on; strange that RISC-V left that out of their initial AMO insns.)std::atomic<T>up to whatever width they support. See en.wikipedia.org/wiki/Load-link/… . ARMv6 and ARMv7, and ARMv8.0 only had LL/SC (swpwas deprecated on ARM at some point so I'm not counting that, and I don't know how early LL/SC was.) x86-64 only has CAS and some single-instruction other operations but not LL/SC. PowerPC only has LL/SC, like most RISCs.