If the other thread performs
_tail.store(/*...*/, std::memory_order_relaxed);
_head.store(/*...*/, std::memory_order_release);
then there isn't any acquire/release pairing on the same atomic object, and consequently you do not get a synchronizes-with relation between the two threads at all. Acquire/release synchronization only arises between a load and a store on the same atomic object. The loads can observe the two stores in either order, and inconsistently between multiple other threads.
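For context, the reading thread presumably looks something like the following sketch. This is an assumption about the question's code, not taken from it; the names and the size computation are hypothetical:

```cpp
#include <atomic>
#include <cstddef>

std::atomic<std::size_t> _head{0};
std::atomic<std::size_t> _tail{0};

// Hypothetical reader: an acquire-load of _tail followed by a load of
// _head. The acquire can only pair with a release store (or a release
// fence) tied to _tail itself; a release store to _head does not
// synchronize with this load.
std::size_t read_side() {
    std::size_t t = _tail.load(std::memory_order_acquire);
    std::size_t h = _head.load(std::memory_order_relaxed);
    // Without a proper pairing on _tail, h may be stale even when t
    // is the freshly stored value.
    return t - h;
}
```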
If you mean a release fence instead, i.e.
_tail.store(/*...*/, std::memory_order_relaxed);
_head.store(/*...*/, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
then you still do not get anything out of it. The release fence would synchronize with the acquire-load of _tail, but only if the fence is sequenced before the corresponding relaxed atomic store in the same thread. So to get any synchronization at all, you need:
std::atomic_thread_fence(std::memory_order_release);
_tail.store(/*...*/, std::memory_order_relaxed);
_head.store(/*...*/, std::memory_order_relaxed);
Now the release fence synchronizes-with the acquire-load in the first thread. However, this doesn't help you either: the synchronizes-with relation only guarantees that stores sequenced before the fence become visible after the acquire, and here both stores come after the fence.
You have the operations in the second thread the wrong way around. You need to store to _head first and then to _tail with release ordering, to ensure that reading the new value of _tail also implies observing the new value of _head:
_head.store(/*...*/, std::memory_order_relaxed);
_tail.store(/*...*/, std::memory_order_release);
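Put together, a minimal two-thread sketch of this corrected ordering might look as follows (the concrete values and the spin-loop reader are assumptions for the sake of a complete example):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> _head{0};
std::atomic<int> _tail{0};

// Returns the value of _head observed after the reader has seen the
// new _tail. The release store to _tail pairs with the acquire load,
// so the earlier relaxed store to _head is guaranteed to be visible:
// this always returns 1.
int run_once() {
    std::thread writer([] {
        _head.store(1, std::memory_order_relaxed);
        _tail.store(1, std::memory_order_release); // release after _head
    });
    int observed = 0;
    std::thread reader([&] {
        // Spin until the writer's release store becomes visible.
        while (_tail.load(std::memory_order_acquire) != 1) {
        }
        observed = _head.load(std::memory_order_relaxed);
    });
    writer.join();
    reader.join();
    return observed;
}
```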
Here the release ordering on the store can be replaced by a fence, but that is stronger than necessary:
_head.store(/*...*/, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
_tail.store(/*...*/, std::memory_order_relaxed);
However, the reverse still won't hold: loading the new value of _head does not imply loading the new value of _tail. If you want the two to be observable only together, you have to pack them into a single atomic object.
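A sketch of that packing, assuming the indices fit in 32 bits each (all names here are hypothetical):

```cpp
#include <atomic>
#include <cstdint>

// Both indices live in one 64-bit atomic, so a reader always gets a
// consistent head/tail pair from a single load.
struct Indices {
    std::uint32_t head;
    std::uint32_t tail;
};

std::atomic<std::uint64_t> _indices{0};

std::uint64_t pack(Indices i) {
    return (std::uint64_t(i.tail) << 32) | i.head;
}

Indices unpack(std::uint64_t v) {
    return {std::uint32_t(v & 0xffffffffu), std::uint32_t(v >> 32)};
}

// Writer publishes both indices with one atomic store.
void publish(std::uint32_t head, std::uint32_t tail) {
    _indices.store(pack({head, tail}), std::memory_order_release);
}

// Reader takes an atomic snapshot of both indices at once.
Indices snapshot() {
    return unpack(_indices.load(std::memory_order_acquire));
}
```

On common 64-bit platforms a 64-bit atomic is lock-free; you can verify with `_indices.is_lock_free()`.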
Also beware that synchronization only happens if the load obtains its value from the corresponding store in the modification order of the atomic object. If you store the same value multiple times, you cannot tell which of those stores the load observed, and you'll likely run into the ABA problem.
And of course, "observing" the store to _head above means you may instead get a value from later in the modification order of _head, if _head is stored to anywhere else (e.g. in a different thread, or later in the same thread).
Regarding "visible as well in global memory": that's the wrong way to think about it. There is no single global order, agreed upon by all threads, in which stores become visible.
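The classic illustration of this is the IRIW (independent reads of independent writes) pattern: with only acquire/release ordering, two reader threads are permitted to disagree about the order of two unrelated stores; only making all the operations `memory_order_seq_cst` forbids that outcome. A sketch (the outcome on any given run is nondeterministic; the point is what the standard *allows*, not what will happen):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};

// IRIW: two writers store to independent objects; two readers read
// them in opposite orders. With acquire/release only, the outcome
// r1==1, r2==0, r3==1, r4==0 is permitted: reader 1 sees x's store
// before y's, reader 2 sees y's before x's, so they disagree on any
// "global" order of the two stores.
void iriw(int& r1, int& r2, int& r3, int& r4) {
    std::thread w1([] { x.store(1, std::memory_order_release); });
    std::thread w2([] { y.store(1, std::memory_order_release); });
    std::thread t1([&] {
        r1 = x.load(std::memory_order_acquire);
        r2 = y.load(std::memory_order_acquire);
    });
    std::thread t2([&] {
        r3 = y.load(std::memory_order_acquire);
        r4 = x.load(std::memory_order_acquire);
    });
    w1.join(); w2.join(); t1.join(); t2.join();
}
```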