If the other thread performs
_tail.store(/*...*/, std::memory_order_relaxed);
_head.store(/*...*/, std::memory_order_release);
then there isn't any acquire/release pairing on the same atomic object, and consequently you do not get a synchronizes-with relation between the two threads at all. Acquire/release synchronization only arises between a load and a store on the same atomic object. The loads can observe the two stores in either order, and inconsistently between multiple other threads.
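For context, the reading thread presumably looks something like the following sketch. This is an assumption about the question's code, not taken from it; the names and the size computation are hypothetical:

```cpp
#include <atomic>
#include <cstddef>

std::atomic<std::size_t> _head{0};
std::atomic<std::size_t> _tail{0};

// Hypothetical reader: an acquire-load of _tail followed by a load of
// _head. The acquire can only pair with a release store (or a release
// fence) tied to _tail itself; a release store to _head does not
// synchronize with this load.
std::size_t read_side() {
    std::size_t t = _tail.load(std::memory_order_acquire);
    std::size_t h = _head.load(std::memory_order_relaxed);
    // Without a proper pairing on _tail, h may be stale even when t
    // is the freshly stored value.
    return t - h;
}
```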
If you mean a release fence instead, i.e.
_tail.store(/*...*/, std::memory_order_relaxed);
_head.store(/*...*/, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
then you still do not get anything out of it. The release fence would synchronize with the acquire-load of _tail, but only if the fence is sequenced before the corresponding relaxed atomic store in the same thread. So to get any synchronization at all, you need:
std::atomic_thread_fence(std::memory_order_release);
_tail.store(/*...*/, std::memory_order_relaxed);
_head.store(/*...*/, std::memory_order_relaxed);
Now the release fence synchronizes-with the acquire-load in the first thread. However, this doesn't help you either: the synchronizes-with relation only guarantees that stores sequenced before the fence become visible after the acquire, and here both stores come after the fence.
You have the operations in the second thread the wrong way around. You need to store to _head first and then to _tail with release ordering, to ensure that reading the new value of _tail also implies observing the new value of _head:
_head.store(/*...*/, std::memory_order_relaxed);
_tail.store(/*...*/, std::memory_order_release);
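Put together, a minimal two-thread sketch of this corrected ordering might look as follows (the concrete values and the spin-loop reader are assumptions for the sake of a complete example):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> _head{0};
std::atomic<int> _tail{0};

// Returns the value of _head observed after the reader has seen the
// new _tail. The release store to _tail pairs with the acquire load,
// so the earlier relaxed store to _head is guaranteed to be visible:
// this always returns 1.
int run_once() {
    std::thread writer([] {
        _head.store(1, std::memory_order_relaxed);
        _tail.store(1, std::memory_order_release); // release after _head
    });
    int observed = 0;
    std::thread reader([&] {
        // Spin until the writer's release store becomes visible.
        while (_tail.load(std::memory_order_acquire) != 1) {
        }
        observed = _head.load(std::memory_order_relaxed);
    });
    writer.join();
    reader.join();
    return observed;
}
```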
Here the release ordering on the store can be replaced by a fence, but that is stronger than necessary:
_head.store(/*...*/, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
_tail.store(/*...*/, std::memory_order_relaxed);
However, the reverse still won't hold: loading the new value of _head does not imply loading the new value of _tail. If you want the two to be observable only together, you have to pack them into a single atomic object.
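A sketch of that packing, assuming the indices fit in 32 bits each (all names here are hypothetical):

```cpp
#include <atomic>
#include <cstdint>

// Both indices live in one 64-bit atomic, so a reader always gets a
// consistent head/tail pair from a single load.
struct Indices {
    std::uint32_t head;
    std::uint32_t tail;
};

std::atomic<std::uint64_t> _indices{0};

std::uint64_t pack(Indices i) {
    return (std::uint64_t(i.tail) << 32) | i.head;
}

Indices unpack(std::uint64_t v) {
    return {std::uint32_t(v & 0xffffffffu), std::uint32_t(v >> 32)};
}

// Writer publishes both indices with one atomic store.
void publish(std::uint32_t head, std::uint32_t tail) {
    _indices.store(pack({head, tail}), std::memory_order_release);
}

// Reader takes an atomic snapshot of both indices at once.
Indices snapshot() {
    return unpack(_indices.load(std::memory_order_acquire));
}
```

On common 64-bit platforms a 64-bit atomic is lock-free; you can verify with `_indices.is_lock_free()`.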
Also beware that synchronization only happens if the load obtains its value from the corresponding store in the modification order of the atomic object. If you store the same value multiple times, you cannot tell which of those stores the load observed, and you'll likely run into the ABA problem.
And of course, "observing" the store to _head above means you may instead get a value from later in the modification order of _head, if _head is stored to anywhere else (e.g. in a different thread, or later in the same thread).
Regarding "visible as well in global memory": that's the wrong way to think about it. There is no single global order, agreed upon by all threads, in which stores become visible.
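The classic illustration of this is the IRIW (independent reads of independent writes) pattern: with only acquire/release ordering, two reader threads are permitted to disagree about the order of two unrelated stores; only making all the operations `memory_order_seq_cst` forbids that outcome. A sketch (the outcome on any given run is nondeterministic; the point is what the standard *allows*, not what will happen):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};

// IRIW: two writers store to independent objects; two readers read
// them in opposite orders. With acquire/release only, the outcome
// r1==1, r2==0, r3==1, r4==0 is permitted: reader 1 sees x's store
// before y's, reader 2 sees y's before x's, so they disagree on any
// "global" order of the two stores.
void iriw(int& r1, int& r2, int& r3, int& r4) {
    std::thread w1([] { x.store(1, std::memory_order_release); });
    std::thread w2([] { y.store(1, std::memory_order_release); });
    std::thread t1([&] {
        r1 = x.load(std::memory_order_acquire);
        r2 = y.load(std::memory_order_acquire);
    });
    std::thread t2([&] {
        r3 = y.load(std::memory_order_acquire);
        r4 = x.load(std::memory_order_acquire);
    });
    w1.join(); w2.join(); t1.join(); t2.join();
}
```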