Understanding `memory_order_acquire` and `memory_order_release` in C++11

Question

I'm reading through the documentation, more specifically these two entries:

memory_order_acquire: A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below).

memory_order_release: A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below)

These two bits:

from memory_order_acquire

... no reads or writes in the current thread can be re-ordered before this load...

from memory_order_release

... no reads or writes in the current thread can be re-ordered after this store...

What exactly do they mean?

There's also this example

#include <thread>
#include <atomic>
#include <cassert>
#include <string>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}

But I cannot really figure out where the two bits I've quoted apply. I understand what's happening, but I don't see where the re-ordering comes in because the code is so small.

Keep in mind that cppreference is unofficial documentation, meant to be a more readable summary of the C++ standard written collectively by Wiki contributors. It's usually pretty good, but is in no way authoritative. One thing in particular is that they describe the memory model in terms of reordering: this is completely different from the standard itself which uses an abstract happens-before description, and sometimes subtle details are lost or left unclear in the translation.
– Nate Eldredge, Commented Jan 15, 2024 at 18:45
@NateEldredge though the C++ standard does not explicitly talk about instruction reordering in the case of various memory orderings, I believe there must be some guarantees. Take, for example, implementing a lock; it'll use acquire-load semantics for a successful lock and release-store semantics for the unlock. Now, there must be a guarantee that instructions after the acquire-load, in program order, are not reordered before it; otherwise, we would have a situation where the code's critical section is reordered before the acquire-load of the lock, which can cause a data race.
– TheGhostJoker, Commented Mar 3, 2024 at 14:33
@TheGhostJoker: The Standard only defines observable behavior, which at the end of the day, is entirely determined by the values returned by the program's loads, and memory ordering tells you which of the values stored elsewhere in the program may or may not be returned by any given load. Of course, in real life, typical machines can only accomplish that by inhibiting reordering in the way you describe. But in theory, if there were a way for a given machine to reorder its instructions while still preserving the desired semantics, it would be free to do so.
– Nate Eldredge, Commented Mar 4, 2024 at 2:34
@TheGhostJoker: To put it another way, we can say in effect that the implementation must behave as if memory accesses that follow an acquire load in program order are not reordered before it. But what's actually going on under the hood is anybody's guess.
– Nate Eldredge, Commented Mar 4, 2024 at 2:36
@TheGhostJoker: But as one slightly less outlandish possibility, in your example of an acquire load, the Standard only promises synchronization when the acquire load observes the value of a release store. If the compiler could somehow prove that the program never makes a release store to the object in question, then it could disregard the barrier effect of the acquire load, and freely reorder around it.
– Nate Eldredge, Commented Mar 4, 2024 at 2:40

rustyx · Accepted Answer · 2022-01-18 15:20:29Z

14

The work done by a thread is not guaranteed to be visible to other threads.

To make data visible between threads, a synchronization mechanism is needed. A non-relaxed atomic or a mutex can be used for that. This is called acquire-release semantics: unlocking a mutex "releases" all memory writes made before it, and subsequently locking the same mutex "acquires" those writes.
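
As a rough illustration of the mutex analogy, here is a minimal sketch of a spinlock built on std::atomic_flag. The names SpinLock and count_with_spinlock are made up for this example and are not part of the question's code; a real program would normally just use std::mutex.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock() {
        // Acquire: once this succeeds, the critical section sees every write
        // that the previous owner made before its unlock().
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin until the flag was previously clear
        }
    }

    void unlock() {
        // Release: writes made inside the critical section are published
        // to whichever thread acquires the lock next.
        flag_.clear(std::memory_order_release);
    }
};

// Hypothetical driver: several threads increment a plain int under the lock.
int count_with_spinlock(int threads, int per_thread) {
    SpinLock lock;
    int counter = 0;  // deliberately non-atomic; protected only by the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Because each increment happens between an acquire and a release on the same flag, no increment is lost, so count_with_spinlock(4, 10000) should return 40000.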

Here we use ptr to "release" work done so far (data = 42) to another thread:

    data = 42;
    ptr.store(p, std::memory_order_release); // changes ptr from null to not-null

And here we wait for that, and by doing that we synchronize ("acquire") the work done by the producer thread:

    while (!ptr.load(std::memory_order_acquire)) // assuming initially ptr is null
        ;
    assert(data == 42);

Note two distinct actions:

1. we wait between threads (the synchronization step)
2. as a side effect of the wait, we transfer work from the producer to the consumer (the producer releases it and the consumer acquires it)

In the absence of (2), e.g. when using memory_order_relaxed, only the atomic value itself is synchronized. All other work done before/after isn't, e.g. data won't necessarily contain 42 and there may not be a fully constructed string instance at the address p (as seen by the consumer).
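
The same hand-off can be sketched with a boolean flag instead of a pointer. This is only an illustrative variant of the example in the question; the function name handoff and the variable names are assumptions made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Hypothetical helper: passes one message from a producer thread to a
// consumer thread, using an atomic flag as the synchronization point.
std::string handoff(const std::string& text) {
    std::string payload;             // plain, non-atomic data
    std::atomic<bool> ready{false};  // the flag that carries the ordering
    std::string received;

    std::thread producer([&] {
        payload = text;                                // (1) ordinary write
        ready.store(true, std::memory_order_release);  // (2) publishes (1)
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {  // (3) waits for (2)
            std::this_thread::yield();
        }
        received = payload;  // (4) guaranteed to observe the write in (1)
    });

    producer.join();
    consumer.join();
    return received;
}
```

If both (2) and (3) used std::memory_order_relaxed instead, the flag itself would still be read and written atomically, but step (4) would race with step (1) and the behaviour would be undefined.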

For more details about acquire/release semantics and the rest of the C++ memory model, I would recommend watching Herb Sutter's excellent "atomic<> Weapons" talk; it's long but fun to watch. For even more detail, there's the book "C++ Concurrency in Action".

edited Jan 18, 2022 at 15:20

answered Jan 7, 2020 at 11:55

rustyx


4 Comments

user8469759 Over a year ago

With reference to the documentation what does "re-ordering before/after" mean? (namely the two specific quotes in my question).

UKMonkey Over a year ago

OK, that makes a lot of sense, but it raises the question: what does ptr.store(p, std::memory_order_acquire) mean? If it's completely meaningless, then why is it an option?

rustyx Over a year ago

@UKMonkey which order values can be specified for a store are specified in [atomics.order]. memory_order_acquire isn't one of them, because it makes no sense. So strictly speaking the code ptr.store(p, std::memory_order_acquire) would invoke undefined behavior.

UKMonkey Over a year ago

@rustyx thanks :) I think I understand it all a bit better now!

Persixty · Accepted Answer · 2025-10-13 15:19:57Z

8

Acquire and release are memory barriers. If your program reads data after an acquire barrier, you are assured that what you read is consistent, in order, with any preceding release by any other thread on the same atomic variable. Atomic variables are guaranteed to have an absolute order to their reads and writes across all threads (when using memory_order_acquire and memory_order_release, though weaker operations are provided for). These barriers in effect propagate that order to any threads that use that atomic variable. You can use atomics to indicate that something has 'finished' or is 'ready', but if a consumer that reads data other than the atomic variable couldn't rely on 'seeing' the right 'versions' of that other memory, atomics would have limited value.

The statements about 'moving before' or 'moving after' are instructions to the optimizer that it shouldn't re-order operations to take place out of order. Optimizers are very good at re-ordering instructions and even omitting redundant reads/writes but if they re-organise the code across the memory barriers they may unwittingly violate that order.

Your code relies on (a) the std::string object having been constructed in producer() before ptr is assigned and (b) the constructed version of that string (i.e. the version of the memory it occupies) being the one that consumer() reads. Put simply consumer() is going to eagerly read the string as soon as it sees ptr assigned so it better see a valid and fully constructed object or bad times will ensue. In that code 'the act' of assigning ptr is how producer() 'tells' consumer the string is 'ready'. The memory barrier exists to make sure that's what the consumer sees.
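
The barrier view can be made explicit with standalone fences. The following is a minimal sketch, assuming the same producer/consumer shape as the question but with an int payload; the function name handoff_with_fences is invented for this example. The atomic operations themselves are relaxed, and std::atomic_thread_fence supplies the release and acquire ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: hands an int from one thread to another using
// relaxed atomic operations plus explicit fences.
int handoff_with_fences(int value) {
    int data = 0;                    // plain, non-atomic payload
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        data = value;
        // Release fence: the write to data above is ordered before
        // the relaxed store below.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of data below is ordered after
        // the relaxed load that observed ready == true.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = data;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Removing either fence would leave the read of data unordered with respect to the write, which is the situation the barrier is there to prevent.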

Conversely, if ptr were declared as an ordinary std::string * then the compiler could decide to optimize p away, assign the allocated address directly to ptr, and only then construct the object and assign the int data. That is likely a disaster for the consumer thread, which is using that assignment as the indicator that the objects producer() is preparing are ready. To be accurate, if ptr were an ordinary pointer the consumer might never see the value assigned, or on some architectures might read a partially assigned value where only some of the bytes have been written, so that it points to a garbage memory location. However, those aspects are about it being atomic, not about the wider memory barriers.

Footnote: The code provides a good demonstration of memory barriers and the asserts() that never fire illustrate the memory order guarantee. However it's worth noting the design is not recommended for a scalable system. That is because the consumer thread performs 'busy waiting'. The thread loops until the string object is assigned to ptr. The thread will be competing for compute cycles with other threads.
A more scalable design would use a std::condition_variable, which provides a form of 'dormant waiting'. Of course the code provided is an illustrative example and the time spent busy waiting can be expected to be very brief. But in general busy waiting by consumers is not a recommended implementation of the producer/consumer pattern.
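
A minimal sketch of that 'dormant waiting' alternative might look like the following. The function name handoff_blocking and the variable names are assumptions for this example; the mutex provides the acquire/release ordering, so no atomics are needed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical helper: the consumer sleeps on a condition variable
// instead of spinning on an atomic.
std::string handoff_blocking(const std::string& text) {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;   // protected by m
    std::string payload;  // protected by m
    std::string received;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        // wait() releases the mutex while asleep and rechecks the
        // predicate on every wake-up, including spurious ones.
        cv.wait(lock, [&] { return ready; });
        received = payload;
    });

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            payload = text;
            ready = true;
        }  // unlock before notifying so the consumer can proceed at once
        cv.notify_one();
    });

    consumer.join();
    producer.join();
    return received;
}
```

The consumer uses no CPU while it waits, at the cost of a heavier synchronization primitive than a single atomic load.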

edited Oct 13 at 15:19

answered Jan 7, 2020 at 13:13

Persixty


8 Comments

user8469759 Over a year ago

I think your answer is the most complete one. Can you have a look at this question as well? stackoverflow.com/questions/59651328/…

Eric Over a year ago

"Atomic variables are guaranteed to have an absolute order to their reads and writes across all threads." Is this really true? I think that guarantee is provided by SC or stronger, not atomics in general.

Persixty Over a year ago

@eric Fair comment. memory_order_acquire and memory_order_release ensure sequential consistency of atomics but weaker operations such as memory_order_relaxed are provided for. I've amended the answer accordingly. This is a very subtle area and I appreciate any corrections that improve clarity.

Eric Over a year ago

@Persixty, how do memory_order_acquire and memory_order_release ensure sequential consistency?

Persixty Over a year ago

@Eric Because they imply memory barriers so other threads that access the same atomic variable will experience sequential consistency in the sense that in the above example the std::string object will appear to have happened (in full) before the atomic pointer was set.


Caleth · Accepted Answer · 2020-01-07 10:40:38Z

3

If you used std::memory_order_relaxed for the store, the compiler could use the "as-if" rule to move data = 42; to after the store, and consumer could see a non-null pointer and indeterminate data.

If you used std::memory_order_relaxed for the load, the compiler could use the "as-if" rule to move the assert(data == 42); to before the load loop.

Both of these are allowed because the value of data is not related to the value of ptr.

If instead ptr were non-atomic, you'd have a data race and therefore undefined behaviour.
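
For contrast, here is a sketch of a case where std::memory_order_relaxed is enough, because the only value that matters is the atomic's own and no other data is published through it. The function name relaxed_count is an assumption made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump a shared counter with
// relaxed increments; nothing else depends on the counter's value.
long relaxed_count(int threads, int per_thread) {
    std::atomic<long> hits{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write: no increment is lost, but no
                // ordering is imposed on accesses to other variables.
                hits.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();  // join() orders each thread's work before the read below
    }
    return hits.load(std::memory_order_relaxed);
}
```

The total is exact because each fetch_add is atomic, so relaxed_count(4, 10000) should return 40000. What relaxed operations do not give you is any guarantee about a separate variable such as data in the question.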

answered Jan 7, 2020 at 10:40

Caleth


16 Comments

user8469759 Over a year ago

I'm not sure I understand. As far as I understand the memory_order_relaxed would imply re-ordering of instructions, but it would preserve atomicity and order consistency. My interpretation of the example in my question is that no instruction before/after the store/load would be re-ordered, what I struggle to understand is how this differs from the order consistency guaranteed by the relaxed model. Is there a better example (code) that can show this?

Caleth Over a year ago

@user8469759 If you do a bunch of memory_order_relaxed stores to a single variable on one thread, other threads can't see those out of order (e.g. incrementing a counter remains monotonic), but there are no constraints on operations affecting other variables.

Caleth Over a year ago

@user8469759 I think that the assert(*p2 == "Hello"); in the example is a red herring. The only way that can appear to fire is in the non-atomic, undefined behaviour case.

Caleth Over a year ago

@user8469759 that's what you are misunderstanding. "modification order consistency" is only referring to modifications to that variable

mvd Over a year ago

You might want to have a look at this talk: it's long, but he really gives good explanations of all the details of atomics and the memory model: channel9.msdn.com/Shows/Going+Deep/…

