I’ve spent several hours studying memory orderings, but I still have some contradictions in my head. One of them concerns the Acquire/Release memory orders.

Currently, my understanding is:

No operation after an Acquire can be reordered before it, and no operation before a Release can be reordered after it.

Using Acquire/Release on the same memory location ensures that operations after the Acquire see all side effects that happened before the Release.

I’ve also seen people say that Acquire/Release only works in pairs. That’s where my mental model breaks down a bit.
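The pairing can be made concrete with the classic message-passing pattern (a minimal sketch of my own, not code from the question): an acquire load synchronizes-with the release store whose value it reads, and only that pair creates a happens-before edge.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;  // plain non-atomic data, protected by the ready flag

// Producer: write the data, then publish it with a release store.
void producer() {
    payload = 42;                                  // A: plain write
    ready.store(true, std::memory_order_release);  // B: release store
}

// Consumer: spin with acquire loads; once B is observed, A is visible.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the release store is seen
    }
    return payload;  // reads 42: A happens-before this read via the B/C pair
}

// Run one producer/consumer pair and return what the consumer saw.
int run_message_pass() {
    std::thread t(producer);
    int seen = consumer();
    t.join();
    return seen;
}
```

If the consumer's load were relaxed, the spin loop would still eventually see `true`, but the read of `payload` would be a data race.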

Let’s take an example: I’m implementing an SPSC (single-producer single-consumer) ring buffer queue. I have two methods: maybe_push and consumer_pop_many.

bool consumer_pop_many(T* dst, size_t& count) {
    Index head_val = head.unsync_load();   // only consumer can change head (unsync_load is a custom function)
    Index tail_val = tail.load(std::memory_order_acquire);   // (1)

    size_t available = len(head_val, tail_val);
    size_t n = std::min(count, available);

    if (n == 0) {
        count = 0;
        return false;
    }

    size_t head_idx = static_cast<size_t>(head_val % CAPACITY);
    size_t right = CAPACITY - head_idx;

    // copy data code

    head.store(head_val + n, std::memory_order_release);     // (2)

    count = n;
    return true;
}

bool producer_maybe_push(T&& value) {
    Index tail_val = tail.unsync_load();   // only the producer can change tail (unsync_load is a custom function)
    Index head_val = head.load(std::memory_order_relaxed);   // (3)

    if (len(head_val, tail_val) == CAPACITY) {
        return false;
    }

    size_t idx = static_cast<size_t>(tail_val % CAPACITY);
    new (&buffer[idx]) T(std::move(value));

    tail.store(tail_val + 1, std::memory_order_release);     // (4)
    return true;
}

My reasoning:

  • At (1) I use Acquire so that I can see the write from (4) (the side effect). All other operations in the function already depend on this load anyway, but Acquire guarantees visibility of that write.

  • At (2) I use Release to make sure the data copy happens strictly before the head moves forward.

  • At (4) I use Release to ensure the write of the value happens strictly before the tail update, so that (1) observes it correctly.

But can I use Relaxed in (3)?

I don’t care about instruction reordering in that method, since I could load the tail later and everything else depends on the result of that load anyway. But what about side effects?

A read itself isn’t a side effect, but it is ordered against the Release in consumer_pop_many. Could it happen that:

  • Thread 1 (consumer) reads and advances the head,

  • Thread 2 (producer) then reads the head, but does so “before” the consumer’s read (in terms of visibility)?

This looks impossible, but examples with SeqCst show strange situations, and I can’t shake this question out of my head.

1 Answer

With relaxed at (3), your specific use case wouldn't be guaranteed by ISO C++: without release/acquire synchronization there's no happens-before relationship.

But it's safe in practice, because CPUs can't make speculative stores visible to other threads; see Can the hardware reorder an atomic load followed by an atomic store, if the store is conditional on the load?


If head.load() is weaker than acquire, nothing in ISO C++ requires it to happen before new (&buffer[idx]) T(std::move(value));, so the abstract machine allows overwriting a buffer entry that the reader hasn't finished reading yet, which is data-race UB. (e.g. the consumer's release-store only becomes visible to your head.load() sometime after you overwrite the entry it's reading, and the branch prediction of the if is then confirmed.)

The Linux kernel memory model would make relaxed safe, because on real CPUs control dependencies block LoadStore reordering: the if (stuff involving head_val) { return false; } before new (&buffer[idx]) T(std::move(value)); makes the store control-dependent on the result of the head load.

See Can a speculatively executed CPU branch contain opcodes that access RAM? - speculative stores only write into the store buffer, and can't become visible to other cores until they're known to be non-speculative (i.e. the store instruction retires, so the store-buffer entry "graduates" and is ready to commit).

This is one of those cases where the ISO C++ memory model is weaker on paper than any plausible hardware: it allows a reordering that no real (or realistically foreseeable) CPU can perform. If performance is critical (especially on targets like ARMv7, where a relaxed load is a lot cheaper than acquire), it would be reasonable to document what's going on and use relaxed, including a comment marking the if as an important control dependency.

If performance isn't critical, it's probably best to just use head.load(acquire) so your code is formally guaranteed to work by ISO C++.


And BTW, yes: for tail, which is only written by one thread, relaxed loads and release stores are fine. So is keeping a non-atomic copy of it in the producer and only storing updates. It's probably also fine to use std::atomic_ref to read it non-atomically and write it atomically. (So actually declare the variables non-atomic with alignas(std::atomic_ref<T>::required_alignment), and do the atomic accesses through an atomic_ref.)

It's probably fine in practice to do whatever tail.unsync_load() does to read a std::atomic variable, like memcpy, if you've checked that sizeof(std::atomic<T>) == sizeof(T). Depending on the internals of std::atomic in the C++ standard library, it might not even be strict-aliasing UB to point a T* at a std::atomic<T> and deref it, if the first class member is a T object.
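One possible unsync_load along those lines (my own sketch; as the answer says, this relies on a layout assumption, here checked with a static_assert, and is only safe when the caller is the sole writer of the variable):

```cpp
#include <atomic>
#include <cstring>

// Plain (non-atomic) read of the value stored inside a std::atomic<T>.
// No atomicity, no ordering: valid only for a variable that no other
// thread is writing concurrently (e.g. the producer reading its own tail).
template <typename T>
T unsync_load(const std::atomic<T>& a) {
    static_assert(sizeof(std::atomic<T>) == sizeof(T),
                  "assumes atomic<T> holds just a T, with no extra state");
    T out;
    std::memcpy(&out, &a, sizeof(T));  // bytewise copy of the stored value
    return out;
}
```

Used with any concurrent writer this would be a data race; its only legitimate use is the single-writer-reads-its-own-variable pattern from the question.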


6 Comments

Weird coincidence that a second question which hinges on control dependencies blocking LoadStore reordering was asked within 1 day of a question about that, which was the first one I remember seeing on SO.
I've been reading so much literature lately that I can no longer find the source. As far as I remember, barriers prevent instruction reordering. Also, Release forces the processor to publish modified cache lines to memory, while Acquire forces it to load the data associated with the atomic. I'm not sure exactly how it determines which changes it needs to load. If this model is correct, then I don't necessarily need to use Acquire to establish order if I don't care about side effects. How accurate is this understanding?
@EugeneUsachev - A better mental model is that cache is coherent, and memory reordering is a local effect in each core's own accesses to cache. (Except ISAs which aren't multi-copy atomic like PowerPC also allow some other cores to see stores before they commit to cache and become globally visible). This is why acquire/release/full memory barriers only have to work locally by delaying (some) later operations until (some) earlier operations have completed. Not having to figure out which cache lines to publish. preshing.com/20120710/…
@EugeneUsachev: But C++'s memory model doesn't work in terms of reordering being allowed or not, it works in terms of happens-before restricting which values a load is allowed to see from the modification-order of each variable (and the coherence rules (eel.is/c++draft/intro.races#15) for the mod order for each location separately). And with seq_cst, a total order of all seq_cst ops which is consistent with program-order.
The C++ memory model doesn't have any notion of a coherent shared cache, other than the coherence rules, so some orderings allowed in the abstract machine are implausible on any real CPU design. That keeps the door open for possible future innovation in CPUs that work in ways we didn't anticipate, but can also mean you need to write source with stronger ordering that compiles to slow barriers for some ISAs. Often not a big problem, release and acquire are pretty good on the mainstream ISAs (x86 and AArch64, and RISC-V.)
