I've been studying several implementations of SPMC (single producer, multiple consumer) ring buffers. In many of them, I find the memory orderings to be quite conservative—often stronger than what seems strictly necessary.
As far as I understand, atomic operations with Relaxed ordering are still globally visible, but memory orderings like Acquire and Release are mainly about synchronizing access to surrounding data.
Let me illustrate the situation with a simplified producer_pop method. The language used here is Rust, but the question is about memory ordering semantics, so you don’t need to know Rust specifically.
unsafe fn producer_pop(&self) -> Option<T> {
let mut head = self.head.load(Acquire);// To see the changes that are done before the head has been updated
let tail = unsafe { self.tail.unsync_load() }; // only the producer modifies tail
loop {
if head == tail {
return None;
}
// Read the value before CAS.
let maybe_value: MaybeUninit<T> = self.buffer_thin_ptr().add(head & MASK).read();
match self.head.compare_exchange_weak(
head,
head.wrapping_add(1),
Acquire, // success
Acquire // failure
) {
Ok(_) => return Some(maybe_value.assume_init()),
Err(new_head) => {
head = new_head;
// retry
// The value is discard here
}
}
}
}
The idea is:
I load
headwithAcquireto observe any memory written before it was incremented by another thread.I read the value at that position, assuming only one thread will eventually "own" it.
If the CAS fails, I retry and discard the value.
What I don't understand is: why is Acquire required on the successful CAS? I already did an Acquire-load and read the value. Wouldn't it be sufficient to use compare_exchange_weak(head, head.wrapping_add(1), Relaxed, Acquire) instead? The failure case still needs Acquire to see fresh state, but success?
I modified the code to use Relaxed for the success ordering in the CAS. So the line became:
self.head.compare_exchange_weak(head, head.wrapping_add(1), Relaxed, Acquire)
All my tests pass, but I know that passing tests don't prove correctness in memory models. Is this usage safe? Are there any subtleties I'm missing?
producer_pop? Pop from a stack or queue is taking an element out, something a consumer does, not a producer. So what is this, the producer side unpushing an element, discarding it from the queue? How does your queue work (what invariants does it maintain)? What hardware did you test on? x86 can't do atomic RMWs weaker than seq_cst (or pure load/store weaker than acq/rel except for compile-time reordering), and I'm not sure if ARMv8.1 single-instruction atomics support different success/fail orders.-fsanitize=thread, that can at least help catch some things, but probably not this.) You should add details of your test to the question. And that info about your queue design and whatproducer_popis. That helps a lot in understanding what this function is supposed to do.