2

I've been studying several implementations of SPMC (single producer, multiple consumer) ring buffers. In many of them, I find the memory orderings to be quite conservative—often stronger than what seems strictly necessary.

As far as I understand, atomic operations with Relaxed ordering are still globally visible, but memory orderings like Acquire and Release are mainly about synchronizing access to surrounding data.

Let me illustrate the situation with a simplified producer_pop method. The language used here is Rust, but the question is about memory ordering semantics, so you don’t need to know Rust specifically.

unsafe fn producer_pop(&self) -> Option<T> {
    let mut head = self.head.load(Acquire);// To see the changes that are done before the head has been updated 
    let tail = unsafe { self.tail.unsync_load() }; // only the producer modifies tail

    loop {
        if head == tail {
            return None;
        }

        // Read the value before CAS.
        let maybe_value: MaybeUninit<T> = self.buffer_thin_ptr().add(head & MASK).read();

        match self.head.compare_exchange_weak(
            head,
            head.wrapping_add(1),
            Acquire, // success
            Acquire  // failure
        ) {
            Ok(_) => return Some(maybe_value.assume_init()),
            Err(new_head) => {
                head = new_head;
                // retry

               // The value is discard here
            }
        }
    }
}

The idea is:

  • I load head with Acquire to observe any memory written before it was incremented by another thread.

  • I read the value at that position, assuming only one thread will eventually "own" it.

  • If the CAS fails, I retry and discard the value.

What I don't understand is: why is Acquire required on the successful CAS? I already did an Acquire-load and read the value. Wouldn't it be sufficient to use compare_exchange_weak(head, head.wrapping_add(1), Relaxed, Acquire) instead? The failure case still needs Acquire to see fresh state, but success?

I modified the code to use Relaxed for the success ordering in the CAS. So the line became:

self.head.compare_exchange_weak(head, head.wrapping_add(1), Relaxed, Acquire)

All my tests pass, but I know that passing tests don't prove correctness in memory models. Is this usage safe? Are there any subtleties I'm missing?

8
  • producer_pop? Pop from a stack or queue is taking an element out, something a consumer does, not a producer. So what is this, the producer side unpushing an element, discarding it from the queue? How does your queue work (what invariants does it maintain)? What hardware did you test on? x86 can't do atomic RMWs weaker than seq_cst (or pure load/store weaker than acq/rel except for compile-time reordering), and I'm not sure if ARMv8.1 single-instruction atomics support different success/fail orders. Commented Jul 25 at 19:24
  • @PeterCordes This is the pop method for the task SPMC queue. In this model, each worker can push and pop from their own queue, and only perform pop_many on other queues, so this design is intentional. I'm using Rust and I'm uncertain about how exactly it compiles these orderings. However, I'm not sure if I should focus on that aspect, as the memory ordering model is what really matters here. My main question is whether this approach is permissible, or if I'm required to use stronger memory orderings instead. Commented Jul 25 at 20:01
  • I understand that you're asking about whether it's safe on paper. The reason I ask about hardware / compilers is that your testing could be an interesting data point if you tested on a weakly-ordered ISA, but certainly tells us nothing if you tested on x86. (Also, if rustc has anything like GCC or Clang -fsanitize=thread, that can at least help catch some things, but probably not this.) You should add details of your test to the question. And that info about your queue design and what producer_pop is. That helps a lot in understanding what this function is supposed to do. Commented Jul 25 at 20:32
  • Also, you say this is a simplified version of existing code. You should link the original. Commented Jul 25 at 20:36
  • @PeterCordes I appreciate your willingness to help and the detailed questions. However, linking the original code or adding testing details might complicate the question unnecessarily, as I'm not trying to debug a specific program. My interest here is purely theoretical. I want to understand the programming model better. The core of my question is about memory ordering semantics: can I safely use Acquire for the initial load (to ensure correct data is read) followed by a Relaxed CAS for the successful case because the data is unchanged and already read? Commented Jul 25 at 21:31

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.