2

Consider this example:

#include <iostream>
#include <atomic>
#include <thread>
#include <cassert>

int main(){
   std::atomic<int> val = 1;
   std::atomic<std::atomic<int>*> ptr = nullptr;
   auto t1 = std::thread([&](){
      auto v = val.load(std::memory_order::relaxed);
      while(true){
        if(v==-1){
           v = val.load(std::memory_order::relaxed);
           continue; 
        }
        if(val.compare_exchange_strong(v,v+1,std::memory_order::acquire,std::memory_order::relaxed)==true){
            break;
        }
      }
      ptr.store(&val,std::memory_order::relaxed);  // #1
   });
   auto t2 = std::thread([&ptr](){
      std::atomic<int>* val_ptr = ptr.load(std::memory_order::relaxed); // #2
      while(val_ptr==nullptr){
        val_ptr = ptr.load(std::memory_order::relaxed);  // #3
      }
      int v = 1;
      bool r = val_ptr->compare_exchange_strong(v,-1,std::memory_order::acquire,std::memory_order::relaxed); // #4
      /* 
      if(r){
        val_ptr->store(1,std::memory_order::release);
      } // #6
      */ 
      assert(r==false);  // #5
   });
   t1.join();
   t2.join();
}

Can the assertion at #5 never fail even though #1 and #2/#3 use relaxed order?

The assertion fails only if #4 returns true. This means #4 is successful, that is, the RMW operation loads 1 and writes -1. The modification whose value is 1 is only the initial value of the atomic object val.

In this program, all operations on the atomic object val are either RMW operations or pure loads (i.e., CAS failures). So all these operations are serialized according to [atomics.order] p10:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

That is, no two RMW operations can read the same modification. If #4 reads the initial value, then the CAS operations must not succeed, and it's just a pure load.

Not sure if the following analysis is formal:

  1. The CAS in t1 won't be reached if v is -1, which causes the loop not to exit.
  2. CAS failures cause the loop not to exit, which brings it back to the condition above.
  3. The loop will exit only if the CAS loads 1 and writes 2 in this program.

However, the assumption is that #4 loads 1 and writes -1. If that happens, then the loop at t1 cannot exit, which means #1 cannot be reached.

The loop in t2 cannot exit if the load reads nullptr. So #5 can be reached only if the loop in t2 exists, which requires that there is some side effect that is non-null. In this program, the side effect producing a non-null pointer can only be #1. But with the above assumption, #1 is never reached. Therefore that side effect cannot exist anywhere in the program's lifetime.

Thus #2 and #3 cannot read the value written at #1, otherwise it would violate [intro.races] p10:

The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.

So, the assertion can never fail. Is my analysis right?

Update:

How will the assertion behave when introducing the commented #6? IICU, if #5 fails, the loop at t1 can only exit when the CAS reads the value written by #6; however, at this execution, #6 will synchronize with the CAS at t1, which means #2 cannot read #1(#2 happens before #1), the loop at t2 cannot exit in this assumption too. The assertion still cannot fail because it won't be reached, right?

If we change the memory order in #6 to relaxed, it seems that the assertion can fail? Seems OOTA, but technically can happen.

14
  • 1
    Seems like all that's required is ptr.store(&val) (#1) becoming visible to t2 before t1's val.CAS_strong(v,v+1, acquire, relaxed) has done its load. It has acquire strength so normal implementations wouldn't reorder it with the later store, but there's no actual rel/acq synchronization so I don't think there's a happens-before anywhere. The actual C++ memory model is a lot weaker than sanity or causality requires. With no inter-thread happens-before, from t2's PoV, all the operations on different vars in t1 are potentially concurrent (and the loop is not infinite). Commented Nov 26 at 17:31
  • 1
    Oh, and the t2 CAS has to happen after val.load and check against -1 but before val.CAS. (The operations on a single variable are ordered wrt. each other by the coherency rule.) These are just my first thoughts after several minutes thinking about it; there might be an obstacle to this ordering. Commented Nov 26 at 17:34
  • 1
    @PeterCordes However, the two CAS operations in two threads can only have one thereof succeed. If #4 succeeds, then the CAS in t1 must fail and be a pure load returning -1; in this case, the loop in t1 cannot exit such that #1 is unreachable, which results in that the loop in t2 cannot exit because no side effect producing non-null pointers could be executed somewhere in the lifetime of the program. If the assertion fails, it would violate the observable behavior in terms of the abstract machine sense. Commented 2 days ago
  • @PeterCordes I feel this question might be a bit similar to this Commented 2 days ago
  • Oh right, that ordering (with t2 CAS success between t1 load and t2 CAS) would lead to t1 CAS failure and infinite loop so ptr.store(&val) isn't reached. And other orderings lead to t2 CAS failure, which means the assert succeeds. What was the point of the acquire ordering? Oh, I guess to sync with the commented-out release store, otherwise it's a red herring that's an unfortunate distraction for the original question. Commented 2 days ago

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.