Consider this example:
#include <iostream>
#include <atomic>
#include <thread>
#include <cassert>
int main(){
std::atomic<int> val = 1;
std::atomic<std::atomic<int>*> ptr = nullptr;
auto t1 = std::thread([&](){
auto v = val.load(std::memory_order::relaxed);
while(true){
if(v==-1){
v = val.load(std::memory_order::relaxed);
continue;
}
if(val.compare_exchange_strong(v,v+1,std::memory_order::acquire,std::memory_order::relaxed)==true){
break;
}
}
ptr.store(&val,std::memory_order::relaxed); // #1
});
auto t2 = std::thread([&ptr](){
std::atomic<int>* val_ptr = ptr.load(std::memory_order::relaxed); // #2
while(val_ptr==nullptr){
val_ptr = ptr.load(std::memory_order::relaxed); // #3
}
int v = 1;
bool r = val_ptr->compare_exchange_strong(v,-1,std::memory_order::acquire,std::memory_order::relaxed); // #4
/*
if(r){
val_ptr->store(1,std::memory_order::release);
} // #6
*/
assert(r==false); // #5
});
t1.join();
t2.join();
}
Can the assertion at #5 never fail even though #1 and #2/#3 use relaxed order?
The assertion fails only if #4 returns true. This means #4 is successful, that is, the RMW operation loads 1 and writes -1. The modification whose value is 1 is only the initial value of the atomic object val.
In this program, all operations on the atomic object val are either RMW operations or pure loads (i.e., CAS failures). So all these operations are serialized according to [atomics.order] p10:
Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
That is, no two RMW operations can read the same modification. If #4 reads the initial value, then the CAS operations must not succeed, and it's just a pure load.
Not sure if the following analysis is formal:
- The CAS in
t1won't be reached ifvis-1, which causes the loop not to exit. - CAS failures cause the loop not to exit, which brings it back to the condition above.
- The loop will exit only if the CAS loads
1and writes2in this program.
However, the assumption is that #4 loads 1 and writes -1. If that happens, then the loop at t1 cannot exit, which means #1 cannot be reached.
The loop in t2 cannot exit if the load reads nullptr. So #5 can be reached only if the loop in t2 exists, which requires that there is some side effect that is non-null. In this program, the side effect producing a non-null pointer can only be #1. But with the above assumption, #1 is never reached. Therefore that side effect cannot exist anywhere in the program's lifetime.
Thus #2 and #3 cannot read the value written at #1, otherwise it would violate [intro.races] p10:
The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.
So, the assertion can never fail. Is my analysis right?
Update:
How will the assertion behave when introducing the commented #6? IICU, if #5 fails, the loop at t1 can only exit when the CAS reads the value written by #6; however, at this execution, #6 will synchronize with the CAS at t1, which means #2 cannot read #1(#2 happens before #1), the loop at t2 cannot exit in this assumption too. The assertion still cannot fail because it won't be reached, right?
If we change the memory order in #6 to relaxed, it seems that the assertion can fail? Seems OOTA, but technically can happen.
ptr.store(&val)(#1) becoming visible to t2 before t1'sval.CAS_strong(v,v+1, acquire, relaxed)has done its load. It has acquire strength so normal implementations wouldn't reorder it with the later store, but there's no actual rel/acq synchronization so I don't think there's a happens-before anywhere. The actual C++ memory model is a lot weaker than sanity or causality requires. With no inter-thread happens-before, from t2's PoV, all the operations on different vars in t1 are potentially concurrent (and the loop is not infinite).val.loadand check against-1but beforeval.CAS. (The operations on a single variable are ordered wrt. each other by the coherency rule.) These are just my first thoughts after several minutes thinking about it; there might be an obstacle to this ordering.CASoperations in two threads can only have one thereof succeed. If#4succeeds, then theCASint1must fail and be a pure load returning-1; in this case, the loop int1cannot exit such that#1is unreachable, which results in that the loop int2cannot exit because no side effect producing non-null pointers could be executed somewhere in the lifetime of the program. If the assertion fails, it would violate the observable behavior in terms of the abstract machine sense.ptr.store(&val)isn't reached. And other orderings lead to t2 CAS failure, which means the assert succeeds. What was the point of theacquireordering? Oh, I guess to sync with the commented-outreleasestore, otherwise it's a red herring that's an unfortunate distraction for the original question.