You definitely have to include something. The code as it stands is clearly incorrect. There is nothing at all ensuring that the reads of your TwoInts object *s in HandleSignal will happen-before its destruction in s.reset(), so this is a data race and the behavior is undefined.
At the level of the implementation, yes, the compiler is perfectly free to reorder the store to t_two_ints with s.reset().
It would be nice if we could fix this by simply putting a fence between t_two_ints.store() and s.reset(). On a typical implementation, a two-way compiler-only barrier like GCC's asm("" : : : "memory"); would suffice, because a signal handler executes in its entirety between two instructions of the interrupted code. So this would have the semantics of "the entire signal handler either happens-before or happens-after the barrier". That would solve our problem: if the signal handler happens-before the barrier, then all its accesses to *s happen-before s.reset(); and if it happens-after, then it observes the store of nullptr to t_two_ints and so does not access our *s at all.
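To make the intent concrete, here is a sketch of that naive barrier-based version, relying on the GCC/Clang `asm` memory-clobber extension rather than on any ISO guarantee (the `TwoInts` definition is assumed from the question):

    #include <atomic>
    #include <optional>

    struct TwoInts { int a; int b; };  // assumed shape, as used in the question

    static thread_local std::atomic<const TwoInts*> t_two_ints;

    void Foo() {
      std::optional s = TwoInts{17, 19};
      t_two_ints.store(&*s, std::memory_order_relaxed);
      // ... work during which the signal may arrive ...
      t_two_ints.store(nullptr, std::memory_order_relaxed);
      asm("" : : : "memory");  // two-way compiler-only barrier (GCC/Clang extension)
      s.reset();  // the handler either ran wholly before the barrier, or saw nullptr
    }

Again, this is reasoning about a typical implementation, not about what ISO C++ guarantees.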
Unfortunately, as far as I can tell, ISO C++ doesn't formally give us a signal fence with those semantics. atomic_signal_fence is simply defined as having the same semantics as atomic_thread_fence, but only between a handler and its interrupted thread [atomic.fences p6; all citations are to C++23 draft N4950]. This implies, I believe, that we can only get the necessary happens-before through synchronization, which requires an actual acquire load observing the value of a release store (or the equivalent with relaxed load/store and acquire/release fences).
So the lowest-cost correct version I could come up with is the following. Note that we prefer to use relaxed operations together with signal fences, which are more verbose but should impose no runtime cost.
    #include <atomic>
    #include <cstdlib>
    #include <optional>

    static thread_local std::atomic<const TwoInts*> t_two_ints;
    static thread_local std::atomic<bool> in_handler;

    void Foo() {
      std::optional s = TwoInts{17, 19};                     // #1
      std::atomic_signal_fence(std::memory_order_release);   // #2
      t_two_ints.store(&*s, std::memory_order_relaxed);      // #3
      // Do some other work…
      t_two_ints.store(nullptr, std::memory_order_relaxed);  // #4
      std::atomic_signal_fence(std::memory_order_seq_cst);   // #5
      while (in_handler.load(std::memory_order_relaxed)) {   // #6
        [[unlikely]];  // should never happen
      }
      std::atomic_signal_fence(std::memory_order_acquire);   // #7
      s.reset();                                             // #8
    }

    void HandleSignal() {
      in_handler.store(true, std::memory_order_relaxed);     // #9
      std::atomic_signal_fence(std::memory_order_seq_cst);   // #10
      const TwoInts* const s =
          t_two_ints.load(std::memory_order_relaxed);        // #11
      std::atomic_signal_fence(std::memory_order_acquire);   // #12
      if (s != nullptr && s->b != s->a + 2) {                // #13
        std::quick_exit(0);
      }
      std::atomic_signal_fence(std::memory_order_release);   // #14
      in_handler.store(false, std::memory_order_relaxed);    // #15
    }
Try on Godbolt
In the handler, this adds two plain stores, which should have minimal cost because they can be buffered. In Foo(), it adds one plain load, which (if the handler ran) should be hot in cache or store-forwarded, and a conditional branch, which is never taken and should be predicted well.
This assumes that HandleSignal isn't reentrant: that (through some implementation-defined mechanism) its signal is blocked while it's executing, so that executions of HandleSignal are totally ordered by happens-before. If it is reentrant, we would need further changes, such as replacing the boolean in_handler with a counter that is atomically incremented on entry to HandleSignal and decremented on exit.
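A sketch of that counter variant might look like the following (hypothetical names; the same fence discipline as in the code above, with Foo() spinning on the counter reaching zero instead of on the boolean):

    #include <atomic>

    static thread_local std::atomic<int> handler_depth{0};  // replaces in_handler

    void HandleSignalReentrant() {  // hypothetical name for the reentrant variant
      handler_depth.fetch_add(1, std::memory_order_relaxed);  // entry, like #9
      std::atomic_signal_fence(std::memory_order_seq_cst);    // like #10
      // ... inspect t_two_ints as in #11–#13 ...
      std::atomic_signal_fence(std::memory_order_release);    // like #14
      handler_depth.fetch_sub(1, std::memory_order_relaxed);  // exit, like #15
    }

In Foo(), the wait at #6 would become `while (handler_depth.load(std::memory_order_relaxed) != 0)`. I have not carried the proof below through for this variant.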
Note that I've replaced the acquire load in HandleSignal() with a relaxed load and a fence, as already suggested in the comment by Peter Cordes. This is preferable, as noted above. So fences #2 and #12 serve to prevent a data race between #1 and #13.
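For comparison, the two formulations of load #11 would look like this (standalone sketch; `TwoInts` assumed from the question). They are equally correct here, but on weakly ordered targets an acquire load may cost a real barrier instruction, while the signal fence typically compiles to nothing:

    #include <atomic>

    struct TwoInts { int a; int b; };  // assumed shape, as in the question

    static std::atomic<const TwoInts*> t_two_ints{nullptr};

    // Fence form, as used in the code above.
    const TwoInts* LoadWithFence() {
      const TwoInts* p = t_two_ints.load(std::memory_order_relaxed);
      std::atomic_signal_fence(std::memory_order_acquire);
      return p;
    }

    // Plain acquire load: terser, same guarantees in this handler/thread setting.
    const TwoInts* LoadAcquire() {
      return t_two_ints.load(std::memory_order_acquire);
    }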
Now let's prove that it avoids data races between #13 and #8. Take an arbitrary execution F of Foo(), and an arbitrary execution H of HandleSignal(). Since seq_cst fences participate in a total order S [atomics.order p4], there are two cases: either fence #5 in Foo() precedes fence #10 in HandleSignal() or vice versa.
Suppose fence #5 precedes fence #10. I claim that load #11 must either return nullptr as stored by #4, or else some value later in the modification order [intro.races p4] of t_two_ints. For if not, then by [atomics.order p3.3], taking A = #11, B = #4, and X = #3 or any earlier store, we have that #11 is coherence-ordered-before #4. Since #10 happens-before #11 and #4 happens-before #5 (by sequencing), then [atomics.order p4.4] (taking X = #10, A = #11, B = #4, Y = #5) implies that #10 precedes #5, a contradiction.
So #11 returns either nullptr or a later value than #4. In the latter case, the value in particular cannot be that stored by #3, because #3 is sequenced-before #4, thus happens-before #4 [intro.races p10.1], thus precedes #4 in the modification order [intro.races p15]. So in either case, the value loaded by #11 does not point to our *s, and so there is no data race between #13 and #8 as they operate on different objects.
Suppose instead that fence #10 precedes fence #5.
Consider load #6 (specifically its last execution, if the loop iterates more than once, which should never actually happen). It returns false. I claim that this false value was stored either by #15 (case 2.1) or by some store later in the modification order of in_handler (case 2.2). If not, then since the value returned by #6 cannot be the value stored by #9 (which was true), it must be a value earlier still in the modification order. (Recall that modification order is consistent with sequencing, by [intro.races p15].) By the same argument as before, that means #6 is coherence-ordered-before #9 [atomics.order p3.3 again], and so fence #5 precedes fence #10, contrary to our supposition.
If #6 returns the value stored by #15, then by [atomics.fences p2] (taking A=#14, B=#7, X=#15, Y=#6) we have that #14 synchronizes with #7. Moreover, #7 is sequenced-before #8, so by [intro.races p9.3.1], #14 inter-thread-happens-before #8. And #13 is sequenced-before #14, so by [intro.races p9.3.2], #13 inter-thread-happens-before #8. By [intro.races p10.2], #13 thus happens-before #8, and there is no data race [intro.races p21].
If #6 returns a value from a modification X later in the modification order than #15, then X must be some #15' from a different execution H' of HandleSignal. By our non-reentrancy assumption, one of these executions happens-before the other. But by [intro.races p15], #15' does not happen-before #15, so it must be that H happens-before H'. We also have as before that the corresponding fence #14' happens-before (indeed, inter-thread-happens-before) #8.
We would now like to conclude using transitivity of happens-before, but unfortunately, in C++23 and earlier, happens-before is not defined as transitive due to the possibility of consume operations. So instead, note that if H happens-before H', so that in particular #13 happens-before #14', then by [intro.races p10] either #13 is sequenced-before #14' (if such a thing is possible for signal handlers) or #13 inter-thread-happens-before #14'. In either case, using [intro.races p9.3.2] or [p9.3.3] respectively, #13 inter-thread-happens-before #8, and so #13 happens-before #8. Again, there is no data race.
Q.E.D.