
Please consider the following synchronization problem:

initially:
    version = 0           // atomic variable
    data = 0              // normal variable (there could be many)

Thread A:
    version++
    data = 3

Thread B:
    d = data
    v = version
    assert(d != 3 || v == 1)

Basically, if thread B sees data = 3 then it must also see version++.

What's the weakest memory order and synchronization we must impose so that the assertion in thread B is always satisfied?
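In C++ terms, the setup might be written roughly as follows (illustrative only; the `seq_cst` orders here are placeholders for whatever the answer turns out to be, and the non-atomic accesses to `data` are exactly the problematic part):

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> version{0};  // atomic variable
int data = 0;                 // normal variable (there could be many)

void thread_a() {
    version.fetch_add(1, std::memory_order_seq_cst);  // what order is actually needed?
    data = 3;                                         // non-atomic store
}

void thread_b() {
    int d = data;                                     // non-atomic load
    int v = version.load(std::memory_order_seq_cst);  // what order is actually needed?
    assert(d != 3 || v == 1);                         // must always hold
}
```

Note that even `seq_cst` on the atomic doesn't make the concurrent non-atomic access to `data` well-defined; that's the crux of the question.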

If I understand C++ memory_order correctly, release-acquire ordering won't do, because it guarantees that operations BEFORE version++ in thread A will be seen by the operations AFTER v = version in thread B.

Acquire and release fences work in the same direction, but are more general.

As I said, I need the other direction: B sees data = 3 implies B sees version = 1.

I'm using this "versioning approach" to avoid locks as much as possible in a data structure I'm designing. When I see something has changed, I step back, read the new version and try again.

I'm trying to keep my code as portable as possible, but I'm targeting x86-64 CPUs.

  • You could have an array of data values with the version being an index into that array Commented Mar 28, 2022 at 6:12

1 Answer


You might be looking for a SeqLock, as long as your data doesn't include pointers. (If it does, then you might need something more like RCU to protect readers that might load a pointer, stall / sleep for a while, then deref that pointer much later.)

You can use the SeqLock sequence counter as the version number. (version = tmp_counter >> 1 since you need two increments per write of the payload to let readers detect tearing when reading the non-atomic data. And to make sure they see the data that goes with this sequence number. Make sure you don't read the atomic counter a 3rd time; use the local tmp that you read it into to verify match before/after copying data.)
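A minimal sketch of the reader side under those rules (single writer assumed; the names `seq`, `try_read`, etc. are illustrative, not from the question):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> seq{0};  // sequence counter; odd while an update is in progress
int data = 0;                  // non-atomic payload

// Reader: load the counter ONCE into a local tmp, copy the payload,
// then verify the counter still matches that same tmp.
bool try_read(int& out, uint64_t& version) {
    uint64_t tmp = seq.load(std::memory_order_acquire);
    if (tmp & 1) return false;                       // writer mid-update: retry
    out = data;                                      // copy the payload
    std::atomic_thread_fence(std::memory_order_acquire);
    if (seq.load(std::memory_order_relaxed) != tmp)  // compare against the SAME tmp
        return false;                                // possible tearing: retry
    version = tmp >> 1;                              // two counter increments per write
    return true;
}
```

The caller loops on `try_read` until it returns true; on success, `version` is the version number that the copied payload belongs to.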

Readers will have to retry if they happen to attempt a read while the data is being modified. But the data is non-atomic, so thread B seeing data = 3 can never be part of what creates synchronization; it can only be something B observes as a result of synchronizing with a version number from the writer.

See:

  • Implementing 64 bit atomic counter with 32 bit atomics - my attempt at a SeqLock in C++, with lots of comments. It's a bit of a hack because ISO C++'s data-race UB rules are overly strict; a SeqLock relies on detecting possible tearing and not using torn data, rather than avoiding concurrent access entirely. That's fine on a machine without hardware race detection so that doesn't fault (like all real CPUs), but C++ still calls that UB, even with volatile (although that puts it more into implementation-defined territory). In practice it's fine.

  • GCC reordering up across load with `memory_order_seq_cst`. Is this allowed? - A GCC bug fixed in 8.1 that could break a seqlock implementation.

If you have multiple writers, you can use the sequence-counter itself as a spinlock for mutual exclusion between writers. e.g. using an atomic_fetch_or or CAS to attempt to set the low bit to make the counter odd. (tmp = seq.fetch_or(1, std::memory_order_acq_rel);, hopefully compiling to x86 lock bts). If it previously didn't have the low bit set, this writer won the race, but if it did then you have to try again.
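A sketch of that writer-side locking (illustrative names, assuming the counter layout above):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> seq{0};  // sequence counter, doubling as the writer lock
int payload = 0;               // non-atomic data

// Multi-writer: use fetch_or(1) on the counter itself as a spinlock.
void locked_write(int value) {
    uint64_t tmp;
    for (;;) {
        tmp = seq.fetch_or(1, std::memory_order_acq_rel);  // try to set the low bit
        if ((tmp & 1) == 0) break;  // it was even: we made it odd, we own the lock
        // low bit was already set: another writer is mid-update, try again
    }
    payload = value;                                // counter is odd here
    seq.store(tmp + 2, std::memory_order_release);  // even again: unlock + new version
}
```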

But with only a single writer, you don't need to RMW the atomic sequence counter, just store new values (ordered with writes to the payload), so you can either keep a local copy of it, or just do a relaxed load of it, and store tmp+1 and tmp+2.
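With a single writer that might look like the following sketch (illustrative names; the release fence after the first store is one way to keep the payload stores from becoming visible before the counter goes odd):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> seq{0};  // sequence counter; odd while an update is in progress
int payload = 0;               // non-atomic data

// Single writer: no RMW needed, just plain stores of tmp+1 and tmp+2.
void write(int value) {
    uint64_t tmp = seq.load(std::memory_order_relaxed);  // or keep a writer-local copy
    seq.store(tmp + 1, std::memory_order_relaxed);       // odd: update in progress
    std::atomic_thread_fence(std::memory_order_release); // order payload stores after the odd store
    payload = value;
    seq.store(tmp + 2, std::memory_order_release);       // even: update complete, version bumped
}
```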


Comments

Thanks. Indeed, I reinvented the SeqLock (in a slightly more general form), but my analysis was flawed: I forgot that if I read a version with the lock bit set, I'll retry anyway, so what matters is what happens when I read the result of the second update to the version. Since the update to the data comes before it, everything works out perfectly.
I have a lingering doubt. Since the synchronization with the second counter update is the only one that matters, can't we relax the first update?
@Kiuhnm: Not exactly. You should use mo_relaxed for it, but you need a std::atomic_thread_fence(mo_release) after it to make sure none of the updates to data can be visible before the first store to the sequence number, the one that makes it odd and lets all readers know that we're mid-update. (Just a seq.store(val, release) is only one-way ordering in the wrong direction. Same for seq.store(val, seq_cst) - AArch64 can implement that with stlr which only has one-way ordering except wrt. later ldar.)
Technically I'm not totally sure that a release fence guarantees ordering between an atomic store and later non-atomic stores; that may be an implementation detail. But as I said, a SeqLock already relies on data-race UB, unless you make the payload atomic with mo_relaxed but that would defeat a compiler's ability to copy it efficiently with SIMD, for example. This is all stuff that I think I mentioned in Implementing 64 bit atomic counter with 32 bit atomics - if I didn't, let me know; been a while since I wrote it.
I see a contradiction in what you say. A release fence makes sure that stores BEFORE it are visible, not after. Otherwise, that would solve my initial question: if thread B sees "data=3" then it also sees "version=1" because a fence would make sure that data=3 is not visible before "version=1". But we know that that's FALSE. That's not how a release fence works.