Thread synchronization problem with C++ std::atomic variables

Question

The following program occasionally prints "bad", which I did not expect. The two threads are supposed to synchronize through the two std::atomic variables 's_lock1' and 's_lock2'. For func2 to set 's_var' to 1, it must first have atomically stored a non-zero value in 's_lock2' while the other thread (func1) had not yet updated 's_lock1'. Yet func1 sometimes prints "bad", so its s_lock2.load() call appears to have returned 0 even though func2 had already stored 1. Is there something wrong with this code snippet? Is it an issue related to memory ordering?

I am running this on an 8-core Linux server with CentOS 7 installed. Any help is greatly appreciated.

#include <iostream>
#include <thread>
#include <atomic>
#include <cstdio>   // printf
#include <unistd.h> // usleep

std::atomic_uint s_lock1 = 0;
std::atomic_uint s_lock2 = 0;
std::atomic_uint s_var = 0;

static void func1()
{
    while (true) {
        s_lock1.store(1, std::memory_order_release);
        if (s_lock2.load(std::memory_order_acquire) != 0) {
            s_lock1.store(0, std::memory_order_release);
            continue;
        }
        if (s_var.load(std::memory_order_acquire) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_release);
    }
}

static void func2()
{
    while (true) {
        s_lock2.store(1, std::memory_order_release);
        if (s_lock1.load(std::memory_order_acquire) != 0) {
            s_lock2.store(0, std::memory_order_release);
            continue;
        }
        s_var.store(1, std::memory_order_release);
        usleep(5000);
        s_var.store(0, std::memory_order_release);
        s_lock2.store(0, std::memory_order_release);
    }
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}

Your code seems to have a bunch of unexpected characters; what did you paste it from?
– Fire Lancer, Commented May 14, 2019 at 13:16

Maxim Egorushkin · Accepted Answer · 2019-05-14 17:14:58Z

This locking algorithm may break because of the store buffers in Intel CPUs. Stores do not go into the level 1 cache directly; they are queued in the store buffer for a while and are therefore invisible to other CPUs during that time:

To allow performance optimization of instruction execution, the IA-32 architecture allows departures from strong-ordering model called processor ordering in Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering variations (called here the memory-ordering model) allow performance enhancing operations such as allowing reads to go ahead of buffered writes. The goal of any of these variations is to increase instruction execution speeds, while maintaining memory coherency, even in multiple-processor systems.
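The store-buffer effect described above can be isolated in a small litmus test. The following is only a sketch: the helper name count_both_zero and the round-based handshake around it are assumptions made for illustration, not part of the original program. Each round, two threads store 1 to their own flag and then load the other flag; the function counts how often both loads return 0. The memory orders are parameters so that different orderings can be compared on the same machine.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical helper (not from the original post): runs the pattern
// "store my flag, then load the other flag" in two threads for `rounds`
// rounds and returns how many rounds ended with both loads reading 0.
unsigned count_both_zero(std::memory_order store_order,
                         std::memory_order load_order,
                         unsigned rounds) {
    std::atomic<unsigned> x{0}, y{0};
    std::atomic<unsigned> go{0};    // round number the workers wait for
    std::atomic<unsigned> done{0};  // total number of finished worker rounds
    unsigned r1 = 0, r2 = 0;        // per-round results, published via `done`

    auto worker = [&](std::atomic<unsigned>& mine,
                      std::atomic<unsigned>& other,
                      unsigned& result) {
        for (unsigned round = 1; round <= rounds; ++round) {
            while (go.load(std::memory_order_acquire) != round)
                std::this_thread::yield();
            mine.store(1, store_order);       // may sit in the store buffer
            result = other.load(load_order);  // may execute before the store is visible
            done.fetch_add(1, std::memory_order_release);
        }
    };

    std::thread t1(worker, std::ref(x), std::ref(y), std::ref(r1));
    std::thread t2(worker, std::ref(y), std::ref(x), std::ref(r2));

    unsigned both_zero = 0;
    for (unsigned round = 1; round <= rounds; ++round) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        go.store(round, std::memory_order_release);  // start the round
        while (done.load(std::memory_order_acquire) != 2 * round)
            std::this_thread::yield();
        if (r1 == 0 && r2 == 0)
            ++both_zero;
    }
    t1.join();
    t2.join();
    return both_zero;
}
```

Calling count_both_zero(std::memory_order_release, std::memory_order_acquire, n) may return a non-zero count on a multi-core x86 machine, which is the same situation that lets both threads of the question pass their checks. How often it happens depends on timing, so a count of 0 in a given run proves nothing.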

For this locking to work, the store buffers need to be flushed by using std::memory_order_seq_cst for the stores to the locks. This is the default memory order for loads and stores, so you can just write s_lock1 = 1;, for example. std::memory_order_seq_cst on a store causes the compiler to generate an xchg instruction, or to insert an mfence instruction after the store; either one makes the store visible to other CPUs before the thread's subsequent loads execute:

Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged. Sequential ordering may be necessary for multiple producer-multiple consumer situations where all consumers must observe the actions of all producers occurring in the same order. Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems. This may become a performance bottleneck since it forces the affected memory accesses to propagate to every core.
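Because std::memory_order_seq_cst is the default, the handshake can also be written with plain assignment and implicit conversion. The sketch below assumes two hypothetical helpers, try_enter and leave, that do not appear in the original code; they only illustrate which operations the shorthand stands for.

```cpp
#include <atomic>

// Hypothetical helper: raise our own flag, then back off if the other
// thread's flag is already raised. Returns true if we may proceed.
bool try_enter(std::atomic<unsigned>& mine, std::atomic<unsigned>& other) {
    mine = 1;           // same as mine.store(1, std::memory_order_seq_cst)
    if (other != 0u) {  // implicit conversion: other.load(std::memory_order_seq_cst)
        mine = 0;       // back off; also a seq_cst store
        return false;
    }
    return true;
}

// Hypothetical helper: lower our flag again.
void leave(std::atomic<unsigned>& mine) {
    mine = 0;           // same as mine.store(0, std::memory_order_seq_cst)
}
```

With these helpers, the body of func1 would call try_enter(s_lock1, s_lock2) and func2 would call try_enter(s_lock2, s_lock1). Spelling out the memory order explicitly, as the example below does, makes the requirement easier to see in review.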

Working example:

#include <atomic>
#include <cstdio>   // printf
#include <thread>
#include <unistd.h> // usleep

std::atomic<unsigned> s_lock1{0};
std::atomic<unsigned> s_lock2{0};
std::atomic<unsigned> s_var{0};

void func1() {
    while(true) {
        s_lock1.store(1, std::memory_order_seq_cst);
        if(s_lock2.load(std::memory_order_seq_cst) != 0) {
            s_lock1.store(0, std::memory_order_seq_cst);
            continue;
        }
        if(s_var.load(std::memory_order_relaxed) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_seq_cst);
    }
}

void func2() {
    while(true) {
        s_lock2.store(1, std::memory_order_seq_cst);
        if(s_lock1.load(std::memory_order_seq_cst) != 0) {
            s_lock2.store(0, std::memory_order_seq_cst);
            continue;
        }
        s_var.store(1, std::memory_order_relaxed);
        usleep(5000);
        s_var.store(0, std::memory_order_relaxed);
        s_lock2.store(0, std::memory_order_seq_cst);
    }
}

int main() {
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
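The working example above loops forever, which makes it awkward to check automatically. A bounded variant might look like the following sketch. The function name count_bad, the fixed number of attempts, and the removal of the usleep calls are all assumptions made for testing; the memory orders are the same as in the example above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical test harness: same handshake as the working example, but
// each thread makes a fixed number of attempts and nothing sleeps.
// Returns how often the reader saw `var` set while holding its flag.
unsigned count_bad(unsigned attempts) {
    std::atomic<unsigned> lock1{0}, lock2{0}, var{0};
    unsigned bad = 0;  // written only by the reader, read after join()

    std::thread reader([&] {
        for (unsigned i = 0; i < attempts; ++i) {
            lock1.store(1, std::memory_order_seq_cst);
            if (lock2.load(std::memory_order_seq_cst) != 0) {
                lock1.store(0, std::memory_order_seq_cst);  // back off
                continue;
            }
            if (var.load(std::memory_order_relaxed) > 0)
                ++bad;  // corresponds to printf("bad\n")
            lock1.store(0, std::memory_order_seq_cst);
        }
    });

    std::thread writer([&] {
        for (unsigned i = 0; i < attempts; ++i) {
            lock2.store(1, std::memory_order_seq_cst);
            if (lock1.load(std::memory_order_seq_cst) != 0) {
                lock2.store(0, std::memory_order_seq_cst);  // back off
                continue;
            }
            var.store(1, std::memory_order_relaxed);
            var.store(0, std::memory_order_relaxed);
            lock2.store(0, std::memory_order_seq_cst);
        }
    });

    reader.join();
    writer.join();
    return bad;
}
```

With sequentially consistent operations on both flags, count_bad should return 0. Without the sleeps the window in which a violation could be observed is much shorter, so this is a sanity check for the seq_cst version rather than a reliable way to reproduce the original failure.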

To prevent #StoreLoad reordering, the loads have to be tagged std::memory_order_seq_cst as well.
