6

The following program is giving me unexpected behavior when it's printing the "bad" output once in a while. The two threads are supposed to synchronize using the two std::atomic variables 's_lock1' and 's_lock2'. In func2, in order to set the 's_var' variable to 1, it must have atomically stored a non-zero value in 's_lock2' and the other thread (func1) must not have updated the 's_lock1' variable yet. However, somehow in func1 it's printing the unexpected "bad" output. The s_lock2.load() statement seems to return false instead. Is there something wrong with this code snippet? Is it an issue related to memory ordering?

I am running this on an 8-core Linux server with Centos 7 installed. Any help is greatly appreciated.

#include <iostream>
#include <thread>
#include <atomic>
#include <unistd.h>

std::atomic_uint s_lock1 = 0;
std::atomic_uint s_lock2 = 0;
std::atomic_uint s_var = 0;

static void func1()
{
    while (true) {
        s_lock1.store(1, std::memory_order_release);
        if (s_lock2.load(std::memory_order_acquire) != 0) {
            s_lock1.store(0, std::memory_order_release);
            continue;
        }
        if (s_var.load(std::memory_order_acquire) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_release);
    }
}

static void func2()
{
    while (true) {
        s_lock2.store(1, std::memory_order_release);
        if (s_lock1.load(std::memory_order_acquire) != 0) {
            s_lock2.store(0, std::memory_order_release);
            continue;
        }
        s_var.store(1, std::memory_order_release);
        usleep(5000);
        s_var.store(0, std::memory_order_release);
        s_lock2.store(0, std::memory_order_release);
    }
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
1
  • 1
    Your code seems to have a bunch of unexpected characters, what did you paste it from? Commented May 14, 2019 at 13:16

1 Answer 1

6

This locking algorithm may break because of the store buffers in Intel CPUs: the stores do not go into level 1 cache directly but are queued in the store buffer for a while and hence are invisible to another CPU during that time:

To allow performance optimization of instruction execution, the IA-32 architecture allows departures from strong-ordering model called processor ordering in Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering variations (called here the memory-ordering model) allow performance enhancing operations such as allowing reads to go ahead of buffered writes. The goal of any of these variations is to increase instruction execution speeds, while maintaining memory coherency, even in multiple-processor systems.

The store buffers need to be flushed for this locking to work by using std::memory_order_seq_cst for stores to locks (the default memory order for loads and stores, you can just do s_lock1 = 1;, for example). std::memory_order_seq_cst for stores causes the compiler generate xchg instruction or insert mfence instruction after the store, both of which make the effect of the store visible to other CPUs:

Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged. Sequential ordering may be necessary for multiple producer-multiple consumer situations where all consumers must observe the actions of all producers occurring in the same order. Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems. This may become a performance bottleneck since it forces the affected memory accesses to propagate to every core.

Working example:

std::atomic<unsigned> s_lock1{0};
std::atomic<unsigned> s_lock2{0};
std::atomic<unsigned> s_var{0};

void func1() {
    while(true) {
        s_lock1.store(1, std::memory_order_seq_cst);
        if(s_lock2.load(std::memory_order_seq_cst) != 0) {
            s_lock1.store(0, std::memory_order_seq_cst);
            continue;
        }
        if(s_var.load(std::memory_order_relaxed) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_seq_cst);
    }
}

void func2() {
    while(true) {
        s_lock2.store(1, std::memory_order_seq_cst);
        if(s_lock1.load(std::memory_order_seq_cst) != 0) {
            s_lock2.store(0, std::memory_order_seq_cst);
            continue;
        }
        s_var.store(1, std::memory_order_relaxed);
        usleep(5000);
        s_var.store(0, std::memory_order_relaxed);
        s_lock2.store(0, std::memory_order_seq_cst);
    }
}

int main() {
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
Sign up to request clarification or add additional context in comments.

1 Comment

To prevent #StoreLoad reordering, the loads have to be tagged mo_seq_cst as well

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.