Condition variable's "wait" function causing unexpected behaviour when predicate is provided

Question

As an educational exercise I'm implementing a thread pool using condition variables. A controller thread creates a pool of threads that wait on a signal (an atomic variable being set to a value above zero). When signaled the threads wake, perform their work, and when the last thread is done it signals the main thread to awaken. The controller thread blocks until the last thread is complete. The pool is then available for subsequent re-use.

Every now and then I was getting a timeout on the controller thread waiting for the worker to signal completion (likely because of a race condition when decrementing the active work counter), so in an attempt to solidify the pool I replaced the "wait(lck)" form of the condition variable's wait method with "wait(lck, predicate)". Since doing this, the behaviour of the thread pool is such that it seems to permit decrementing of the active work counter below 0 (which is the condition for reawakening the controller thread) - I have a race condition. I've read countless articles on atomic variables, synchronisation, memory ordering, spurious and lost wakeups on stackoverflow and various other sites, have incorporated what I've learnt to the best of my ability, and still cannot for the life of me work out why the way I've coded the predicated wait just does not work. The counter should only ever be as high as the number of threads in the pool (say, 8) and as low as zero. I've started losing faith in myself - it just shouldn't be this hard to do something fundamentally simple. There is clearly something else I need to learn here :)

Considering of course that there was a race condition I ensured that the two variables that drive the awakening and termination of the pool are both atomic, and that both are only ever changed while protected with a unique_lock. Specifically, I made sure that when a request to the pool was launched, the lock was acquired, the active thread counter was changed from 0 to 8, unlocked the mutex, and then "notified_all". The controller thread would only then be awakened with the active thread count at zero, once the last worker thread decremented it that far and "notified_one".

In the worker thread, the condition variable would wait and wake only when the active thread count is greater than zero, unlock the mutex, in parallel proceed to execute the work preassigned to the processor when the pool was created, re-acquire the mutex, and atomically decrement the active thread count. It would then, while still supposedly protected by the lock, test if it was the last thread still active, and if so, again unlock the mutex and "notify_one" to awaken the controller.

The problem is - the active thread counter repeatedly proceeds below zero after even only 1 or 2 iterations. If I test the active thread count at the start of a new workload, I could find the active thread count down around -6 - it is as if the pool was allowed to reawaken the controller thread before the work was completed.

Given that the thread counter and terminate flag are both atomic variables and are only ever modified while under the protection of the same mutex, I am using sequential memory ordering for all updates, I just cannot see how this is happening and I'm lost.

#include <stdafx.h>
#include <Windows.h>

#include <iostream>
#include <thread>
using std::thread;
#include <mutex>
using std::mutex;
using std::unique_lock;
#include <condition_variable>
using std::condition_variable;
#include <atomic>
using std::atomic;
#include <chrono>
#include <vector>
using std::vector;

class IWorkerThreadProcessor
{
public:
    virtual void Process(int) = 0;
};


class MyProcessor : public IWorkerThreadProcessor
{
    int index_ = 0;
public:
    MyProcessor(int index)
    {
        index_ = index;
    }

    void Process(int threadindex)
    {
    for (int i = 0; i < 5000000; i++);
        std::cout << '(' << index_ << ':' << threadindex << ") ";
    }
};


#define MsgBox(x) do{ MessageBox(NULL, x, L"", MB_OK ); }while(false)


class ThreadPool
{
private:
    atomic<unsigned int> invokations_ = 0;

     //This goes negative when using the wait_for with predicate
    atomic<int> threadsActive_ = 0;
    atomic<bool> terminateFlag_ = false;
    vector<std::thread> threads_;
    atomic<unsigned int> poolSize_ = 0;

    mutex mtxWorker_;
    condition_variable cvSignalWork_;
    condition_variable cvSignalComplete_;

public:

    ~ThreadPool()
    {
        TerminateThreads();
    }

    void Init(std::vector<IWorkerThreadProcessor*>& processors)
    {
        unique_lock<mutex> lck2(mtxWorker_);
        threadsActive_ = 0;
        terminateFlag_ = false;

        poolSize_ = processors.size();
        for (int i = 0; i < poolSize_; ++i)
            threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
    }

    void ProcessWorkload(std::chrono::milliseconds timeout)
    {
        //Only used to see how many invocations I was getting through before experiencing the issue - sadly it's only one or two
        invocations_++;
        try
        {
        unique_lock<mutex> lck(mtxWorker_);

        //!!!!!! If I use the predicated wait this break will fire !!!!!!
        if (threadsActive_.load() != 0)
        __debugbreak();

        threadsActive_.store(poolSize_);
        lck.unlock();
        cvSignalWork_.notify_all();

        lck.lock();
        if (!cvSignalComplete_.wait_for(
                lck,
                timeout,
                [this] { return threadsActive_.load() == 0; })
                )
        {
            //As you can tell this has taken me through a journey trying to characterise the issue...
            if (threadsActive_ > 0)
                MsgBox(L"Thread pool timed out with still active threads");
            else if (threadsActive_ == 0)
                MsgBox(L"Thread pool timed out with zero active threads");
            else
                MsgBox(L"Thread pool timed out with negative active threads");
            }
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }

    void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
    {
        do
        {
            unique_lock<mutex> lck(mtxWorker_);

            //!!!!!! If I use this predicated wait I see the failure !!!!!!
            cvSignalWork_.wait(
                lck,
                [this] {
                return
                    threadsActive_.load() > 0 ||
                    terminateFlag_.load();
            });


            //!!!!!!!! Does not cause the failure but obviously will not handle
            //spurious wake-ups !!!!!!!!!!
            //cvSignalWork_.wait(lck);

            if (terminateFlag_.load())
                return;

            //Unlock to parallelise the work load
            lck.unlock();
            processor->Process(threadIndex);

            //Re-lock to decrement the work count
            lck.lock();
            //This returns the value before the subtraction so theoretically if the previous value was 1 then we're the last thread going and we can now signal the controller thread to wake.  This is the only place that the decrement happens so I don't know how it could possibly go negative
            if (threadsActive_.fetch_sub(1, std::memory_order_seq_cst) == 1)
            {
                lck.unlock();
                cvSignalComplete_.notify_one();
            }
            else
                lck.unlock();

        } while (true);
    }

    void TerminateThreads()
    {
        try
        {
            unique_lock<mutex> lck(mtxWorker_);
            if (!terminateFlag_)
            {
                terminateFlag_ = true;
                lck.unlock();
                cvSignalWork_.notify_all();

                for (int i = 0; i < threads_.size(); i++)
                    threads_[i].join();
            }
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }
};


int main()
{
    std::vector<IWorkerThreadProcessor*> processors;
    for (int i = 0; i < 8; i++)
        processors.push_back(new MyProcessor(i));


    std::cout << "Instantiating thread pool\n";
    auto pool = new ThreadPool;
    std::cout << "Initialisting thread pool\n";
    pool->Init(processors);
    std::cout << "Thread pool initialised\n";

    for (int i = 0; i < 200; i++)
    {
        std::cout << "Workload " << i << "\n";
        pool->ProcessWorkload(std::chrono::milliseconds(500));
        std::cout << "Workload " << i << " complete." << "\n";
    }

    for (auto a : processors)
        delete a;

    delete pool;

    return 0;
}

What is the output that you see? Without the std::cout << '(' << index_ << ':' << threadindex << ") "; — Sahil Dhoked
– Sahil Dhoked, Commented Feb 6, 2019 at 4:29
Under "normal" circumstances, without the output from inside the processor instances, the output would just be the messages showing instantiation and completion of the workloads. The problem is that after only a single iteration the application breaks (on a _debugbreak()) in response to the condition "threadsActive.load() != 0" at around line 84, so there is no "output" per-se, just the breakpoint being hit (needless to say that is only there as a debugging aid). — Smileyofoz
– Smileyofoz, Commented Feb 6, 2019 at 21:13
(contd) This shows me that at the point I invoke the second (or subsequent) workload the thread count is already starting at a non-zero value, which should not be possible as the pool is supposed to return to zero active threads on completion of the previous request. — Smileyofoz
– Smileyofoz, Commented Feb 6, 2019 at 21:13

Smileyofoz · Accepted Answer · 2019-02-07 02:26:09Z

class ThreadPool
{
private:
    atomic<unsigned int> invokations_ = 0;
    std::atomic<unsigned int> awakenings_ = 0;
    std::atomic<unsigned int> startedWorkloads_ = 0;
    std::atomic<unsigned int> completedWorkloads_ = 0;
    atomic<bool> terminate_ = false;
    atomic<bool> stillFiring_ = false;
    vector<std::thread> threads_;
    atomic<unsigned int> poolSize_ = 0;

    mutex mtx_;
    condition_variable cvSignalWork_;
    condition_variable cvSignalComplete_;

public:

    ~ThreadPool()
    {
        TerminateThreads();
    }


    void Init(std::vector<IWorkerThreadProcessor*>& processors)
    {
        unique_lock<mutex> lck2(mtx_);
        //threadsActive_ = 0;
        terminate_ = false;

        poolSize_ = processors.size();
        for (int i = 0; i < poolSize_; ++i)
            threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));

        awakenings_ = 0;
        completedWorkloads_ = 0;
        startedWorkloads_ = 0;
        invokations_ = 0;
    }


    void ProcessWorkload(std::chrono::milliseconds timeout)
    {
        try
        {
            unique_lock<mutex> lck(mtx_);
            invokations_++;

            if (startedWorkloads_ != 0)
                __debugbreak();

            if (completedWorkloads_ != 0)
                __debugbreak();

            if (awakenings_ != 0)
                __debugbreak();

            if (stillFiring_)
                __debugbreak();

            stillFiring_ = true;
            lck.unlock();
            cvSignalWork_.notify_all();

            lck.lock();
            if (!cvSignalComplete_.wait_for(
                lck,
                timeout,
                //[this] { return this->threadsActive_.load() == 0; })
                [this] { return completedWorkloads_ == poolSize_ && !stillFiring_; })
                )
            {
                if (completedWorkloads_ < poolSize_)
                {
                    if (startedWorkloads_ < poolSize_)
                        MsgBox(L"Thread pool timed out with some threads unstarted");
                    else if (startedWorkloads_ == poolSize_)
                        MsgBox(L"Thread pool timed out with all threads started but not all completed");
                }
                else
                    __debugbreak();
            }


            if (completedWorkloads_ != poolSize_)
                __debugbreak();

            if (awakenings_ != poolSize_)
                __debugbreak();

            awakenings_ = 0;
            completedWorkloads_ = 0;
            startedWorkloads_ = 0;

        }
        catch (std::exception e)
        {
            __debugbreak();
        }

    }



    void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
    {
        do
        {
            unique_lock<mutex> lck(mtx_);
            cvSignalWork_.wait(
                lck,
                [this] {
                return
                    (stillFiring_ && (startedWorkloads_ < poolSize_)) ||
                    terminate_;
            });
            awakenings_++;

            if (startedWorkloads_ == 0 && terminate_)
                return;

            if (stillFiring_ && startedWorkloads_ < poolSize_) //guard against spurious wakeup
            {
                startedWorkloads_++;
                if (startedWorkloads_ == poolSize_)
                    stillFiring_ = false;

                lck.unlock();
                processor->Process(threadIndex);
                lck.lock();
                completedWorkloads_++;

                if (completedWorkloads_ == poolSize_)
                {
                    lck.unlock();
                    cvSignalComplete_.notify_one();
                }
                else
                    lck.unlock();
            }
            else
                lck.unlock();

        } while (true);
    }



    void TerminateThreads()
    {
        try
        {
            unique_lock<mutex> lck(mtx_);
            if (!terminate_) //Don't attempt to double-terminate
            {
                terminate_ = true;
                lck.unlock();
                cvSignalWork_.notify_all();

                for (int i = 0; i < threads_.size(); i++)
                    threads_[i].join();
            }
        }
        catch (std::exception e)
        {
            __debugbreak();
        }
    }
};

Sahil Dhoked · Accepted Answer · 2019-02-06 21:39:01Z

0

I'm not certain if the following helps solve the problem, but I think the error is as shown below:

This

if (!cvSignalComplete_.wait_for(
            lck,
            timeout,
            [this] { return threadsActive_.load() == 0; })
            )

should be replaced by

if (!cvSignalComplete_.wait_for(
            lck,
            timeout,
            [&] { return threadsActive_.load() == 0; })
            )

Looks like the lambda is not accessing the instantiated member of the class. Here is some reference to back my case. Look at Lambda Capture section of this page.

Edit: Another place you are using wait for with lambdas.

cvSignalWork_.wait(
            lck,
            [this] {
            return
                threadsActive_.load() > 0 ||
                terminateFlag_.load();
        });

Maybe modify all the lambdas and then see if it works?

The reason I'm looking at the lambda is because it seems like a case similar to a spurious wakeup. Hope it helps.

edited Feb 6, 2019 at 21:39

answered Feb 6, 2019 at 4:56

Sahil Dhoked

3732 silver badges12 bronze badges

1 Comment

Samuel Liew Over a year ago

Comments are not for extended discussion; this conversation has been moved to chat.

Collectives™ on Stack Overflow

Condition variable's "wait" function causing unexpected behaviour when predicate is provided

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related