As an educational exercise I'm implementing a thread pool using condition variables. A controller thread creates a pool of threads that wait on a signal (an atomic variable being set to a value above zero). When signaled the threads wake, perform their work, and when the last thread is done it signals the main thread to awaken. The controller thread blocks until the last thread is complete. The pool is then available for subsequent re-use.
Every now and then I was getting a timeout on the controller thread waiting for the worker to signal completion (likely because of a race condition when decrementing the active work counter), so in an attempt to solidify the pool I replaced the "wait(lck)" form of the condition variable's wait method with "wait(lck, predicate)". Since doing this, the behaviour of the thread pool is such that it seems to permit decrementing of the active work counter below 0 (which is the condition for reawakening the controller thread) - I have a race condition. I've read countless articles on atomic variables, synchronisation, memory ordering, spurious and lost wakeups on stackoverflow and various other sites, have incorporated what I've learnt to the best of my ability, and still cannot for the life of me work out why the way I've coded the predicated wait just does not work. The counter should only ever be as high as the number of threads in the pool (say, 8) and as low as zero. I've started losing faith in myself - it just shouldn't be this hard to do something fundamentally simple. There is clearly something else I need to learn here :)
Considering of course that there was a race condition I ensured that the two variables that drive the awakening and termination of the pool are both atomic, and that both are only ever changed while protected with a unique_lock. Specifically, I made sure that when a request to the pool was launched, the lock was acquired, the active thread counter was changed from 0 to 8, unlocked the mutex, and then "notified_all". The controller thread would only then be awakened with the active thread count at zero, once the last worker thread decremented it that far and "notified_one".
In the worker thread, the condition variable would wait and wake only when the active thread count is greater than zero, unlock the mutex, in parallel proceed to execute the work preassigned to the processor when the pool was created, re-acquire the mutex, and atomically decrement the active thread count. It would then, while still supposedly protected by the lock, test if it was the last thread still active, and if so, again unlock the mutex and "notify_one" to awaken the controller.
The problem is - the active thread counter repeatedly proceeds below zero after even only 1 or 2 iterations. If I test the active thread count at the start of a new workload, I could find the active thread count down around -6 - it is as if the pool was allowed to reawaken the controller thread before the work was completed.
Given that the thread counter and terminate flag are both atomic variables and are only ever modified while under the protection of the same mutex, I am using sequential memory ordering for all updates, I just cannot see how this is happening and I'm lost.
#include <stdafx.h>
#include <Windows.h>
#include <iostream>
#include <thread>
using std::thread;
#include <mutex>
using std::mutex;
using std::unique_lock;
#include <condition_variable>
using std::condition_variable;
#include <atomic>
using std::atomic;
#include <chrono>
#include <vector>
using std::vector;
class IWorkerThreadProcessor
{
public:
virtual void Process(int) = 0;
};
class MyProcessor : public IWorkerThreadProcessor
{
int index_ = 0;
public:
MyProcessor(int index)
{
index_ = index;
}
void Process(int threadindex)
{
for (int i = 0; i < 5000000; i++);
std::cout << '(' << index_ << ':' << threadindex << ") ";
}
};
#define MsgBox(x) do{ MessageBox(NULL, x, L"", MB_OK ); }while(false)
class ThreadPool
{
private:
atomic<unsigned int> invokations_ = 0;
//This goes negative when using the wait_for with predicate
atomic<int> threadsActive_ = 0;
atomic<bool> terminateFlag_ = false;
vector<std::thread> threads_;
atomic<unsigned int> poolSize_ = 0;
mutex mtxWorker_;
condition_variable cvSignalWork_;
condition_variable cvSignalComplete_;
public:
~ThreadPool()
{
TerminateThreads();
}
void Init(std::vector<IWorkerThreadProcessor*>& processors)
{
unique_lock<mutex> lck2(mtxWorker_);
threadsActive_ = 0;
terminateFlag_ = false;
poolSize_ = processors.size();
for (int i = 0; i < poolSize_; ++i)
threads_.push_back(thread(&ThreadPool::launchMethod, this, processors[i], i));
}
void ProcessWorkload(std::chrono::milliseconds timeout)
{
//Only used to see how many invocations I was getting through before experiencing the issue - sadly it's only one or two
invocations_++;
try
{
unique_lock<mutex> lck(mtxWorker_);
//!!!!!! If I use the predicated wait this break will fire !!!!!!
if (threadsActive_.load() != 0)
__debugbreak();
threadsActive_.store(poolSize_);
lck.unlock();
cvSignalWork_.notify_all();
lck.lock();
if (!cvSignalComplete_.wait_for(
lck,
timeout,
[this] { return threadsActive_.load() == 0; })
)
{
//As you can tell this has taken me through a journey trying to characterise the issue...
if (threadsActive_ > 0)
MsgBox(L"Thread pool timed out with still active threads");
else if (threadsActive_ == 0)
MsgBox(L"Thread pool timed out with zero active threads");
else
MsgBox(L"Thread pool timed out with negative active threads");
}
}
catch (std::exception e)
{
__debugbreak();
}
}
void launchMethod(IWorkerThreadProcessor* processor, int threadIndex)
{
do
{
unique_lock<mutex> lck(mtxWorker_);
//!!!!!! If I use this predicated wait I see the failure !!!!!!
cvSignalWork_.wait(
lck,
[this] {
return
threadsActive_.load() > 0 ||
terminateFlag_.load();
});
//!!!!!!!! Does not cause the failure but obviously will not handle
//spurious wake-ups !!!!!!!!!!
//cvSignalWork_.wait(lck);
if (terminateFlag_.load())
return;
//Unlock to parallelise the work load
lck.unlock();
processor->Process(threadIndex);
//Re-lock to decrement the work count
lck.lock();
//This returns the value before the subtraction so theoretically if the previous value was 1 then we're the last thread going and we can now signal the controller thread to wake. This is the only place that the decrement happens so I don't know how it could possibly go negative
if (threadsActive_.fetch_sub(1, std::memory_order_seq_cst) == 1)
{
lck.unlock();
cvSignalComplete_.notify_one();
}
else
lck.unlock();
} while (true);
}
void TerminateThreads()
{
try
{
unique_lock<mutex> lck(mtxWorker_);
if (!terminateFlag_)
{
terminateFlag_ = true;
lck.unlock();
cvSignalWork_.notify_all();
for (int i = 0; i < threads_.size(); i++)
threads_[i].join();
}
}
catch (std::exception e)
{
__debugbreak();
}
}
};
int main()
{
std::vector<IWorkerThreadProcessor*> processors;
for (int i = 0; i < 8; i++)
processors.push_back(new MyProcessor(i));
std::cout << "Instantiating thread pool\n";
auto pool = new ThreadPool;
std::cout << "Initialisting thread pool\n";
pool->Init(processors);
std::cout << "Thread pool initialised\n";
for (int i = 0; i < 200; i++)
{
std::cout << "Workload " << i << "\n";
pool->ProcessWorkload(std::chrono::milliseconds(500));
std::cout << "Workload " << i << " complete." << "\n";
}
for (auto a : processors)
delete a;
delete pool;
return 0;
}
std::cout << '(' << index_ << ':' << threadindex << ") ";