-1

I am starting with designing thread safe data structure and looking to implement a concurrent vector that hides multi-threading complexity for the user and that offers thread safe clear function (which is not offered by Intel TBB::ConcurrentVector) . Considering a concurrent vector which is based on std::vector and locking a mutex for each call of clear or push_back functions of std::vector, how to iterate thread safely through the concurrent vector using for loop? in other terms, how to make iterate_thread in this minimal reproducible example thread safe?

#include <bits/stdc++.h> 
using namespace std; // for the sake of minimal reproducible example
template<class T> 
class ConcurrentVector {
public: 
    ConcurrentVector() : mMutex{} 
                  , mVector{} 
    {}
    auto begin(){
        scoped_lock lk(mMutex);
        return mVector.begin(); 
    }
    auto end(){
        scoped_lock lk(mMutex);
        return mVector.end(); 
    }
    T& push_back(T t) {
        scoped_lock lk(mMutex);
        mVector.push_back(move(t)); 
        return mVector.back();
    }
    void clear() {
        scoped_lock lk(mMutex);
        mVector.clear(); 
    }
private: 
    mutex mMutex; 
    vector<T> mVector; 
}; 

int main(){
    ConcurrentVector<int> vec{};
    atomic_bool terminate_flag = false;
    jthread fill_thread{[&](){
        static int i = 0;
        while(!terminate_flag){
            this_thread::sleep_for(10ms);
            vec.push_back(++i);
        }
    }};
    jthread clear_thread{[&](){
        while(!terminate_flag){
            this_thread::sleep_for(1s);
            vec.clear();
        }
    }};
    jthread iterate_thread{[&](){
        while(!terminate_flag){
            this_thread::sleep_for(400ms);
            for(auto i : vec){
                cout << i << ", ";
            }
            cout << '\n';
        }
    }};
    
    this_thread::sleep_for(3s);
    terminate_flag = true;
    return 0;
}

Calling clear function in iterate_thread will invalidate iterator and actually, I am thinking of these two options:

  • Add getter function in ConcurrentVector to return a copy of the std::vector and iterating through it, but the issue here is that if clear is called in between, I will be iterating through the copy not the real existing data in the vector
  • or add a for_each function in ConcurrentVector that takes a callable, but the issue here is lack of safety as the user may be calling a locking function which causes a deadlock with the vector mutex

Are there other alternatives or could any of the two mentioned improved?

17
  • 2
    Question here should be self-contained. The link to Godbolt is nice as a bonus, but the code (in a form of a minimal reproducible example) should also be copy-pasted as text in the question. Commented Jul 11, 2024 at 10:28
  • 1
    Regarding your actual question: I would opt for the 2nd option (for_each with a callable). You can use a templated method to support all kinds of callables (lambda, std::function etc.). Commented Jul 11, 2024 at 10:31
  • 1
    "locking a mutex for each call of std::vector public member function" To synchronize access to elements this approach is flawed to begin with. For example user can call data() and then modify as they like, that a mutex was locked during that call does not do anything for protecting the elements Commented Jul 11, 2024 at 10:36
  • 1
    imho the most important lesson is that multithreading is not just single threaded plus threads (plus mutexes plus locks) Commented Jul 11, 2024 at 10:37
  • 2
    #include <bits/stdc++.h> + using namespace std; == you just committed C++ suicide; Commented Jul 11, 2024 at 11:12

4 Answers 4

3

Do not export any vector-like functions from your class. Only export a function that returns an object acting as a scoped lock, and make that object export vector-like functions, which in turn won't need any locking. Something like:

ConcurrentVector<int> v; 
...
{
  auto vv = v.open(); // vv locks 
  for (auto& i: vv) {  // and presents vector-like functionality
    i = 2*i+3;
  }
  // vv's destructor unlocks
}
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the proposition, it hides mutex for the user but still he needs to call v.open and the for loop inside a scope and keep it as short as possible, in other terms still the user needs to be aware about multi-threading complexity.
There is no such thing as free lunch. You can hide complexity from the user, but for a rather heavy price
@omarekik take a quick look at what happened in Java when they made a thread-safe vector that hid all the complexity. TL;DR: They wound up with a class that was too slow most of the time and didn't provide the necessary protection most of the time. A really bad combination. The user needs to be involved in the scope of the lock, and this answer gives that with minimum impact..
1

A range-based loop still uses iterators, even if not visible in code. That iterator has to cooperate. You can't just use std::vector::iterator for this, because the lock has to be released when the iterator goes out of scope.

The naïve implementation linked to only locks the mutex in .begin() when the iterator is created, which is way too short.

Locking the mutex while an iterator exists also prevents the problem of two iterators existing, causing write clashes. However, .end() does need to return a sentinel iterator even when there's another outstanding iterator.

Where things really break down is .end()-1. This needs to lock the vector, as the last element of the vector is definitely writeable.

And don't expect this to magically solve all your problems. Code like if (v.begin()!=v.end()) auto last = *(v.end()-1) breaks when another thread calls v.clear() at the wrong moment.

2 Comments

The proposed example contains a data race, as clear function would invalidate iterators. The question is how to avoid the data race within the implementation of ConcurrentVector so it can be safe and hiding multi-threading complexity.
@omarekik: "Hiding complexity" is a dark pattern. The pattern has been invented over and over, and people keep discovering that hidden complexity is more dangerous than obvious complexity.
1

Making a ConcurrentVector that wraps std::vector and locks on all public function calls is probably a bad idea. It will lead to an unnecessary number of lock/unlock calls, and it's generally undesirable to let other threads interleave between your calls. What I would recommend doing is to have one mutex for the entire vector, lock it with an RAII based lock class, perform all your actions and then let the destructor unlock the mutex at the end of the function. This generally achieves good performance and more predictable behavior.

2 Comments

Thanks for your proposition, the intention of ConcurrentVector is to hide multi-threading implementation for the user so he should not need to deal with mutex even though this may introduce some overhead
@omarekik in that case, I would propose to add a lock() method to the ConcurrentVector class, which returns a proxy object which can be used to access the vector, and also holds the mutex locked as long as the proxy is alive.
0

Thanks @MikeVine, this concurrent_vector is based on a monitor that protects std::vector from data race:

template<class t_container>
class monitor{
public:
    template<typename ...Args>
    monitor(Args&&... args) : m_container(std::forward<Args>(args)...){}

    struct monitor_helper{//The time of locking is the life time of monitor_helper
        monitor_helper(monitor* mon) : m_monitor(mon), m_unique_lock(mon->m_mutex) {}
        t_container* operator->(){ return &m_monitor->m_container; }
        t_container& operator*(){ return m_monitor->m_container; } // for accessing using a manually locked object
        monitor* m_monitor;
        std::unique_lock<std::mutex> m_unique_lock;
    };

    monitor_helper operator->(){ return monitor_helper(this); }
    monitor_helper manually_lock(){ return monitor_helper(this); }
private:
    t_container m_container;
    std::mutex  m_mutex;
};

template<typename t_element> 
requires copyable<t_element>   
class concurrent_vector {
public: 
    void push_back(t_element t) {
        m_monitored_vector->push_back(move(t)); 
    }
    void clear() {
        m_monitored_vector->clear(); 
    }
    vector<t_element> get_data(){
        {//scope for manually_lock
            auto vector_lock_handler = m_monitored_vector.manually_lock();
            return *vector_lock_handler;
        }
    }
private: 
    monitor<vector<t_element>> m_monitored_vector; 
}; 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.