Implementing boost::barrier in C++11 - multithreading

I've been trying to get a project rid of every boost reference and switch to pure C++11.
At one point, thread workers are created which wait for a barrier to give the 'go' command, do the work (spread through the N threads) and synchronize when all of them finish. The basic idea is that the main loop gives the go order (boost::barrier .wait()) and waits for the result with the same function.
I had implemented in a different project a custom made Barrier based on the Boost version and everything worked perfectly. Implementation is as follows:
Barrier.h:
class Barrier {
public:
Barrier(unsigned int n);
void Wait(void);
private:
std::mutex counterMutex;
std::mutex waitMutex;
unsigned int expectedN;
unsigned int currentN;
};
Barrier.cpp
Barrier::Barrier(unsigned int n) {
expectedN = n;
currentN = expectedN;
}
void Barrier::Wait(void) {
counterMutex.lock();
// If we're the first thread, we want an extra lock at our disposal
if (currentN == expectedN) {
waitMutex.lock();
}
// Decrease thread counter
--currentN;
if (currentN == 0) {
currentN = expectedN;
waitMutex.unlock();
currentN = expectedN;
counterMutex.unlock();
} else {
counterMutex.unlock();
waitMutex.lock();
waitMutex.unlock();
}
}
This code has been used on iOS and Android's NDK without any problems, but when trying it on a Visual Studio 2013 project it seems only a thread which locked a mutex can unlock it (assertion: unlock of unowned mutex).
Is there any non-spinning (blocking, such as this one) version of barrier that I can use that works for C++11? I've only been able to find barriers which used busy-waiting which is something I would like to prevent (unless there is really no reason for it).

class Barrier {
public:
explicit Barrier(std::size_t iCount) :
mThreshold(iCount),
mCount(iCount),
mGeneration(0) {
}
void Wait() {
std::unique_lock<std::mutex> lLock{mMutex};
auto lGen = mGeneration;
if (!--mCount) {
mGeneration++;
mCount = mThreshold;
mCond.notify_all();
} else {
mCond.wait(lLock, [this, lGen] { return lGen != mGeneration; });
}
}
private:
std::mutex mMutex;
std::condition_variable mCond;
std::size_t mThreshold;
std::size_t mCount;
std::size_t mGeneration;
};

Use a std::condition_variable instead of a std::mutex to block all threads until the last one reaches the barrier.
class Barrier
{
private:
std::mutex _mutex;
std::condition_variable _cv;
std::size_t _count;
public:
explicit Barrier(std::size_t count) : _count(count) { }
void Wait()
{
std::unique_lock<std::mutex> lock(_mutex);
if (--_count == 0) {
_cv.notify_all();
} else {
_cv.wait(lock, [this] { return _count == 0; });
}
}
};

Here's my version of the accepted answer above with Auto reset behavior for repetitive use; this was achieved by counting up and down alternately.
/**
* #brief Represents a CPU thread barrier
* #note The barrier automatically resets after all threads are synced
*/
class Barrier
{
private:
std::mutex m_mutex;
std::condition_variable m_cv;
size_t m_count;
const size_t m_initial;
enum State : unsigned char {
Up, Down
};
State m_state;
public:
explicit Barrier(std::size_t count) : m_count{ count }, m_initial{ count }, m_state{ State::Down } { }
/// Blocks until all N threads reach here
void Sync()
{
std::unique_lock<std::mutex> lock{ m_mutex };
if (m_state == State::Down)
{
// Counting down the number of syncing threads
if (--m_count == 0) {
m_state = State::Up;
m_cv.notify_all();
}
else {
m_cv.wait(lock, [this] { return m_state == State::Up; });
}
}
else // (m_state == State::Up)
{
// Counting back up for Auto reset
if (++m_count == m_initial) {
m_state = State::Down;
m_cv.notify_all();
}
else {
m_cv.wait(lock, [this] { return m_state == State::Down; });
}
}
}
};

Seem all above answers don't work in the case the barrier is placed too near
Example: Each thread run the while loop look like this:
while (true)
{
threadBarrier->Synch();
// do heavy computation
threadBarrier->Synch();
// small external calculations like timing, loop count, etc, ...
}
And here is the solution using STL:
class ThreadBarrier
{
public:
int m_threadCount = 0;
int m_currentThreadCount = 0;
std::mutex m_mutex;
std::condition_variable m_cv;
public:
inline ThreadBarrier(int threadCount)
{
m_threadCount = threadCount;
};
public:
inline void Synch()
{
bool wait = false;
m_mutex.lock();
m_currentThreadCount = (m_currentThreadCount + 1) % m_threadCount;
wait = (m_currentThreadCount != 0);
m_mutex.unlock();
if (wait)
{
std::unique_lock<std::mutex> lk(m_mutex);
m_cv.wait(lk);
}
else
{
m_cv.notify_all();
}
};
};
And the solution for Windows:
class ThreadBarrier
{
public:
SYNCHRONIZATION_BARRIER m_barrier;
public:
inline ThreadBarrier(int threadCount)
{
InitializeSynchronizationBarrier(
&m_barrier,
threadCount,
8000);
};
public:
inline void Synch()
{
EnterSynchronizationBarrier(
&m_barrier,
0);
};
};

Related

Wait for thread queue to be empty

I am new to C++ and multithreading applications. I want to process a long list of data (potentially several thousands of entries) by dividing its entries among a few threads. I have retrieved a ThreadPool class and a Queue class from the web (it is my first time tackling the subject). I construct the threads and populate the queue in the following way (definitions at the end of the post):
ThreadPool *pool = new ThreadPool(8);
std::vector<std::function<void(int)>> *caller =
new std::vector<std::function<void(int)>>;
for (size_t i = 0; i < Nentries; ++i)
{
caller->push_back(
[=](int j){func(entries[i], j);});
pool->PushTask((*caller)[i]);
}
delete pool;
The problem is that only a number of entries equaling the number of created threads are processed, as if the program does not wait for the queue to be empty. Indeed, if I put
while (pool->GetWorkQueueLength()) {}
just before the pool destructor, the whole list is correctly processed. However, I am afraid I am consuming too many resources by using a while loop. Moreover, I have not found anyone doing anything like it, so I think this is the wrong approach and the classes I use have some error. Can anyone find the error (if present) or suggest another solution?
Here are the classes I use. I suppose the problem is in the implementation of the destructor, but I am not sure.
SynchronizeQueue.hh
#ifndef SYNCQUEUE_H
#define SYNCQUEUE_H
#include <list>
#include <mutex>
#include <condition_variable>
template<typename T>
class SynchronizedQueue
{
public:
SynchronizedQueue();
void Put(T const & data);
T Get();
size_t Size();
private:
SynchronizedQueue(SynchronizedQueue const &)=delete;
SynchronizedQueue & operator=(SynchronizedQueue const &)=delete;
std::list<T> queue;
std::mutex mut;
std::condition_variable condvar;
};
template<typename T>
SynchronizedQueue<T>::SynchronizedQueue()
{}
template<typename T>
void SynchronizedQueue<T>::Put(T const & data)
{
std::unique_lock<std::mutex> lck(mut);
queue.push_back(data);
condvar.notify_one();
}
template<typename T>
T SynchronizedQueue<T>::Get()
{
std::unique_lock<std::mutex> lck(mut);
while (queue.empty())
{
condvar.wait(lck);
}
T result = queue.front();
queue.pop_front();
return result;
}
template<typename T>
size_t SynchronizedQueue<T>::Size()
{
std::unique_lock<std::mutex> lck(mut);
return queue.size();
}
#endif
ThreadPool.hh
#ifndef THREADPOOL_H
#define THREADPOOL_H
#include "SynchronizedQueue.hh"
#include <atomic>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>
class ThreadPool
{
public:
ThreadPool(int nThreads = 0);
virtual ~ThreadPool();
void PushTask(std::function<void(int)> func);
size_t GetWorkQueueLength();
private:
void WorkerThread(int i);
std::atomic<bool> done;
unsigned int threadCount;
SynchronizedQueue<std::function<void(int)>> workQueue;
std::vector<std::thread> threads;
};
#endif
ThreadPool.cc
#include "ThreadPool.hh"
#include "SynchronizedQueue.hh"
void doNothing(int i)
{}
ThreadPool::ThreadPool(int nThreads)
: done(false)
{
if (nThreads <= 0)
{
threadCount = std::thread::hardware_concurrency();
}
else
{
threadCount = nThreads;
}
for (unsigned int i = 0; i < threadCount; ++i)
{
threads.push_back(std::thread(&ThreadPool::WorkerThread, this, i));
}
}
ThreadPool::~ThreadPool()
{
done = true;
for (unsigned int i = 0; i < threadCount; ++i)
{
PushTask(&doNothing);
}
for (auto& th : threads)
{
if (th.joinable())
{
th.join();
}
}
}
void ThreadPool::PushTask(std::function<void(int)> func)
{
workQueue.Put(func);
}
void ThreadPool::WorkerThread(int i)
{
while (!done)
{
workQueue.Get()(i);
}
}
size_t ThreadPool::GetWorkQueueLength()
{
return workQueue.Size();
}
You can push tasks saying "done" instead of setting "done" via atomic variable.
So that each thread will exit by itself when seeing "done" task, and no earlier. In destructor you only need to push these tasks and join threads. This is called "poison pill".
Alternatively, if you insist on your current design with done variable, you can wait on the same condition you already have:
std::unique_lock<std::mutex> lck(mut);
while (!queue.empty())
{
condvar.wait(lck);
}
But then you'll need to change your notify_one to notify_all, and this may be sub-optimal.
I want to process a long list of data (potentially several thousands of entries) by dividing its entries among a few threads.
You can do that with parallel algorithms, like tbb::parallel_for:
#include <tbb/parallel_for.h>
#include <vector>
void func(int entry);
int main () {
std::vector<int> entries(1000000);
tbb::parallel_for(size_t{0}, entries.size(), [&](size_t i) { func(entries[i]); });
}
If you need sequential thread ids, you can do:
void func(int element, int thread_id);
template<class C>
inline auto make_range(C& c) -> decltype(tbb::blocked_range<decltype(c.begin())>(c.begin(), c.end())) {
return tbb::blocked_range<decltype(c.begin())>(c.begin(), c.end());
}
int main () {
std::vector<int> entries(1000000);
std::atomic<int> thread_counter{0};
tbb::parallel_for(make_range(entries), [&](auto sub_range) {
static thread_local int const thread_id = thread_counter.fetch_add(1, std::memory_order_relaxed);
for(auto& element : sub_range)
func(element, thread_id);
});
}
Alternatively, there is std::this_thread::get_id.

How many mutex(es) should be used in one thread

I am working on a c++ (11) project and on the main thread, I need to check the value of two variables. The value of the two variables will be set by other threads through two different callbacks. I am using two condition variables to notify changes of those two variables. Because in c++, locks are needed for condition variables, I am not sure if I should use the same mutex for the two condition variables or I should use two mutex's to minimize exclusive execution. Somehow, I feel one mutex should be sufficient because on one thread(the main thread in this case) the code will be executed sequentially anyway. The code on the main thread that checks (wait for) the value of the two variables wont be interleaved anyway. Let me know if you need me to write code to illustrate the problem. I can prepare that. Thanks.
Update, add code:
#include <mutex>
class SomeEventObserver {
public:
virtual void handleEventA() = 0;
virtual void handleEventB() = 0;
};
class Client : public SomeEventObserver {
public:
Client() {
m_shouldQuit = false;
m_hasEventAHappened = false;
m_hasEventBHappened = false;
}
// will be callbed by some other thread (for exampe, thread 10)
virtual void handleEventA() override {
{
std::lock_guard<std::mutex> lock(m_mutexForA);
m_hasEventAHappened = true;
}
m_condVarEventForA.notify_all();
}
// will be called by some other thread (for exampe, thread 11)
virtual void handleEventB() override {
{
std::lock_guard<std::mutex> lock(m_mutexForB);
m_hasEventBHappened = true;
}
m_condVarEventForB.notify_all();
}
// here waitForA and waitForB are in the main thread, they are executed sequentially
// so I am wondering if I can use just one mutex to simplify the code
void run() {
waitForA();
waitForB();
}
void doShutDown() {
m_shouldQuit = true;
}
private:
void waitForA() {
std::unique_lock<std::mutex> lock(m_mutexForA);
m_condVarEventForA.wait(lock, [this]{ return m_hasEventAHappened; });
}
void waitForB() {
std::unique_lock<std::mutex> lock(m_mutexForB);
m_condVarEventForB.wait(lock, [this]{ return m_hasEventBHappened; });
}
// I am wondering if I can use just one mutex
std::condition_variable m_condVarEventForA;
std::condition_variable m_condVarEventForB;
std::mutex m_mutexForA;
std::mutex m_mutexForB;
bool m_hasEventAHappened;
bool m_hasEventBHappened;
};
int main(int argc, char* argv[]) {
Client client;
client.run();
}

How to prematurely kill std::async threads before they are finished *without* using a std::atomic_bool?

I have a function that takes a callback, and used it to do work on 10 separate threads. However, it is often the case that not all of the work is needed. For example, if the desired result is obtained on the third thread, it should stop all work being done on of the remaining alive threads.
This answer here suggests that it is not possible unless you have the callback functions take an additional std::atomic_bool argument, that signals whether the function should terminate prematurely.
This solution does not work for me. The workers are spun up inside a base class, and the whole point of this base class is to abstract away details of multithreading. How can I do this? I am anticipating that I will have to ditch std::async for something more involved.
#include <iostream>
#include <future>
#include <vector>
class ABC{
public:
std::vector<std::future<int> > m_results;
ABC() {};
~ABC(){};
virtual int callback(int a) = 0;
void doStuffWithCallBack();
};
void ABC::doStuffWithCallBack(){
// start working
for(int i = 0; i < 10; ++i)
m_results.push_back(std::async(&ABC::callback, this, i));
// analyze results and cancel all threads when you get the 1
for(int j = 0; j < 10; ++j){
double foo = m_results[j].get();
if ( foo == 1){
break; // but threads continue running
}
}
std::cout << m_results[9].get() << " <- this shouldn't have ever been computed\n";
}
class Derived : public ABC {
public:
Derived() : ABC() {};
~Derived() {};
int callback(int a){
std::cout << a << "!\n";
if (a == 3)
return 1;
else
return 0;
};
};
int main(int argc, char **argv)
{
Derived myObj;
myObj.doStuffWithCallBack();
return 0;
}
I'll just say that this should probably not be a part of a 'normal' program, since it could leak resources and/or leave your program in an unstable state, but in the interest of science...
If you have control of the thread loop, and you don't mind using platform features, you could inject an exception into the thread. With posix you can use signals for this, on Windows you would have to use SetThreadContext(). Though the exception will generally unwind the stack and call destructors, your thread may be in a system call or other 'non-exception safe place' when the exception occurs.
Disclaimer: I only have Linux at the moment, so I did not test the Windows code.
#if defined(_WIN32)
# define ITS_WINDOWS
#else
# define ITS_POSIX
#endif
#if defined(ITS_POSIX)
#include <signal.h>
#endif
void throw_exception() throw(std::string())
{
throw std::string();
}
void init_exceptions()
{
volatile int i = 0;
if (i)
throw_exception();
}
bool abort_thread(std::thread &t)
{
#if defined(ITS_WINDOWS)
bool bSuccess = false;
HANDLE h = t.native_handle();
if (INVALID_HANDLE_VALUE == h)
return false;
if (INFINITE == SuspendThread(h))
return false;
CONTEXT ctx;
ctx.ContextFlags = CONTEXT_CONTROL;
if (GetThreadContext(h, &ctx))
{
#if defined( _WIN64 )
ctx.Rip = (DWORD)(DWORD_PTR)throw_exception;
#else
ctx.Eip = (DWORD)(DWORD_PTR)throw_exception;
#endif
bSuccess = SetThreadContext(h, &ctx) ? true : false;
}
ResumeThread(h);
return bSuccess;
#elif defined(ITS_POSIX)
pthread_kill(t.native_handle(), SIGUSR2);
#endif
return false;
}
#if defined(ITS_POSIX)
void worker_thread_sig(int sig)
{
if(SIGUSR2 == sig)
throw std::string();
}
#endif
void init_threads()
{
#if defined(ITS_POSIX)
struct sigaction sa;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sa.sa_handler = worker_thread_sig;
sigaction(SIGUSR2, &sa, 0);
#endif
}
class tracker
{
public:
tracker() { printf("tracker()\n"); }
~tracker() { printf("~tracker()\n"); }
};
int main(int argc, char *argv[])
{
init_threads();
printf("main: starting thread...\n");
std::thread t([]()
{
try
{
tracker a;
init_exceptions();
printf("thread: started...\n");
std::this_thread::sleep_for(std::chrono::minutes(1000));
printf("thread: stopping...\n");
}
catch(std::string s)
{
printf("thread: exception caught...\n");
}
});
printf("main: sleeping...\n");
std::this_thread::sleep_for(std::chrono::seconds(2));
printf("main: aborting...\n");
abort_thread(t);
printf("main: joining...\n");
t.join();
printf("main: exiting...\n");
return 0;
}
Output:
main: starting thread...
main: sleeping...
tracker()
thread: started...
main: aborting...
main: joining...
~tracker()
thread: exception caught...
main: exiting...

multiple-readers, single-writer locks in OpenMP

There is an object shared by multiple threads to read from and write to, and I need to implement the class with a reader-writer lock which has the following functions:
It might be declared occupied by one and no more than one thread. Any other threads that try to occupy it will be rejected, and continue to do their works rather than be blocked.
Any of the threads are allowed to ask whether the object is occupied by self or by others at any time, except for the time when it is being declared occupied or released.
Only the owner of the object is allowed to release its ownership, though others might try to do it as well. If it is not the owner, the releasing operation will be canceled.
The performance needs to be carefully considered.
I'm doing the work with OpenMP, so I hope to implement the lock using only the APIs within OpenMP, rather than POSIX, or so on. I have read this answer, but there are only solutions for implementations of C++ standard library. As mixing OpenMP with C++ standard library or POSIX thread model may slow down the program, I wonder is there a good solution for OpenMP?
I have tried like this, sometimes it worked fine but sometimes it crashed, and sometimes it was dead locked. I find it hard to debug as well.
class Element
{
public:
typedef int8_t label_t;
Element() : occupied_(-1) {}
// Set it occupied by thread #myThread.
// Return whether it is set successfully.
bool setOccupiedBy(const int myThread)
{
if (lock_.try_lock())
{
if (occupied_ == -1)
{
occupied_ = myThread;
ready_.set(true);
}
}
// assert(lock_.get() && ready_.get());
return occupied_ == myThread;
}
// Return whether it is occupied by other threads
// except for thread #myThread.
bool isOccupiedByOthers(const int myThread) const
{
bool value = true;
while (lock_.get() != ready_.get());
value = occupied_ != -1 && occupied_ != myThread;
return value;
}
// Return whether it is occupied by thread #myThread.
bool isOccupiedBySelf(const int myThread) const
{
bool value = true;
while (lock_.get() != ready_.get());
value = occupied_ == myThread;
return value;
}
// Clear its occupying mark by thread #myThread.
void clearOccupied(const int myThread)
{
while (true)
{
bool ready = ready_.get();
bool lock = lock_.get();
if (!ready && !lock)
return;
if (ready && lock)
break;
}
label_t occupied = occupied_;
if (occupied == myThread)
{
ready_.set(false);
occupied_ = -1;
lock_.unlock();
}
// assert(ready_.get() == lock_.get());
}
protected:
Atomic<label_t> occupied_;
// Locked means it is occupied by one of the threads,
// and one of the threads might be modifying the ownership
MutexLock lock_;
// Ready means it is occupied by one the the threads,
// and none of the threads is modifying the ownership.
Mutex ready_;
};
The atomic variable, mutex, and the mutex lock is implemented with OpenMP instructions as following:
template <typename T>
class Atomic
{
public:
Atomic() {}
Atomic(T&& value) : mutex_(value) {}
T set(const T& value)
{
T oldValue;
#pragma omp atomic capture
{
oldValue = mutex_;
mutex_ = value;
}
return oldValue;
}
T get() const
{
T value;
#pragma omp read
value = mutex_;
return value;
}
operator T() const { return get(); }
Atomic& operator=(const T& value)
{
set(value);
return *this;
}
bool operator==(const T& value) { return get() == value; }
bool operator!=(const T& value) { return get() != value; }
protected:
volatile T mutex_;
};
class Mutex : public Atomic<bool>
{
public:
Mutex() : Atomic<bool>(false) {}
};
class MutexLock : private Mutex
{
public:
void lock()
{
bool oldMutex = false;
while (oldMutex = set(true), oldMutex == true) {}
}
void unlock() { set(false); }
bool try_lock()
{
bool oldMutex = set(true);
return oldMutex == false;
}
using Mutex::operator bool;
using Mutex::get;
};
I also use the lock provided by OpenMP in alternative:
class OmpLock
{
public:
OmpLock() { omp_init_lock(&lock_); }
~OmpLock() { omp_destroy_lock(&lock_); }
void lock() { omp_set_lock(&lock_); }
void unlock() { omp_unset_lock(&lock_); }
int try_lock() { return omp_test_lock(&lock_); }
private:
omp_lock_t lock_;
};
By the way, I use gcc 4.9.4 and OpenMP 4.0, on x86_64 GNU/Linux.

pthread mutexes, thread synchronization using pthread_mutex_trylock()

I've been working on getting a key value hash dictionary, and one of the things I need to do is rehash when the load factor reaches a threshold, so to make sure no other threads are accessing it I'm trying to bring it down to a single queue.
Will this successfully prevent more than one thread from being able to read/write to the resource at the same time, or is there something I'm missing.
Thanks!!
typedef struct {
pthread_mutex_t mutex;
pthread_cond_t cond;
pthread_rwlock_t rw_lock;
} thread_conch;
typedef struct {
thread_conch d_lock;
size_t elements;
size_t hash_size;
struct kv_record **bucket;
} dictionary;
void init_conch(thread_conch *t_lock)
{
pthread_mutex_init(&t_lock->mutex, NULL);
pthread_cond_init(&t_lock->cond, NULL);
pthread_rwlock_init(&t_lock->rw_lock, NULL);
}
void destroy_conch(thread_conch *t_lock)
{
pthread_mutex_destroy(&t_lock->mutex);
pthread_cond_destroy(&t_lock->cond);
pthread_rwlock_destroy(&t_lock->rw_lock);
}
void hold_conch(thread_conch *t_lock)
{
pthread_rwlock_wrlock(&t_lock->rw_lock);
while (pthread_mutex_trylock(&t_lock->mutex) != 0)
;
}
void read_conch(thread_conch *t_lock)
{
while (pthread_rwlock_tryrdlock(&t_lock->rw_lock) != 0)
pthread_cond_wait(&t_lock->cond, &t_lock->mutex);
}
void release_conch(thread_conch *t_lock)
{
pthread_rwlock_unlock(&t_lock->rw_lock);
pthread_mutex_unlock(&t_lock->mutex);
pthread_cond_broadcast(&t_lock->cond);
}

Resources