compare and swap using atomic_compare_exchange_weak - multithreading

In this code, is std::swap thread safe, so that it can be called from two threads of execution at the same time, or do I need to use atomic_compare_exchange_weak() instead of swap()?
How do I know whether this will work on all CPUs? I am happy if it just works on Intel CPUs.
#include <utility>

class resource {
    int x = 0;
};

class foo
{
public:
    foo() : p{new resource{}}
    { }
    foo(const foo& other) : p{new resource{*(other.p)}}
    { }
    foo(foo&& other) : p{other.p}
    {
        other.p = nullptr;
    }
    foo& operator=(foo other)
    {
        swap(*this, other);
        return *this;
    }
    virtual ~foo()
    {
        delete p;
    }
    friend void swap(foo& first, foo& second)
    {
        using std::swap;
        swap(first.p, second.p);
    }
private:
    resource* p;
};
I understand it is overkill to swap a pointer, but this might be good practice.

is std::swap thread safe so it can be called from two execution threads at the same time
std::swap is thread-safe as long as different threads pass different objects into it. If two threads pass the same object without synchronization, the result is a data race.
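A minimal sketch of the alternative the question asks about, assuming the pointer itself is what two threads may touch at the same time. The names shared_slot, replace, and replace_if are hypothetical and not part of the original code. Since std::atomic belongs to the C++ standard library, it is not tied to Intel CPUs; whether the operations are lock-free on a given target can be queried with is_lock_free().

```cpp
#include <atomic>
#include <cassert>

struct resource {
    int x = 0;
};

// Hypothetical holder: the pointer is atomic, so several threads may
// replace it concurrently without a data race on the pointer itself.
class shared_slot {
public:
    explicit shared_slot(resource* initial) : p_(initial) {}

    // Atomically installs next and returns the previous pointer.
    // The caller becomes responsible for the returned object.
    resource* replace(resource* next) { return p_.exchange(next); }

    // Installs next only if the slot still holds expected.
    // compare_exchange_weak may fail spuriously, so retry for as long as
    // the observed value still equals expected.
    bool replace_if(resource* expected, resource* next) {
        resource* observed = expected;
        while (!p_.compare_exchange_weak(observed, next)) {
            if (observed != expected) {
                return false;  // another thread changed the slot first
            }
        }
        return true;
    }

    resource* get() const { return p_.load(); }

private:
    std::atomic<resource*> p_;
};
```

This only makes a single pointer atomic. Swapping two foo objects as one indivisible step would still need a lock held around both of them.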


How to interrupt a thread which is waiting for std::condition_variable_any in C++?

I'm reading C++ Concurrency in Action.
It shows how to implement an interruptible thread using std::condition_variable_any.
I have been trying to understand the code for more than a week, but I still can't.
Below are the code and the explanation from the book.
#include <atomic>
#include <condition_variable>
#include <exception>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>

class thread_interrupted : public std::exception {};

class interrupt_flag {
  std::atomic<bool> flag;
  std::condition_variable* thread_cond;
  std::condition_variable_any* thread_cond_any;
  std::mutex set_clear_mutex;

 public:
  interrupt_flag() : thread_cond(0), thread_cond_any(0) {}
  void set() {
    flag.store(true, std::memory_order_relaxed);
    std::lock_guard<std::mutex> lk(set_clear_mutex);
    if (thread_cond) {
      thread_cond->notify_all();
    } else if (thread_cond_any) {
      thread_cond_any->notify_all();
    }
  }
  bool is_set() const { return flag.load(std::memory_order_relaxed); }
  template <typename Lockable>
  void wait(std::condition_variable_any& cv, Lockable& lk);
};

thread_local static interrupt_flag this_thread_interrupt_flag;

void interruption_point() {
  if (this_thread_interrupt_flag.is_set()) {
    throw thread_interrupted();
  }
}

template <typename Lockable>
void interrupt_flag::wait(std::condition_variable_any& cv, Lockable& lk) {
  struct custom_lock {
    interrupt_flag* self;
    // (1) What is this lk for? Why must lk already be locked when it is
    // passed to the custom_lock constructor?
    Lockable& lk;
    custom_lock(interrupt_flag* self_, std::condition_variable_any& cond,
                Lockable& lk_)
        : self(self_), lk(lk_) {
      self->set_clear_mutex.lock();
      self->thread_cond_any = &cond;
    }
    void unlock() {
      lk.unlock();
      self->set_clear_mutex.unlock();
    }
    void lock() { std::lock(self->set_clear_mutex, lk); }
    ~custom_lock() {
      self->thread_cond_any = 0;
      self->set_clear_mutex.unlock();
    }
  };
  custom_lock cl(this, cv, lk);
  interruption_point();
  cv.wait(cl);
  interruption_point();
}

class interruptible_thread {
  std::thread internal_thread;
  interrupt_flag* flag;

 public:
  template <typename FunctionType>
  interruptible_thread(FunctionType f) {
    std::promise<interrupt_flag*> p;
    internal_thread = std::thread([f, &p] {
      p.set_value(&this_thread_interrupt_flag);
      f();
    });
    flag = p.get_future().get();
  }
  void interrupt() {
    if (flag) {
      flag->set();
    }
  }
  void join() { internal_thread.join(); };
  void detach();
  bool joinable() const;
};

template <typename Lockable>
void interruptible_wait(std::condition_variable_any& cv, Lockable& lk) {
  this_thread_interrupt_flag.wait(cv, lk);
}

void foo() {
  // (2) This is my implementation of how to use interruptible wait. Is it correct?
  std::condition_variable_any cv;
  std::mutex m;
  std::unique_lock<std::mutex> lk(m);
  try {
    interruptible_wait(cv, lk);
  } catch (...) {
    std::cout << "interrupted" << std::endl;
  }
}

int main() {
  std::cout << "Hello" << std::endl;
  interruptible_thread th(foo);
}
Your custom lock type acquires the lock on the internal
set_clear_mutex when it’s constructed 1, and then sets the
thread_cond_any pointer to refer to the std::condition_variable_any
passed in to the constructor 2.
The Lockable reference is stored for later; this must already be
locked. You can now check for an interruption without worrying about
races. If the interrupt flag is set at this point, it was set before
you acquired the lock on set_clear_mutex. When the condition variable
calls your unlock() function inside wait(), you unlock the Lockable
object and the internal set_clear_mutex 3.
This allows threads that are trying to interrupt you to acquire the
lock on set_clear_mutex and check the thread_cond_any pointer once
you’re inside the wait() call but not before. This is exactly what you
were after (but couldn’t manage) with std::condition_variable.
Once wait() has finished waiting (either because it was notified or
because of a spurious wake), it will call your lock() function, which
again acquires the lock on the internal set_clear_mutex and the lock
on the Lockable object 4. You can now check again for interruptions
that happened during the wait() call before clearing the
thread_cond_any pointer in your custom_lock destructor 5, where you
also unlock the set_clear_mutex.
First, I can't understand the purpose of Lockable& lk at mark (1), or why it must already be locked when the custom_lock constructor runs. (It could be locked inside the custom_lock constructor itself.)
Second, the book gives no example of how to use the interruptible wait, so foo() at mark (2) is my guess at how to use it. Is this the correct way to use it?
You need a mutex-like object (lk in your foo function) to call the interruptible wait, just as you would need one for the plain std::condition_variable::wait function.
What's problematic (I also read the book and I have doubts about this example) is that the flag member points to a memory location inside the other thread, which could finish right before flag->set() is called. In this specific example the thread only exits after we set the flag, so that is okay, but otherwise this approach is limited in my opinion (correct me if I am wrong).
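One way to sidestep the lifetime concern, sketched under assumptions: keep the flag on the heap and share ownership of it, so it stays valid even if the worker thread has already finished. Everything here (shared_flag, make_flag, wait_interrupted, polling_interruptible_wait) is a hypothetical name, and the wait polls with a timeout instead of using the book's custom_lock, which trades a little latency and CPU time for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>

// Hypothetical names throughout; this is not the book's implementation.
// The flag is co-owned by the controlling thread and the worker thread.
using shared_flag = std::shared_ptr<std::atomic<bool>>;

struct wait_interrupted {};

inline shared_flag make_flag() {
    return std::make_shared<std::atomic<bool>>(false);
}

// Waits until pred() is true. lk must already be locked by the caller, as
// with a plain condition-variable wait. Throws wait_interrupted if the flag
// is observed set before pred() becomes true.
template <typename Lockable, typename Predicate>
void polling_interruptible_wait(std::condition_variable_any& cv, Lockable& lk,
                                const shared_flag& flag, Predicate pred) {
    assert(flag && "flag must not be null");
    while (!pred()) {
        if (flag->load()) {
            throw wait_interrupted{};
        }
        // Wake up at least once per millisecond to re-check the flag.
        cv.wait_for(lk, std::chrono::milliseconds(1));
    }
}
```

The controlling thread keeps its own copy of the shared_flag and calls flag->store(true) to request an interruption; no pointer into the worker's thread-local storage is involved.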

How many mutex(es) should be used in one thread

I am working on a C++11 project, and on the main thread I need to check the values of two variables. The two variables are set by other threads through two different callbacks. I am using two condition variables to signal changes to those two variables. Because condition variables in C++ need a lock, I am not sure whether I should use the same mutex for both condition variables or two mutexes to minimize exclusive execution. I feel one mutex should be sufficient, because on a single thread (the main thread in this case) the code is executed sequentially anyway, so the main-thread code that waits for the two variables won't be interleaved. Let me know if you need me to write code to illustrate the problem; I can prepare that. Thanks.
Update, add code:
#include <mutex>
class SomeEventObserver {
virtual void handleEventA() = 0;
virtual void handleEventB() = 0;
class Client : public SomeEventObserver {
Client() {
m_shouldQuit = false;
m_hasEventAHappened = false;
m_hasEventBHappened = false;
// will be called by some other thread (for example, thread 10)
virtual void handleEventA() override {
std::lock_guard<std::mutex> lock(m_mutexForA);
m_hasEventAHappened = true;
// will be called by some other thread (for example, thread 11)
virtual void handleEventB() override {
std::lock_guard<std::mutex> lock(m_mutexForB);
m_hasEventBHappened = true;
// here waitForA and waitForB are in the main thread, they are executed sequentially
// so I am wondering if I can use just one mutex to simplify the code
void run() {
void doShutDown() {
m_shouldQuit = true;
void waitForA() {
std::unique_lock<std::mutex> lock(m_mutexForA);
m_condVarEventForA.wait(lock, [this]{ return m_hasEventAHappened; });
void waitForB() {
std::unique_lock<std::mutex> lock(m_mutexForB);
m_condVarEventForB.wait(lock, [this]{ return m_hasEventBHappened; });
// I am wondering if I can use just one mutex
std::condition_variable m_condVarEventForA;
std::condition_variable m_condVarEventForB;
std::mutex m_mutexForA;
std::mutex m_mutexForB;
bool m_hasEventAHappened;
bool m_hasEventBHappened;
int main(int argc, char* argv[]) {
Client client;
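One possible shape for the single-mutex variant the question is considering, offered as a sketch rather than a definitive answer. It assumes both flags are cheap to update, so contention on one shared mutex is negligible. EventLatch and its member names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical class: one mutex and one condition variable guard both flags.
class EventLatch {
public:
    // Called from the thread that reports event A.
    void signalA() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_hasA = true;
        }
        m_cond.notify_all();  // notify after releasing the mutex
    }

    // Called from the thread that reports event B.
    void signalB() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_hasB = true;
        }
        m_cond.notify_all();
    }

    // Blocks the caller until both events have been reported.
    void waitForBoth() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_hasA && m_hasB; });
    }

    bool bothHappened() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_hasA && m_hasB;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_hasA = false;
    bool m_hasB = false;
};
```

The mutex is held only long enough to flip a bool, so the two callbacks rarely block each other. Separate mutexes would mainly pay off if a callback did lengthy work while holding its lock.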

multiple-readers, single-writer locks in OpenMP

An object is shared by multiple threads that read from and write to it, and I need to implement its class with a reader-writer lock that has the following properties:
It can be declared occupied by one and only one thread. Any other thread that tries to occupy it is rejected and continues with its own work rather than blocking.
Any thread may ask at any time whether the object is occupied by itself or by another thread, except while the object is being declared occupied or released.
Only the owner of the object may release its ownership, though others might try to do so as well. If the caller is not the owner, the release operation is canceled.
Performance needs to be carefully considered.
I'm doing the work with OpenMP, so I hope to implement the lock using only OpenMP APIs, rather than POSIX and so on. I have read this answer, but it only offers solutions based on the C++ standard library. As mixing OpenMP with the C++ standard library or the POSIX thread model may slow down the program, I wonder whether there is a good solution for OpenMP.
I have tried the following. Sometimes it works fine, but sometimes it crashes and sometimes it deadlocks. I find it hard to debug as well.
class Element
typedef int8_t label_t;
Element() : occupied_(-1) {}
// Set it occupied by thread #myThread.
// Return whether it is set successfully.
bool setOccupiedBy(const int myThread)
if (lock_.try_lock())
if (occupied_ == -1)
occupied_ = myThread;
// assert(lock_.get() && ready_.get());
return occupied_ == myThread;
// Return whether it is occupied by other threads
// except for thread #myThread.
bool isOccupiedByOthers(const int myThread) const
bool value = true;
while (lock_.get() != ready_.get());
value = occupied_ != -1 && occupied_ != myThread;
return value;
// Return whether it is occupied by thread #myThread.
bool isOccupiedBySelf(const int myThread) const
bool value = true;
while (lock_.get() != ready_.get());
value = occupied_ == myThread;
return value;
// Clear its occupying mark by thread #myThread.
void clearOccupied(const int myThread)
while (true)
bool ready = ready_.get();
bool lock = lock_.get();
if (!ready && !lock)
if (ready && lock)
label_t occupied = occupied_;
if (occupied == myThread)
occupied_ = -1;
// assert(ready_.get() == lock_.get());
Atomic<label_t> occupied_;
// Locked means it is occupied by one of the threads,
// and one of the threads might be modifying the ownership
MutexLock lock_;
// Ready means it is occupied by one of the threads,
// and none of the threads is modifying the ownership.
Mutex ready_;
The atomic variable, mutex, and mutex lock are implemented with OpenMP directives as follows:
template <typename T>
class Atomic
Atomic() {}
Atomic(T&& value) : mutex_(value) {}
T set(const T& value)
T oldValue;
#pragma omp atomic capture
oldValue = mutex_;
mutex_ = value;
return oldValue;
T get() const
T value;
#pragma omp read
value = mutex_;
return value;
operator T() const { return get(); }
Atomic& operator=(const T& value)
return *this;
bool operator==(const T& value) { return get() == value; }
bool operator!=(const T& value) { return get() != value; }
volatile T mutex_;
class Mutex : public Atomic<bool>
Mutex() : Atomic<bool>(false) {}
class MutexLock : private Mutex
void lock()
bool oldMutex = false;
while (oldMutex = set(true), oldMutex == true) {}
void unlock() { set(false); }
bool try_lock()
bool oldMutex = set(true);
return oldMutex == false;
using Mutex::operator bool;
using Mutex::get;
As an alternative, I also tried the lock provided by OpenMP:
class OmpLock
{
public:
    OmpLock() { omp_init_lock(&lock_); }
    ~OmpLock() { omp_destroy_lock(&lock_); }
    void lock() { omp_set_lock(&lock_); }
    void unlock() { omp_unset_lock(&lock_); }
    int try_lock() { return omp_test_lock(&lock_); }
private:
    omp_lock_t lock_;
};
By the way, I use gcc 4.9.4 and OpenMP 4.0, on x86_64 GNU/Linux.
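For comparison, a sketch that keeps the whole ownership state in a single word and changes it with compare-and-swap, so no separate lock_ and ready_ flags have to stay in step. Note the swapped-in technique: this uses std::atomic from the C++ standard library and therefore steps outside the OpenMP-only constraint stated above. Whether std::atomic behaves as expected across OpenMP threads is an assumption to verify for your toolchain. OwnedElement and kFree are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical replacement for Element; uses std::atomic, not OpenMP atomics.
class OwnedElement {
public:
    static constexpr int kFree = -1;  // marker for "no owner"

    // Tries to take ownership without blocking.
    // Returns true if myThread is the owner afterwards.
    bool setOccupiedBy(int myThread) {
        assert(myThread >= 0);  // kFree is reserved
        int expected = kFree;
        if (owner_.compare_exchange_strong(expected, myThread)) {
            return true;  // it was free and is now ours
        }
        return expected == myThread;  // already ours, or held by another
    }

    bool isOccupiedByOthers(int myThread) const {
        int owner = owner_.load();
        return owner != kFree && owner != myThread;
    }

    bool isOccupiedBySelf(int myThread) const {
        return owner_.load() == myThread;
    }

    // Releases ownership only if myThread is the current owner.
    // Returns true if the release happened.
    bool clearOccupied(int myThread) {
        int expected = myThread;
        return owner_.compare_exchange_strong(expected, kFree);
    }

private:
    std::atomic<int> owner_{kFree};
};
```

Every operation is a single atomic load or compare-and-swap, so callers never wait on each other and a non-owner's release attempt simply fails.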

The difference between std::atomic and std::mutex

how to use std::atomic<>
In the question above, obviously we can just use std::mutex to keep thread safety. I want to know when to use which one.
class A
std::atomic<int> x;
void Add()
void Sub()
std::mutex mtx;
class A
int x;
void Add()
std::lock_guard<std::mutex> guard(mtx);
void Sub()
std::lock_guard<std::mutex> guard(mtx);
As a rule of thumb, use std::atomic for POD types where the underlying specialisation will be able to use something clever like a bus lock on the CPU (which will give you no more overhead than a pipeline dump), or even a spin lock. On some systems, an int might already be atomic, so std::atomic<int> will specialise out effectively to an int.
Use std::mutex for non-POD types, bearing in mind that acquiring a mutex is at least an order of magnitude slower than a bus lock.
If you're still unsure, measure the performance.
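To make the rule of thumb concrete, here is a small sketch with hypothetical class names (AtomicCounter, MutexPair). A single integer counter fits std::atomic, while an invariant that spans two variables needs a mutex so that both change as one step.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// A single int: each operation is one atomic read-modify-write.
class AtomicCounter {
public:
    void Add() { x_.fetch_add(1); }
    void Sub() { x_.fetch_sub(1); }
    int Get() const { return x_.load(); }

private:
    std::atomic<int> x_{0};
};

// Two values with an invariant (their sum stays constant): both fields
// must change under one lock so no thread sees a half-done update.
class MutexPair {
public:
    MutexPair(int a, int b) : a_(a), b_(b) {}

    // Moves amount from a to b; refuses if a would go negative.
    bool MoveFromAToB(int amount) {
        std::lock_guard<std::mutex> guard(mtx_);
        if (amount > a_) {
            return false;
        }
        a_ -= amount;
        b_ += amount;
        return true;
    }

    int Total() const {
        std::lock_guard<std::mutex> guard(mtx_);
        return a_ + b_;
    }

private:
    mutable std::mutex mtx_;
    int a_;
    int b_;
};
```

Making a_ and b_ individually atomic would not help in MutexPair, because another thread could still read between the two updates.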

Looking for an optimum multithread message queue

I want to run several threads inside a process. I'm looking for the most efficient way of being able to pass messages between the threads.
Each thread would have a shared memory input message buffer. Other threads would write the appropriate buffer.
Messages would have priority. I want to manage this process myself.
Without getting into expensive locking or synchronizing, what's the best way to do this? Or is there already a well proven library available for this? (Delphi, C, or C# is fine).
This is hard to get right without repeating a lot of mistakes other people already made for you :)
Take a look at Intel Threading Building Blocks - the library has several well-designed queue templates (and other collections) that you can test and see which suits your purpose best.
If you are going to work with multiple threads, it is hard to avoid synchronisation. Fortunately it is not very hard.
For a single process, a Critical Section is frequently the best choice. It is fast and easy to use. For simplicity, I normally wrap it in a class to handle initialisation and cleanup.
#include <Windows.h>
class CTkCritSec
void Lock()
void Unlock()
You can make it even simpler by using an "autolock" class to lock/unlock it.
class CTkAutoLock
CTkAutoLock(CTkCritSec &lock)
: m_lock(lock)
virtual ~CTkAutoLock()
CTkCritSec &m_lock;
Anywhere you want to lock something, instantiate an autolock. When the function finishes, it will unlock. Also, if there is an exception, it will automatically unlock (giving exception safety).
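The exception-safety point can be illustrated with a portable sketch that uses std::mutex in place of the Windows critical section. AutoLock and failWhileLocked are hypothetical names; the standard library's std::lock_guard already does the same job, and the class is written out here only to mirror the idea above.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical portable counterpart of an autolock class:
// locks in the constructor, unlocks in the destructor.
class AutoLock {
public:
    explicit AutoLock(std::mutex& m) : m_(m) { m_.lock(); }
    ~AutoLock() { m_.unlock(); }

    // Copying a guard would unlock twice, so forbid it.
    AutoLock(const AutoLock&) = delete;
    AutoLock& operator=(const AutoLock&) = delete;

private:
    std::mutex& m_;
};

// Throws while the lock is held; stack unwinding runs ~AutoLock,
// so the mutex is released before the exception reaches the caller.
inline void failWhileLocked(std::mutex& m) {
    AutoLock guard(m);
    throw std::runtime_error("simulated failure");
}
```

After failWhileLocked throws, the caller can acquire the same mutex again, because the guard's destructor has already released it.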
Now you can make a simple message queue out of an std priority queue
#include <queue>
#include <deque>
#include <functional>
#include <string>
struct CMsg
CMsg(const std::string &s, int n=1)
: sText(s), nPriority(n)
int nPriority;
std::string sText;
struct Compare : public std::binary_function<const CMsg *, const CMsg *, bool>
bool operator () (const CMsg *p0, const CMsg *p1)
return p0->nPriority < p1->nPriority;
class CMsgQueue :
private std::priority_queue<CMsg *, std::deque<CMsg *>, CMsg::Compare >
void Push(CMsg *pJob)
CTkAutoLock lk(m_critSec);
CMsg *Pop()
CTkAutoLock lk(m_critSec);
CMsg *pJob(NULL);
if (!Empty())
pJob = top();
return pJob;
bool Empty()
CTkAutoLock lk(m_critSec);
return empty();
CTkCritSec m_critSec;
The content of CMsg can be anything you like. Note that CMsgQueue inherits privately from std::priority_queue. That prevents raw access to the queue without going through our (synchronised) methods.
Assign a queue like this to each thread and you are on your way.
Disclaimer: The code here was slapped together quickly to illustrate a point. It probably has errors and needs review and testing before being used in production.
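For readers not on Windows, the same idea can be sketched with only the C++ standard library. This is an illustration under assumptions, not the code above: Msg, LowerPriority, MsgQueue, and TryPop are hypothetical names, messages are stored by value instead of by raw pointer, and std::mutex replaces the critical section.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Msg {
    int priority;
    std::string text;
};

// Orders the heap so that the highest priority value is on top.
struct LowerPriority {
    bool operator()(const Msg& a, const Msg& b) const {
        return a.priority < b.priority;
    }
};

class MsgQueue {
public:
    void Push(Msg msg) {
        std::lock_guard<std::mutex> lk(m_mutex);
        m_queue.push(std::move(msg));
    }

    // Copies the highest-priority message into out and removes it.
    // Returns false, leaving out untouched, if the queue is empty.
    bool TryPop(Msg& out) {
        std::lock_guard<std::mutex> lk(m_mutex);
        if (m_queue.empty()) {
            return false;
        }
        out = m_queue.top();  // top() returns a const reference, so copy
        m_queue.pop();
        return true;
    }

    bool Empty() const {
        std::lock_guard<std::mutex> lk(m_mutex);
        return m_queue.empty();
    }

private:
    mutable std::mutex m_mutex;
    std::priority_queue<Msg, std::vector<Msg>, LowerPriority> m_queue;
};
```

A Windows critical section may be re-entered by the thread that owns it, but std::mutex may not, which is why TryPop checks m_queue.empty() directly rather than calling the locking Empty() method.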
