c++11 thread sleep/wakeup without lock? - multithreading

I use a lock-free queue between two threads: one produces data, the other consumes it. What I want is for the consumer thread to yield the CPU when the queue is empty, until the producer thread pushes data into it. I can't just call sleep(), since there is no way to wake a sleeping thread, I think. What I found is std::condition_variable, but it needs a mutex: the producer thread would have to take the lock and then notify the consumer thread for every push. Is there a better and lighter way to achieve this?

Related

Why does std::condition_variable wait() require a std::unique_lock arg?

My thread does not need to be locked. std::unique_lock locks the mutex on construction. I am simply using cond_var.wait() as a way to avoid busy waiting. I have essentially circumvented the automatic locking by putting the unique_lock within a tiny scope, so the unique_lock is destroyed (and the mutex released) as soon as it leaves that scope. Additionally, there is only a single consumer thread, if that's relevant.
{
    std::unique_lock<std::mutex> dispatch_ul(dispatch_mtx);
    pq_cond.wait(dispatch_ul);
}
Is there possibly a better option that avoids the unnecessary auto-locking behaviour of the unique_lock? I'm looking for a mutex-less option to simply signal the thread. I am aware of std::condition_variable_any, but that requires a mutex of sorts, which is yet again unnecessary in my case.
You need a lock to prevent this common newbie mistake:
Producer thread produces something,
Producer thread calls some_condition.notify_all(),
Producer thread goes idle for a while,
meanwhile:
Consumer thread calls some_condition.wait(...)
Consumer thread waits,...
And waits,...
And waits.
A condition variable is not a flag. It does not remember that it was notified. If the producer calls notify_one() or notify_all() before the consumer has entered the wait() call, then the notification is "lost."
In order to prevent lost notifications, there must be some shared data that tells the consumer whether or not it needs to wait, and there must be a lock to protect the shared data.
The producer should:
Lock the lock,
update the shared data,
notify the condition variable,
release the lock
The consumer must then:
Lock the lock,
Check the shared data to see if it needs to wait,
Wait if needed,
consume whatever,
release the lock.
The consumer needs to pass the lock in to the wait(...) call so that wait(...) can temporarily unlock it, and then re-lock it before returning. If wait(...) did not unlock the lock, the producer would never be able to reach its notify() call.

C++11 non-blocking producer/consumer

I have a C++11 application with a high-priority thread that's producing data, and a low-priority thread that's consuming it (in my case, writing it to disk). I'd like to make sure the high-priority producer thread is never blocked, i.e. it uses only lock-free algorithms.
With a lock-free queue, I can push data to the queue from the producer thread, and poll it from the consumer thread, thus meeting my goals above. I'd like to modify my program so that the consumer thread blocks when inactive instead of polling.
It seems like the C++11 condition variable might be useful to block the consumer thread. Can anyone show me an example of how to use it, while avoiding the possibility that the consumer sleeps with data still in the queue? More specifically, I want to make sure that the consumer is always woken up some finite time after the producer pushes the last item into the queue. It's also important that the producer remains non-blocking.
It seems like the C++11 condition variable might be useful to block the consumer thread. Can anyone show me an example of how to use it, while avoiding the possibility that the consumer sleeps with data still in the queue?
To use a condition variable you need a mutex and a condition. In your case the condition will be "there is data available in the queue". Since the producer will be using lock-free updates to produce work, the consumer has to use the same form of synchronisation to consume the work, so the mutex will not actually be used for synchronisation and is only needed by the consumer thread because there's no other way to wait on a condition variable.
// these variables are members or otherwise shared between threads
std::mutex m_mutex;
std::condition_variable m_cv;
lockfree_queue m_data;

// ...

// in producer thread:
while (true)
{
    // add work to queue
    m_data.push(x);
    m_cv.notify_one();
}

// in consumer thread:
while (true)
{
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [&]{ return !m_data.empty(); });
    // remove data from queue and process it
    auto x = m_data.pop();
}
The condition variable will only block in the wait call if the queue is empty before the wait. The condition variable might wake up spuriously, or because it was notified by the producer, but in either case will only return from the wait call (rather than sleeping again) if the queue is non-empty. That's guaranteed by using the condition_variable::wait overload that takes a predicate, because the condition variable always re-checks the predicate for you.
Since the mutex is only used by the consumer thread it could in fact be local to that thread (as long as you only have one consumer, with more than one they all need to share the same mutex to wait on the same condvar).
One solution I found to this in the past was using Windows events (http://msdn.microsoft.com/en-us/library/windows/desktop/ms682396(v=vs.85).aspx). In this case the event remains signaled until it wakes up a waiting thread, and if no threads are waiting it remains signaled. So the producer simply needs to signal the event after pushing data to the queue. Then we are guaranteed that the consumer will wake up some finite time after this.
I wasn't able to find a way to implement this using the standard library though (at least not without blocking the producer thread).
I think semaphores could be used to solve this problem safely:
// in producer thread:
while (true)
{
    m_data.push(x);
    m_semaphore.release();
}

// in consumer thread:
while (true)
{
    m_semaphore.wait();
    m_data.pop();
}
Unfortunately I don't think C++11 includes a semaphore. I have also not been able to confirm that releasing a semaphore is a non-blocking operation. Certainly implementations built on mutexes (e.g. those in "C++0x has no semaphores? How to synchronize threads?") will not allow for a non-blocking producer thread.

Is it ok to use a semaphore as a global pause for worker threads?

I'm thinking of using a semaphore as a pause mechanism for a pool of worker threads like so:
// main thread
for N jobs:
    semaphore.release()
    create and start worker

// worker thread
while (not done)
    semaphore.acquire()
    do_work
    semaphore.release()
Now, if I want to pause all workers, I can acquire the entire count available in the semaphore. I'm wondering if that is better than:
if (paused)
    paused_mutex.lock
    wait for condition (paused_mutex)
do_work
Or is there a better solution?
I guess one downside of doing it with the semaphore is that the main thread will block until all workers release. In my case, the unit of work per iteration is very small so that probably won't be a problem.
Update: to clarify, my workers are database backups that act like file copies. The while(not quit) loop quits when the file has been successfully copied. So to relate it to the traditional worker-waits-for-condition to get work: my workers wait for a needed file copy and the while loop you see is doing the work requested. You could think of my do_work above as do_piece_of_work.
The problem with the semaphore approach is that the worker threads have to constantly check for work. They are eating up all the available CPU resources. It is better to use a mutex and a condition (signalling) variable (as in your second example) so that the threads are woken up only when they have something to do.
It is also better to hold the mutex for as short a time as possible. The traditional way to do this is to create a WORK QUEUE and to use the mutex to synchronize queue inserts and removals. The main thread inserts into the work queue and wakes up a worker. The worker acquires the mutex, removes an item from the queue, then releases the mutex. Only then does the worker perform the action. This maximizes the concurrency between the worker threads and the main thread.
Here is an example:
// main thread
create signal variable
create mutex
for N jobs:
    create and start worker

while (wait for work)
    // we have something to do
    create work item
    mutex.acquire()
    insert_work_into_queue(item)
    mutex.release()
    // tell the workers
    signal_condition_variable()

// worker thread
while (wait for condition)
    mutex.acquire()
    work = remove_item_from_queue()
    mutex.release()
    if (work) do(work)
This is a simple example where all the worker threads are awakened, even though only one worker will actually succeed in getting work off of the queue. If you want even more efficiency, use an array of condition variables, one per worker thread and then just signal the "next" one, using an algorithm for "next" that is as simple or as complex as you want.

Why do I get a thread context switch every time I synchronize with a mutex?

I have multiple threads updating a single array in tight loops (10 threads on a dual-core processor, at roughly 100000 updates per second). Each update of the array happens under the protection of a mutex (WaitForSingleObject / ReleaseMutex). I have noticed that no thread ever does two consecutive updates to the array, which means there must be some sort of yield related to the synchronization. This means there are about 100000 context switches happening every second, which seems sub-optimal. Why does this happen?
The problem here is that there is an order of all waiting threads.
Each thread blocked in a WaitForSingleObject goes into a queue and is then suspended by the scheduler so that it does not eat up execution time anymore. When the mutex is freed, one of the waiting threads is resumed by the scheduler. It is unspecified in which exact order threads are woken from the queue, but in many cases it will be simple first-in, first-out.
What happens now is that if the same thread releases the mutex and then immediately does another WaitForSingleObject on the same mutex, it is re-inserted into the queue, and it is quite unlikely to be inserted at the front if there are already other threads waiting. This makes sense: allowing it to skip to the front of the queue could lead to the other threads starving. So the scheduler will probably just suspend it and wake the thread at the front of the queue instead.
I guess this is because of the multiple processors.
When the first thread (running on the first processor) releases the mutex, the second thread (on the second processor) gets it; then when the first thread tries to re-acquire the mutex, it cannot. When the mutex is finally released by the second thread, it is taken by the third thread (on the first processor).

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence if a thread (T1) fails to get a lock because some other thread (T2) has the lock, T1 is put to sleep by the OS, and is woken up only when T2 releases the lock and the OS gives T1 the lock. The thread T1 does not really poll to get the lock. From this description, it seems that there is no performance benefit of using condition variables. In either case, there is no polling involved. The OS anyway provides the benefit that the condition-variable paradigm can provide.
Can you please explain what actually happens.
A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
Using a condition variable lets the consumer find that when the queue is empty it can just wait on the condition variable indicating that the queue has had something put into it. No polling - that thread does nothing until a producer puts something in the queue, then signals the condition that the queue has a new item.
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
When a thread holding the mutex exclusively signals the condition variable, for which we'll assume for now that it's a singular signal (unleashing no more than one waiting thread) and not a broadcast (unleashing all the waiting threads), the first thread in the condition variable queue gets shunted back over into the front (usually) of the mutex queue. Once the thread currently holding the mutex—usually the thread that signaled the condition variable—relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will have been the one that was at the head of the condition variable queue.
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.
If you are looking for performance, then start reading about "non-blocking / lock-free" thread synchronization algorithms. They are based upon atomic operations, which gcc is kind enough to provide; look up the gcc atomic builtins. Our tests showed we could increment a global value from multiple threads using atomic operations orders of magnitude faster than locking with a mutex. We also have sample code showing how to add and remove items from a linked list from multiple threads at the same time without locking.
For sleeping and waking threads, signals are much faster than condition variables. You use pthread_kill to send the signal and sigwait to put the thread to sleep. We tested this too, with the same kind of performance benefits.