Condition variable usage pattern in C/C++ and other languages - multithreading

If you look at documentation describing the usage of condition variables (cv), you'll see that in e.g. Pthreads and C++ you don't need to hold the cv's mutex to call notify on it, whereas in e.g. Java and Python you must lock the mutex to do the same thing.
Is there some deep reason why things are implemented this way (I'm asking about the latter case), given that an implementation of a language like Java eventually uses some native threading tools?

Java's notify() and notifyAll() both require you to synchronize on the object before calling them. This is a simple safety measure, since wait() likewise requires you to synchronize on the object before waiting.
For example, suppose you have two threads: one reads data from a buffer and the other writes data into it.
The reading thread needs to wait until the writing thread has finished writing a block of data into the buffer; only then can it read the block.
If wait(), notify(), and notifyAll() could be called without synchronization, you could get a race condition where:
The reading thread sees no data and calls wait(), but has not yet been added to the waiting queue.
At the same time, the writing thread adds data and calls notify() to signal it.
The reading thread misses the notification and waits forever, since the notify() was processed before the wait() was.
By forcing the wait and notify to happen within a synchronized block, this race condition is removed.
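The same discipline can be sketched with Pthreads, where the mutex is explicit. This is only a minimal illustration, not how Java implements it: the names buf_lock, data_ready, has_data, and read_one_block are made up for the example. The reader checks a flag under the mutex and waits in a loop, and the writer changes the flag under the same mutex, so the notification cannot slip in between the check and the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state for this sketch, all guarded by buf_lock. */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
static bool has_data = false;
static int buffer = 0;

static void *writer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&buf_lock);
    buffer = 42;                      /* write the block */
    has_data = true;                  /* record that data is available */
    pthread_cond_signal(&data_ready); /* notify while holding the mutex */
    pthread_mutex_unlock(&buf_lock);
    return NULL;
}

/* Starts a writer thread, waits for its data, and returns the value read. */
int read_one_block(void) {
    pthread_t t;
    int rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&buf_lock);
    /* The flag is tested under the mutex, so a signal sent before we get
       here is not lost: has_data is already true and we never wait. */
    while (!has_data)
        pthread_cond_wait(&data_ready, &buf_lock);
    int value = buffer;
    has_data = false;
    pthread_mutex_unlock(&buf_lock);

    pthread_join(t, NULL);
    return value;
}
```

Calling read_one_block() returns the value the writer stored, whichever thread happens to run first.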

Related

How can I block a single thread for 3 different events (semaphore, pthread condition, and blocking socket recv)?

I have a multi-threaded system in which a main thread has to wait in blocking state for one of the following 4 events to happen:
inter-process semaphore (sem_wait())
pthread condition (pthread_cond_wait())
recv() from socket
timeout expiring
Ideally I'd like a mechanism to unblock the main thread when any of the above occurs, something like a ppoll() with suitable timeout parameter. Non-blocking and polling is out of the picture due to the impact on the CPU usage, while having separate threads blocking on different events is not ideal due to the increased latency (one thread unblocking from one of the events should eventually wake up the main one).
The code will be almost exclusively compiled under Linux with gcc toolchain, if that helps, but some portability would be good, if at all possible.
Thanks in advance for any suggestion.

The mechanisms for waiting on multiple types of objects on Unix-like systems are not that great. In general, the idea is to, wherever possible, use file descriptors for IPC rather than multiple different IPC mechanisms.
From your comment, it sounds like you can edit or change the condition variable, but not the code that signals the semaphore. So what I'd recommend is something like the following.
Change the condition variable to either a pipe (for more portability) or an eventfd(2) object (Linux-specific). The notifying thread writes to the pipe whenever it wants to signal the main thread. This will allow you to select(2) or poll(2) or whatever in the main thread on both that pipe and the socket.
Because you're stuck with the semaphore, I think the best option would be to create another thread, whose sole purpose is to wait for the semaphore using sem_wait(), and then write to another pipe or eventfd(2) object when it is notified by whatever process is doing sem_post(). In the main thread, just add this other file descriptor to your select(2) set.
So you'll have three descriptors: one for the socket, one taking the place of the condition variable, and one which is written to when the semaphore is incremented. You can then wait on all three using your favorite I/O multiplexing method, and include directly whatever timeout you'd like.
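As a rough sketch of the pipe variant (assuming a POSIX system; notify_pipe, notifier, and wait_for_event are names invented for this example), the notifying thread writes one byte where it would otherwise have signaled the condition variable, and the main thread waits in poll(2) with a timeout. Only the pipe is polled here; a real program would add the socket and the semaphore-relay descriptor to the same array.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

static int notify_pipe[2]; /* [0] is polled by the main thread, [1] is written by the notifier */

static void *notifier(void *arg) {
    (void)arg;
    char byte = 1;
    /* This write takes the place of pthread_cond_signal(). */
    ssize_t n = write(notify_pipe[1], &byte, 1);
    (void)n;
    return NULL;
}

/* Returns 1 if an event arrived, 0 on timeout, -1 on error. */
int wait_for_event(int timeout_ms) {
    if (pipe(notify_pipe) != 0)
        return -1;
    pthread_t t;
    int rc = pthread_create(&t, NULL, notifier, NULL);
    assert(rc == 0);

    struct pollfd fds[1];
    fds[0].fd = notify_pipe[0];
    fds[0].events = POLLIN;
    fds[0].revents = 0;

    int result = -1;
    int ready = poll(fds, 1, timeout_ms); /* blocks without spinning */
    if (ready == 0) {
        result = 0;                       /* timeout expired */
    } else if (ready > 0 && (fds[0].revents & POLLIN)) {
        char byte;
        if (read(notify_pipe[0], &byte, 1) == 1)
            result = 1;                   /* drained one notification */
    }

    pthread_join(t, NULL);
    close(notify_pipe[0]);
    close(notify_pipe[1]);
    return result;
}
```

On Linux, an eventfd(2) descriptor could replace the pipe with the same poll() loop.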

Win32 Understanding semaphore

I'm new to multithreading in Win32. I have an assignment involving semaphores, but I cannot understand it.
Assume that we have 20 tasks (each task is the same as the others). If we use a semaphore, there are two possibilities:
First, there should be 20 child threads, so that each thread handles 1 task.
Or:
Second, there would be n child threads. When a thread finishes a task, it handles another task?
The second problem I encountered is that I cannot find any semaphore samples for a Win32 (API) application, only the console one that I found in MSDN.
Can you help me with the "20 tasks" and explain how to write a semaphore in a WinAPI application (where should I place the CreateSemaphore() call ...)?
Your suggestions will be appreciated.

You can start a thread for every task, which is a common approach, or you can use a "threadpool" where threads are reused. This is up to you. In both scenarios, you may or may not use a semaphore; the difference is only in how you start the multiple threads.
Now, concerning your question where to place the CreateSemaphore() function, you should call that before starting any further threads. The reason is that these threads need to access the semaphore, but they can't do that if it doesn't exist yet. You could of course pass it to the other threads, but that again would give you the problem how to pass it safely without any race conditions, which is something that semaphores and other synchronization primitives are there to avoid. In other words, you would only complicate things by creating a chicken-and-egg problem.
Note that if this doesn't help you any further, you should perhaps provide more info. What are the goals? What have you done yourself so far? Any related questions here that you read but that didn't fully present answers to your problem?

Well, if you are constrained to using semaphores only, you could use two semaphores to create an unbounded producer-consumer queue class that you could use to implement a thread pool.
You need a 'SimpleQueue' class for task objects. I assume you either have one already, can easily build one or whatever.
In the ctor of your 'ProducerConsumerQueue' class, (or in main(), or in some factory function that returns a *ProducerConsumerQueue struct, whatever your language has), create a SimpleQueue and two semaphores: a 'QueueCount' semaphore, initialized with a count of 0, and a 'QueueAccess' semaphore, initialized with a count of 1.
Add 'push(*task)' and '*task pop()' methods/member functions to the ProducerConsumerQueue:
In 'push', first call 'WaitForSingleObject()' API on QueueAccess, then push the *task onto the SimpleQueue, then ReleaseSemaphore() API on QueueAccess. This pushes the *task in a thread-safe manner. Then ReleaseSemaphore() on QueueCount - this will signal any waiting threads.
In pop(), first call 'WaitForSingleObject()' API on QueueCount - this ensures that any calling consumer thread has to wait until there is a *task in the queue. Then call 'WaitForSingleObject()' API on QueueAccess, then pop the task from the SimpleQueue, then ReleaseSemaphore() API on QueueAccess and return the task - this thread-safely dequeues the *task.
Once you have created your ProducerConsumerQueue, create some threads to run the tasks. In CreateThread(), pass the same *ProducerConsumerQueue as the 'auxiliary' *void parameter.
In the thread function, cast the *void back to *ProducerConsumerQueue and then just loop around for ever, calling pop() and then running the returned task.
OK, your pool of threads is now ready to do stuff. If you want to run 20 tasks, create them in a loop and push them onto the ProducerConsumerQueue. The threads will then run them all.
You can create as many threads as you want to in the pool, (within reason). As many threads as cores is reasonable for tasks that are CPU-intensive. If the tasks make blocking calls, you may want to create many more threads for quickest overall throughput.
A useful enhancement is to check for 'null' in the thread function loop after each task is received and, if it is null, clean up and exit the thread, so terminating it. This allows the threads to be easily terminated by queueing up nulls, making it easier to shut down your thread pool (should you need to), and also to control the number of threads in the pool at runtime.
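To make the shape of this concrete, here is a minimal sketch of the two-semaphore queue. Note that it swaps in POSIX semaphores and Pthreads (sem_wait()/sem_post() in place of WaitForSingleObject()/ReleaseSemaphore(), pthread_create() in place of CreateThread()), so it is not WinAPI code. The names pc_queue, pcq_push, pcq_pop, and run_pool are made up for the example, and a "task" is just a pointer to an int that the worker sets to 1.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

typedef struct node { void *task; struct node *next; } node;

typedef struct {
    node *head, *tail;  /* the 'SimpleQueue': a singly linked list */
    sem_t queue_count;  /* counts queued tasks, starts at 0 */
    sem_t queue_access; /* guards the list, starts at 1 */
} pc_queue;

static void pcq_init(pc_queue *q) {
    q->head = q->tail = NULL;
    sem_init(&q->queue_count, 0, 0);
    sem_init(&q->queue_access, 0, 1);
}

static void pcq_push(pc_queue *q, void *task) {
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->task = task;
    n->next = NULL;
    sem_wait(&q->queue_access);  /* enter the critical section */
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    sem_post(&q->queue_access);  /* leave the critical section */
    sem_post(&q->queue_count);   /* wake one waiting consumer */
}

static void *pcq_pop(pc_queue *q) {
    sem_wait(&q->queue_count);   /* block until a task is queued */
    sem_wait(&q->queue_access);
    node *n = q->head;
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;
    sem_post(&q->queue_access);
    void *task = n->task;
    free(n);
    return task;
}

static void *worker(void *arg) {
    pc_queue *q = arg;
    for (;;) {
        int *task = pcq_pop(q);
        if (task == NULL) break; /* null task: shut this worker down */
        *task = 1;               /* "running" a task marks it done */
    }
    return NULL;
}

/* Runs ntasks tasks on nthreads workers; returns how many were completed. */
int run_pool(int nthreads, int ntasks) {
    pc_queue q;
    pcq_init(&q);
    pthread_t *threads = malloc(nthreads * sizeof *threads);
    int *done = calloc(ntasks, sizeof *done);
    assert(threads != NULL && done != NULL);

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &q);
        assert(rc == 0);
    }
    for (int i = 0; i < ntasks; i++) pcq_push(&q, &done[i]);
    for (int i = 0; i < nthreads; i++) pcq_push(&q, NULL); /* one null per worker */
    for (int i = 0; i < nthreads; i++) pthread_join(threads[i], NULL);

    int completed = 0;
    for (int i = 0; i < ntasks; i++) completed += done[i];
    sem_destroy(&q.queue_count);
    sem_destroy(&q.queue_access);
    free(done);
    free(threads);
    return completed;
}
```

For the "20 tasks" case, run_pool(4, 20) would push 20 tasks through 4 worker threads and then queue one null per worker to shut the pool down.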

Accessing shared data from a signal handler

I want to know if it is a good idea to access shared data from a signal handler. Consider two scenarios: a multi-process system, and a multithreaded system with a single process. In a multi-process system, let's say the processes handle a particular signal and update a certain shared variable or shared memory. Can I do that from the signal handler itself?
However, in the case of threads using pthreads, I don't think it is doable: http://maxim.int.ru/bookshelf/PthreadsProgram/htm/r_40.html. This article says that it is not asynchronous signal safe and suggests using sigwait instead. I am not sure why it is not asynchronous signal safe. Let's say a thread handles a signal and is in the signal handler routine. It acquires a lock on the shared memory to update it. In the meantime, another signal of the same type arrives, and another thread responsible for handling it executes the signal handler again. The signal handler is the same for the whole process, but it is called multiple times. The second time around, it cannot see the lock and updates/overwrites the data. Is this the issue with multithreaded signal handlers using shared data?
I am a bit confused. In multi-process systems, each process has its own copy of the signal handler. But in a multithreaded system, there is a single copy of the signal handler used by all the threads, isn't there? So when multiple signals of the same type arrive and two threads are responsible for handling them, will both of them try to execute the same piece of handler code? How does it all fit in?

I read through the article that you reference and found some interesting information in the "Threads in Signal Handlers" section. In that section, you'll see that they have a list of Posix function calls that can be made from within signal handlers. Then soon after that list, they mention the following:
But where are the Pthreads calls? They're not in either of these
lists! In fact, the Pthreads standard specifies that the behavior of
all Pthreads functions is undefined when the function is called from a
signal handler. If your handler needs to manipulate data that is
shared with other threads - buffers, flags, or state variables - it's out
of luck. The Pthreads mutex and condition variable synchronization
calls are off limits.
Notice the last sentence: "Pthreads mutex and condition variable synchronization calls are off limits"
The aforementioned functions that can be called from a signal handler are described as follows:
These functions have a special property known as reentrancy that
allows a process to have multiple calls to these functions in progress
at the same time.
The pthread synchronization functions don't have the special property known as reentrancy, so I imagine that if these functions (pthread_mutex_lock() for instance) are interrupted by an arriving signal, then the behavior is not "safe".
Imagine that your application calls pthread_mutex_lock(&theMutex) and at exactly that moment (that is, while in the pthread_mutex_lock() function) a signal arrives. If the signal handler also calls pthread_mutex_lock(&theMutex), the previous pthread call may not have terminated, so it can't be guaranteed which call to pthread_mutex_lock() will get the lock. So the resulting behavior will be undefined/nondeterministic.
I would imagine that the call to sigwait() from a particular thread would guarantee that no important, non-reentrancy function calls may get interrupted, thus allowing calls to the pthread synchronization functions to be "safe".

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence if a thread (T1) fails to get a lock because some other thread (T2) has the lock, T1 is put to sleep by the OS, and is woken up only when T2 releases the lock and the OS gives T1 the lock. The thread T1 does not really poll to get the lock. From this description, it seems that there is no performance benefit of using condition variables. In either case, there is no polling involved. The OS anyway provides the benefit that the condition-variable paradigm can provide.
Can you please explain what actually happens?

A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
With a condition variable, a consumer that finds the queue empty can simply wait on the condition variable, which is signaled when something has been put into the queue. No polling: that thread does nothing until a producer puts something in the queue and then signals the condition that the queue has a new item.
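The queue case can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a fixed array stands in for the queue, and q_lock, q_nonempty, producer, and consume_all are names made up for the example. The mutex keeps the queue indices consistent, while the condition variable is what lets the consumer sleep while the queue is empty.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITEMS 5

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static int queue[ITEMS];       /* toy queue: each slot is used once */
static int head = 0, tail = 0; /* guarded by q_lock */

static void *producer(void *arg) {
    (void)arg;
    for (int i = 1; i <= ITEMS; i++) {
        pthread_mutex_lock(&q_lock);
        queue[tail++] = i;                /* put an item on the queue */
        pthread_cond_signal(&q_nonempty); /* tell a waiting consumer */
        pthread_mutex_unlock(&q_lock);
    }
    return NULL;
}

/* Consumes ITEMS items and returns their sum. Meant to be called once,
   since the toy queue is never reset. */
int consume_all(void) {
    pthread_t t;
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);

    int sum = 0;
    for (int n = 0; n < ITEMS; n++) {
        pthread_mutex_lock(&q_lock);
        /* Empty queue: sleep until signaled. The wait releases q_lock,
           so the producer can get in to add an item. */
        while (head == tail)
            pthread_cond_wait(&q_nonempty, &q_lock);
        sum += queue[head++];
        pthread_mutex_unlock(&q_lock);
    }
    pthread_join(t, NULL);
    return sum;
}
```

Without the condition variable, the while loop would have to unlock, sleep or spin, and lock again on every pass to give the producer a chance.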
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
Suppose a thread holding the mutex exclusively signals the condition variable, and assume for now that this is a single signal (releasing no more than one waiting thread) rather than a broadcast (releasing all the waiting threads). The first thread in the condition variable queue gets shunted back over into the front (usually) of the mutex queue. Once the thread currently holding the mutex (usually the thread that signaled the condition variable) relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will have been the one that was at the head of the condition variable queue.
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.

If you are looking for performance, then start reading about "non blocking / non locking" thread synchronization algorithms. They are based upon atomic operations, which gcc is kind enough to provide. Look up gcc atomic operations. Our tests showed we could increment a global value with multiple threads using atomic operations orders of magnitude faster than locking with a mutex. Here is some sample code that shows how to add items to and remove items from a linked list from multiple threads at the same time without locking.
For sleeping and waking threads, signals are much faster than conditions. You use pthread_kill to send the signal, and sigwait to sleep the thread. We tested this too with the same kind of performance benefits. Here is some example code.
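The following is not the sample code this answer refers to. It is only a minimal sketch of the kind of counter experiment described above, using the gcc __atomic builtins; the thread count, the iteration count, and the names bump and count_with_atomics are arbitrary choices for the example. It shows that the increments are not lost without a mutex, and makes no claim about timing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define THREADS 4
#define INCREMENTS 100000

static long counter = 0; /* shared, updated only through atomic builtins */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED); /* lock-free increment */
    return NULL;
}

/* Runs THREADS threads that each add INCREMENTS to the counter and
   returns the final value. Meant to be called once. */
long count_with_atomics(void) {
    pthread_t t[THREADS];
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, bump, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```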

Condition variables: used only to simulate monitors?

I'm reading "Multithreaded, Parallel, and Distributed Programming" by Gregory Andrews, and in this book the author mentions that he'll show how to use locks "in combination with condition variables to simulate monitors".
I have also heard several times that "mutex lock + condition variable" is a common pattern in programs using Posix threads.
So my question is: are there other common uses of condition variables besides this (using them in combination with locks to simulate monitors)? If so, what would be a simple example of usage?

A monitor enables two different things:
mutual exclusion - at most one thread may own the monitor at any given time
cooperation - the thread owning the monitor can opt to wait until it is awakened by a cooperating thread via a notification sent through the monitor
The Posix threading library separates these two concerns into two different objects:
mutual exclusion is accomplished using a mutex
cooperation is accomplished using a condition variable
It is assumed that the cooperation is with regard to some state shared between threads. This state is expected to be protected by a mutex. Thus, the basic wait operation takes two arguments:
a condition variable to wait for notification (signaling) on
the mutex protecting the shared state
When a thread waits on a condition variable using a mutex, the mutex is released and the thread is put to sleep. When the thread awakens, it will reacquire the mutex before continuing.
Signaling (notifying one thread) or broadcasting (notifying all threads) on a condition variable does not require holding the mutex.
Condition variables are intended solely for this use. It is, however, possible to use them as a "sleep for a while and release this mutex while you sleep" command by using a private condition variable that is never signaled and a timed wait (pthread_cond_timedwait()).
A condition variable is always used in conjunction with a mutex. For example, when you call pthread_cond_wait, you must specify not only the condition variable itself, but also the mutex you're using with it.
