Can it be assumed that `pthread_cond_signal` will wake the signaled thread atomically with regard to the mutex bound to the condition variable?

Quoting POSIX:
The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().
"If predictable scheduling behavior is required". This could/would hint that locking the mutex bound to the condition variable right before calling pthread_cond_signal() should guarantee that the signaled thread will be woken up before any other thread manages to lock this mutex. Is this correct?

We will see if any pthreads guru has a more comprehensive answer, but as far as I can tell, at least going by the Linux man page, you do not get fully predictable behavior. What you do get is a guarantee that if two threads wait on the same condition variable, the higher-priority thread gets to go first (at least, that should be true on Linux if one thread is SCHED_OTHER and the other is real-time SCHED_FIFO). That holds if you lock the mutex before signalling (with reservations for errors, after a quick read of the man page).
See
https://linux.die.net/man/3/pthread_cond_signal
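For concreteness, here is a minimal sketch of the "predictable scheduling" pattern the quoted passage describes: the state change and the signal both happen while the mutex is held. The names (mutex, cond, data_ready) are illustrative, not from the question.

```c
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
int data_ready = 0;

void produce(void)
{
    pthread_mutex_lock(&mutex);
    data_ready = 1;               /* change the predicate under the lock */
    pthread_cond_signal(&cond);   /* signal while still holding the mutex... */
    pthread_mutex_unlock(&mutex); /* ...so a waiter can only run after this */
}
```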

No, there is no guarantee the signalled thread will be woken up. Worse, if the signalling thread runs this sequence:
```c
while (run_again) {
    pthread_mutex_lock(&mutex);
    /* prepare data */
    pthread_mutex_unlock(&mutex);
    pthread_cond_broadcast(&cond);  /* broadcast after the mutex is released */
}
```
there is a reasonable chance that control will never be passed to the other threads waiting on the mutex, because of logic in the scheduler. You can find some examples to play with in this answer.

No.
The best reference I have found regarding the predictability is this one:
https://austin-group-l.opengroup.narkive.com/lKcmfoRI/predictable-scheduling-behavior-in-pthread-cond-broadcast
Basically, people want to guard against the possibility that threads do not get a fair chance to run. Apparently it is not a problem for most producer-consumer scenarios, nor does it apply to pthread_cond_broadcast. I would say it is useful only in limited cases.
Cppreference.com actually notes that notifying while still holding the lock may be a pessimization:
https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
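For contrast, here is the variant the cppreference note is about, transposed to pthreads (reusing the illustrative declarations from the sketch above): the signal is issued after the unlock, so the woken thread is not forced to block again on a still-held mutex.

```c
void produce_then_signal(void)
{
    pthread_mutex_lock(&mutex);
    data_ready = 1;
    pthread_mutex_unlock(&mutex);
    pthread_cond_signal(&cond);   /* woken waiter can take the mutex at once */
}
```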

Related

linux unnamed posix semaphore sem_destroy, sem_wait method question

When there are threads waiting in sem_wait, I call sem_destroy from another thread. But the waiting threads were not woken up.
In the case of a mutex, pthread_mutex_destroy returns EBUSY when there are waiting threads;
however, sem_destroy returned 0 and errno was also set to 0.
I want to destroy the semaphore so that any further access to it is blocked and the waiting threads are woken up.
This is possible with a Windows semaphore handle.
Please advise me. Thank you.
POSIX says this about sem_destroy:
The effect of destroying a semaphore upon which other threads are currently blocked is undefined.
It specifically doesn't say that other threads are woken up. In fact, if sem_t contains a pointer to memory, what it almost certainly does do is free the memory, meaning you then have a use-after-free security problem. (Whether that is the case depends on your libc.)
The general approach of allocation for mutexes and semaphores is that they should be either allocated and freed with their relevant data structure, or they should be allocated before the relevant code needs them and then freed after the entire code is done with them. In C, you cannot safely deallocate data structures (e.g., with sem_destroy) that are in use.
If you want to wake up all users of the semaphore, you must increment it until all users have awoken. You can call sem_getvalue to determine if anyone is waiting on the semaphore and then call sem_post to increment it. Only then can you safely destroy it. Note that this can have a race condition, depending on your code.
However, note that you must be careful that the other code does not continue to use the semaphore after it's destroyed, such as by trying to re-acquire it in a loop. If you are careful to structure your code properly, then you can have confidence that this won't happen.
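As a rough illustration of the drain-then-destroy idea above, here is a hedged sketch. It assumes sem_getvalue() reports waiters as a negative value, which POSIX permits but does not require, and it does not remove the race the answer warns about.

```c
#include <semaphore.h>

/* Sketch only: wake any waiters before destroying the semaphore. */
void drain_and_destroy(sem_t *sem)
{
    int value;
    if (sem_getvalue(sem, &value) == 0) {
        while (value < 0) {   /* |value| threads may be blocked in sem_wait() */
            sem_post(sem);    /* wake one of them */
            value++;
        }
    }
    /* Still racy: a thread may enter sem_wait() between the check and the
       destroy; real code needs an external "shutting down" flag. */
    sem_destroy(sem);
}
```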

Know how many are waiting on a pthread mutex lock

I would like to know how many threads are waiting on a lock so I would be able to destroy it safely.
The problem is that I can't destroy the lock when someone holds it or someone is waiting on it.
My program can make sure that no new requests are made to acquire the lock, but how can I know when all the threads that waited on it are done with it?
I thought about a condition variable, but I suspect it will create problems...
dlv, could you add a code snippet to your description?
You should be using condition variables.
Each thread will block in pthread_cond_wait() until the other thread signals it to wake up. This will not cause a deadlock. It can easily be extended to many threads, by allocating one int, pthread_cond_t and pthread_mutex_t per thread.
pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while the mutex is locked, and it will automatically release the mutex while it waits. After the signal is received and the thread is awakened, the mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking the mutex when the thread is finished with it.
The pthread_cond_signal() routine is used to signal (or wake up) another thread that is waiting on the condition variable. It should be called after the mutex is locked, and the caller must unlock the mutex in order for the pthread_cond_wait() routine to complete.
The pthread_cond_broadcast() routine should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state.
It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait().
Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).
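Putting those rules together, here is a minimal sketch of the canonical wait/signal pattern described above. The names (ready, cond, mutex) are illustrative.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
static int ready = 0;

void *waiter(void *arg)
{
    pthread_mutex_lock(&mutex);
    while (!ready)                        /* re-check: guards against spurious wakeups */
        pthread_cond_wait(&cond, &mutex); /* atomically unlocks, sleeps, relocks */
    /* ... use the state protected by the mutex ... */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

void *signaler(void *arg)
{
    pthread_mutex_lock(&mutex);
    ready = 1;                            /* change the predicate under the lock */
    pthread_cond_signal(&cond);           /* wake one waiter */
    pthread_mutex_unlock(&mutex);         /* let the waiter actually acquire the mutex */
    return NULL;
}
```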
If threads that can use the mutex still exist or might be created in the future then don't delete it.
You do know and are tracking what threads are created, right?
If, for some reason, you cannot keep track of the threads using a resource, your only way out is to leak the resource. It can never be safely deleted because you never know when you are done using it.
Say you had a counter that counted the threads using a mutex. That counter would need its own mutex. Then how do you decide when to delete that one?
That way of thinking is the road that leads to hell. You could do what you want with condition variables, but the result would be an extremely weak design.
Assuming you managed to create such a monster, it would basically allow you to kill "safely" any other thread regardless of its internal state. Except for a quick and dirty panic exit (in case of some internal software error), this is the worst possible way of solving synchronization issues.
A design relying on such tricks would have to create implicit synchronizations between tasks to make sure the terminations occur in the proper order. A lot of software is designed that way, and much of it allows mediocre programmers to make a living by maintaining the pile of crap they created in the first place.
Task termination should be an issue solved at global design level, not by a toolbox of wonky objects that allow you to twist synchronization any odd way.

Difference between a semaphore and a conditional variable

I am implementing a conditional wait, and either a semaphore or a condition variable can be used to implement it. Is there any difference between the two, specifically from a performance point of view?
I have heard that when a thread waits on a condition variable it is not scheduled until it is signaled, which ensures that it does not consume CPU cycles. But is this not true for a semaphore? Will a semaphore consume CPU cycles even while it is waiting?
If all of your threads are waiting for some event, e.g., the submission of a task, then you can wake them all up with a condition variable when the event occurs.
If you have a limited resource, say 10 pages of memory reserved for your threads, then you will need them to wait until a page is available, and when one becomes available, let just one thread start execution. In this case you can use a semaphore to wake up as many threads as there are available pages (see the sketch below).
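A hedged sketch of that limited-resource case, using the 10-page figure from the example (all names here are made up):

```c
#include <semaphore.h>

sem_t pages;

void init_pages(void) { sem_init(&pages, 0, 10); }  /* 10 pages available */

void *worker(void *arg)
{
    sem_wait(&pages);   /* blocks until a page is available */
    /* ... use one page ... */
    sem_post(&pages);   /* return the page; releases at most one waiter */
    return 0;
}
```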
A semaphore has extra state - a count of units held - as well as a queue for threads waiting on it, so allowing a sema to, say, record how many times it has been signaled even if there is no thread currently waiting on it. If a thread loops around a semaphore wait() and the semaphore is signaled N times, the thread will eventually loop N times, even if the thread is sometimes busy when the sema is signaled - very useful for producer-consumer queues.
A condvar does not have this extra count state, but it can release a lock that it is bound to until a thread signals it - very useful for producer-consumer queues.
Sometimes, I wish for a combination of the two - a condvar with a count, but this does not seem to be forthcoming from OS developers :(
A semaphore and a condvar are the same in that they are both synchronization primitives. Apart from that...
A condition variable and a binary semaphore both block a thread until a specified condition is signaled, and in that sense they are the same: you can use either one, but a condition variable is always used together with a mutex. In both cases the thread is scheduled only when signaled; without the signal it cannot be scheduled. But if you need to keep track of a number of resources, use a counting semaphore.
Neither consumes CPU while waiting, when you use a mutex.

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence if a thread (T1) fails to get a lock because some other thread (T2) has the lock, T1 is put to sleep by the OS, and is woken up only when T2 releases the lock and the OS gives T1 the lock. The thread T1 does not really poll to get the lock. From this description, it seems that there is no performance benefit of using condition variables. In either case, there is no polling involved. The OS anyway provides the benefit that the condition-variable paradigm can provide.
Can you please explain what actually happens.
A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
Using a condition variable lets the consumer find that when the queue is empty it can just wait on the condition variable indicating that the queue has had something put into it. No polling - that thread does nothing until a producer puts something in the queue, then signals the condition that the queue has a new item.
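A minimal pthreads sketch of that queue might look like the following; the node and queue types are hypothetical, not from the answer.

```c
#include <pthread.h>
#include <stddef.h>

struct node  { struct node *next; int item; };
struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    struct node    *head, *tail;
};

void enqueue(struct queue *q, struct node *n)
{
    pthread_mutex_lock(&q->lock);
    n->next = NULL;
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);   /* tell one consumer there is work */
    pthread_mutex_unlock(&q->lock);
}

struct node *dequeue(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)                        /* no polling: sleep until signaled */
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct node *n = q->head;
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```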
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
When a thread holding the mutex exclusively signals the condition variable, for which we'll assume for now that it's a singular signal (unleashing no more than one waiting thread) and not a broadcast (unleashing all the waiting threads), the first thread in the condition variable queue gets shunted back over into the front (usually) of the mutex queue. Once the thread currently holding the mutex—usually the thread that signaled the condition variable—relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will have been the one that was at the head of the condition variable queue.
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.
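To make the sketch concrete, here is the two-queue picture as (hypothetical) C structures; thread_queue stands in for whatever FIFO of blocked threads an implementation keeps, and is not a real pthreads type.

```c
struct thread_queue;                 /* hypothetical FIFO of blocked threads */

struct my_mutex {
    int                  held;       /* the flag: is the mutex owned? */
    struct thread_queue *waiters;    /* threads in line to acquire it */
};

struct my_condvar {
    struct my_mutex     *bound;      /* the mutex this condvar is tied to */
    struct thread_queue *sleepers;   /* threads that stepped out of line to wait */
};

/* signal(): move the head of 'sleepers' to the front of the bound mutex's
   'waiters' queue; that thread runs once the current holder unlocks. */
```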
If you are looking for performance, then start reading about "non-blocking / non-locking" thread synchronization algorithms. They are based on atomic operations, which gcc is kind enough to provide. Look up gcc atomic operations. Our tests showed we could increment a global value with multiple threads using atomic operations magnitudes faster than locking with a mutex. Here is some sample code that shows how to add items to and remove items from a linked list from multiple threads at the same time without locking.
For sleeping and waking threads, signals are much faster than condition variables. You use pthread_kill to send the signal, and sigwait to put the thread to sleep. We tested this too, with the same kind of performance benefits. Here is some example code.
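The linked sample code is not reproduced here, but the atomic-increment claim can be illustrated with gcc's __atomic builtins (a hedged sketch, not the answer's original code):

```c
#include <stddef.h>

static long counter = 0;

/* Each thread adds to the shared counter without taking a mutex. */
void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);  /* gcc atomic builtin */
    return NULL;
}
```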

Mutex lock: what does "blocking" mean?

I've been reading up on multithreading and shared-resource access, and one of the many (for me) new concepts is the mutex lock. What I can't seem to find out is what actually happens to the thread that finds a "critical section" locked. It says in many places that the thread gets "blocked", but what does that mean? Is it suspended, and will it resume when the lock is lifted? Or will it try again in the next iteration of the "run loop"?
The reason I ask is that I want system-supplied events (mouse, keyboard, etc.), which (apparently) are delivered on the main thread, to be handled in a very specific part of the run loop of my secondary thread. So whatever event is delivered, I queue it in my own data structure. Obviously, the data structure needs a mutex lock because it's being modified by both threads. The missing puzzle piece is: what happens when an event gets delivered in a function on the main thread, I want to queue it, but the queue is locked? Will the main thread be suspended, or will it just jump over the locked section and go out of scope (losing the event)?
Blocked means execution gets stuck there; generally, the thread is put to sleep by the system and yields the processor to another thread. When a thread is blocked trying to acquire a mutex, execution resumes when the mutex is released, though the thread might block again if another thread grabs the mutex before it can.
There is generally a try-lock operation that grabs the mutex if possible and, if not, returns an error (see the sketch after this answer). But you are eventually going to have to move the current event into that queue. Also, if you delay moving the events to the thread where they are handled, the application will become unresponsive regardless.
A queue is actually one case where you can get away with not using a mutex. For example, Mac OS X (and possibly also iOS) provides the OSAtomicEnqueue() and OSAtomicDequeue() functions (see man atomic or <libkern/OSAtomic.h>) that exploit processor-specific atomic operations to avoid using a lock.
But, why not just process the events on the main thread as part of the main run loop?
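The try-lock operation mentioned above looks like this in pthreads; queue_push and struct event are hypothetical stand-ins for the asker's data structure.

```c
#include <pthread.h>

struct event;                        /* hypothetical event type */
void queue_push(struct event *ev);   /* hypothetical: appends to the shared queue */

/* Attempt to queue an event without blocking the calling thread. */
int try_enqueue(pthread_mutex_t *lock, struct event *ev)
{
    if (pthread_mutex_trylock(lock) != 0)
        return -1;                   /* lock busy (EBUSY) or error: caller retries */
    queue_push(ev);                  /* safe: we hold the lock */
    pthread_mutex_unlock(lock);
    return 0;
}
```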
The simplest way to think of it is that the blocked thread is put in a wait ("sleeping") state until the mutex is released by the thread holding it. At that point the operating system will "wake up" one of the threads waiting on the mutex and let it acquire it and continue. It's as if the OS simply puts the blocked thread on a shelf until it has the thing it needs to continue. Until the OS takes the thread off the shelf, it's not doing anything. The exact implementation -- which thread gets to go next, whether they all get woken up or they're queued -- will depend on your OS and what language/framework you are using.
Too late to answer, but this may help with understanding. I am talking more from an implementation perspective than from theoretical texts.
The word "blocking" is a kind of technical homonym. People may use it for sleeping or for mere waiting. The term has to be understood in the context of usage.
Blocking means waiting - Assume on an SMP system that thread B wants to acquire a spinlock held by some other thread A. One mechanism is to disable preemption and keep spinning on the processor until B gets it. Another, probably more efficient, mechanism is to allow other threads to use the processor in case B does not get it in a few easy attempts. Therefore we schedule out thread B (as preemption is enabled) and give the processor to some other thread C. In this case thread B just waits in the scheduler's queue and comes back when its turn arrives. Understand that B is not sleeping; it is just waiting passively instead of busy-waiting and burning processor cycles. On BSD and Solaris systems there are data structures like turnstiles to implement this situation.
Blocking means sleeping - If thread B had instead made a system call like read(), waiting for data on a network socket, it cannot proceed until it gets the data. Therefore, some texts casually use the term blocking, as in "... blocked for I/O" or "... in a blocking system call". Actually, thread B is sleeping. There are specific data structures known as sleep queues - much like the luxury waiting rooms at airports :-). The thread will be woken up when the OS detects the availability of data, much like an attendant of the waiting room.
Blocking means just that: it is blocked. It will not proceed until it is able to. You don't say which language you're using, but most languages/libraries have lock objects where you can "attempt" to take the lock and then carry on and do something different depending on whether you succeeded or not.
But in, for example, Java synchronized blocks, your thread will stall until it is able to acquire the monitor (mutex, lock). The java.util.concurrent.locks.Lock interface describes lock objects which have more flexibility in terms of lock acquisition.
