How is the atomic unlock-and-block achieved in the implementation of synchronization primitives like mutexes and condition variables?

For example, suppose you use an atomic spinlock on an integer flag to ensure that only one thread modifies the wait queue the mutex maintains at any given time. When a thread tries to lock an already-held mutex, we want it to enqueue itself and release the spinlock (set the flag to zero) before it blocks itself; the unlocker, in turn, dequeues a thread from the queue and makes it runnable.
Consider just two threads, one locking and the other releasing the mutex at the same time. If the locker is preempted after it has added itself to the queue and set the flag to zero (but before it has blocked itself), and the unlocker then dequeues it and makes it runnable, the wake-up accomplishes nothing, since the thread has not blocked yet. The make-runnable call is wasted, but more importantly, the locker then blocks itself afterwards and remains blocked forever.
How is this atomicity achieved to ensure correctness? A similar scenario arises with condition variables, between releasing the mutex and blocking.
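For concreteness, here is a minimal sketch (with illustrative names, not a production mutex) of the usual answer on Linux: the FUTEX_WAIT operation compares the futex word against an expected value and puts the caller to sleep as one atomic step inside the kernel, so a wake-up that slips in between the user-space check and the sleep simply makes the wait return immediately instead of being lost.

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdatomic.h>

    static atomic_int state;    /* 0 = unlocked, 1 = locked */

    static void futex_wait(atomic_int *addr, int expected) {
        /* The kernel compares *addr with expected and sleeps only if
           they still match; the check and the sleep are atomic. */
        syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    }

    static void futex_wake(atomic_int *addr) {
        syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
    }

    void lock(void) {
        int expected = 0;
        while (!atomic_compare_exchange_weak(&state, &expected, 1)) {
            /* Lock was held when we looked: sleep, but only if it is
               still held. If the unlocker got in first, this returns
               immediately and we retry, so no wake-up is lost. */
            futex_wait(&state, 1);
            expected = 0;
        }
    }

    void unlock(void) {
        atomic_store(&state, 0);
        futex_wake(&state);     /* real mutexes avoid this syscall when
                                   no one is waiting */
    }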

Related

How can any thread signal for release of a binary semaphore

I am new to the multithreading paradigm. While learning concurrency, every source I found says:
"The difference between a mutex and a binary semaphore is ownership,
i.e. a mutex can be signaled for release only by the thread that
locked it, while a semaphore can be signaled by any thread"
Consider a scenario where thread A has acquired a binary semaphore lock on resource x and is processing it. If any thread can release the lock on x, doesn't this open the possibility of some other thread releasing the lock while thread A is still using x?
Isn't there scope for inconsistency here, or am I missing something?
Of course, if threads arbitrarily acquire or release a semaphore, the result will be disastrous, and the fact that implementations do not prevent this does not imply that it is a useful scenario.
However, there are real use cases in which the involved threads use another mechanism to coordinate among themselves, while the semaphore holds off the threads that do not participate in that coordination.
Imagine extending a use case in which one thread acquires the semaphore to perform a task, so that the task is executed in parallel: after acquiring the semaphore, several worker threads are spawned, each working on a different part of the data, thus naturally non-interfering. Then the last worker thread releases the semaphore, which eliminates the need for additional communication between the initiating thread and the workers. Of course, this requires each worker thread to detect whether it is the last one, but a simple atomic integer holding the number of active workers is sufficient.
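A minimal sketch of that pattern, with made-up names (gate for the semaphore, active for the worker count); the initiating thread acquires the semaphore, and whichever worker decrements the count to zero releases it:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NWORKERS 4

    static sem_t gate;                   /* guards the shared resource */
    static atomic_int active = NWORKERS; /* workers still running */

    static void *worker(void *arg) {
        printf("working on part %ld\n", (long)arg); /* non-interfering work */
        if (atomic_fetch_sub(&active, 1) == 1)
            sem_post(&gate);             /* last worker out releases it */
        return NULL;
    }

    int main(void) {
        pthread_t t[NWORKERS];
        sem_init(&gate, 0, 1);
        sem_wait(&gate);                 /* initiator acquires the semaphore */
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);    /* gate is free again once the
                                            last worker has posted */
        sem_destroy(&gate);
        return 0;
    }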

Java Thread Live Lock

I have an interesting problem related to Java thread live lock. Here it goes.
There are four global locks - L1,L2,L3,L4
There are four threads - T1, T2, T3, T4
T1 requires locks L1,L2,L3
T2 requires locks L2
T3 requires locks L3,L4
T4 requires locks L1,L2
So, the pattern of the problem is: any of the threads can run and acquire the locks in any order. If a thread detects that a lock it needs is not available, it releases all the locks it had previously acquired and waits for a fixed time before retrying. The cycle repeats, giving rise to a livelock condition.
So, to solve this problem, I have two solutions in mind
1) Let each thread wait for a random period of time before retrying.
OR,
2) Let each thread acquire all the locks in a particular order (even if a thread does not require all the locks)
I am not convinced that these are the only two options available to me. Please advise.
Have all the threads enter a single mutex-protected state machine whenever they acquire and release their set of locks. Each thread should expose a method that returns the set of locks it requires to continue, and methods to signal and wait on a private semaphore. The SM should contain a bool for each lock and a 'Waiting' queue/array/vector/list/whatever container to store waiting threads.
If a thread enters the SM mutex to get locks and can immediately get its lock set, it can reset its bool set, exit the mutex and continue on.
If a thread enters the SM mutex and cannot immediately get its lock set, it should add itself to 'Waiting', exit the mutex and wait on its private semaphore.
If a thread enters the SM mutex to release its locks, it sets the lock bools to 'return' its locks and iterates 'Waiting' in an attempt to find a thread that can now run with the set of locks available. If it finds one, it resets the bools appropriately, removes the thread it found from 'Waiting' and signals the 'found' thread semaphore. It then exits the mutex.
You can tweak the algorithm used to match the available lock bools with waiting threads as you wish. Maybe you should release the thread that requires the largest set of matches, or perhaps you would like to 'rotate' the 'Waiting' container elements to reduce starvation. Up to you.
A solution like this requires no polling (with its performance-sapping CPU use and latency) and no continual acquire/release of multiple locks.
It's much easier to develop such a scheme with an OO design. The methods/member functions to signal/wait the semaphore and return the set of locks needed can usually be stuffed somewhere in the thread class inheritance chain.
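Here is one way such a state machine might look in C with POSIX primitives, encoding each thread's lock set as a bitmask (bit i = lock Li); the names and the first-fit matching policy are illustrative:

    #include <pthread.h>
    #include <semaphore.h>

    typedef struct waiter {
        unsigned need;              /* set of locks this thread requires */
        sem_t sem;                  /* private semaphore to park on */
        struct waiter *next;
    } waiter_t;

    static pthread_mutex_t sm = PTHREAD_MUTEX_INITIALIZER;
    static unsigned free_locks = 0xF;   /* L1..L4 all available */
    static waiter_t *waiting = NULL;    /* the 'Waiting' container */

    void acquire_set(unsigned need) {
        pthread_mutex_lock(&sm);
        if ((free_locks & need) == need) {   /* whole set available? */
            free_locks &= ~need;
            pthread_mutex_unlock(&sm);
            return;
        }
        waiter_t self = { .need = need, .next = waiting };
        sem_init(&self.sem, 0, 0);
        waiting = &self;                     /* join 'Waiting' */
        pthread_mutex_unlock(&sm);
        sem_wait(&self.sem);                 /* parked until a releaser
                                                hands us our whole set */
        sem_destroy(&self.sem);
    }

    void release_set(unsigned held) {
        pthread_mutex_lock(&sm);
        free_locks |= held;                  /* return the locks */
        for (waiter_t **p = &waiting; *p; p = &(*p)->next) {
            waiter_t *w = *p;
            if ((free_locks & w->need) == w->need) {
                free_locks &= ~w->need;      /* grant the set */
                *p = w->next;                /* remove from 'Waiting' */
                sem_post(&w->sem);           /* wake exactly that thread */
                break;
            }
        }
        pthread_mutex_unlock(&sm);
    }

A thread that needs L1, L2 and L3 would call acquire_set(0x1 | 0x2 | 0x4) and later release_set() with the same mask.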
Unless there is a good reason (performance-wise) not to do so, I would unify all the locks into one lock object. This is similar to solution 2 you suggested, only simpler in my opinion. And by the way, not only is this solution simpler and less bug-prone, its performance might also be better than that of solution 1.
Personally, I have never heard of Option 1, but I am by no means an expert on multithreading. After thinking about it, it sounds like it will work fine.
However, the standard way to deal with threads and resource locking is somewhat related to Option 2: to prevent deadlocks, resources always need to be acquired in the same order. For example, if every thread that needs both L1 and L2 always locks L1 before L2, a cycle of threads waiting on each other cannot form.
Go with 2a) Let each thread acquire all of the locks that it needs (NOT all of the locks) in a particular order; if a thread encounters a lock that isn't available, it releases all of its locks.
As long as threads acquire their locks in the same order you can't have deadlock; however, you can still have starvation (a thread might run into a situation where it keeps releasing all of its locks without making forward progress). To ensure that progress is made you can assign priorities to threads (0 = lowest priority, MAX_INT = highest priority) - increase a thread's priority when it has to release its locks, and reduce it to 0 when it acquires all of its locks. Put your waiting threads in a queue, and don't start a lower-priority thread if it needs the same resources as a higher-priority thread - this way you guarantee that the higher-priority threads will eventually acquire all of their locks. Don't implement this thread queue unless you're actually having problems with thread starvation, though, because it's probably less efficient than just letting all of your threads run at once.
You can also simplify things by implementing omer schleifer's condense-all-locks-to-one solution; however, unless threads other than the four you've mentioned are contending for these resources (in which case you'll still need to lock the resources from the external threads), you can more efficiently implement this by removing all locks and putting your threads in a circular queue (so your threads just keep running in the same order).
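For comparison, a compact sketch of option 2a from the previous answer, assuming four pthread mutexes and an ascending lock order; the retry policy (yield vs. a randomized sleep) is a placeholder:

    #include <pthread.h>
    #include <sched.h>

    #define NLOCKS 4
    static pthread_mutex_t locks[NLOCKS] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
    };

    /* need[i] != 0 means the caller requires lock Li. Locks are always
       attempted in ascending index order; on a failed trylock everything
       already held is dropped before retrying. */
    void acquire_in_order(const int need[NLOCKS]) {
        for (;;) {
            int i, failed = 0;
            for (i = 0; i < NLOCKS; i++) {
                if (!need[i]) continue;
                if (pthread_mutex_trylock(&locks[i]) != 0) { failed = 1; break; }
            }
            if (!failed) return;         /* got the whole set */
            while (--i >= 0)             /* back out in reverse order */
                if (need[i]) pthread_mutex_unlock(&locks[i]);
            sched_yield();               /* or sleep a random interval */
        }
    }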

How can I grant ownership of a mutex to a specific thread?

Imagine I have a mutex locked. There is an unlimited number of other threads waiting to lock the mutex. When I unlock the mutex, one of those threads will be chosen to enter the critical section, but I have no control over which one. What if I want a specific thread to enter the critical section?
I am pretty sure this cannot be done using a POSIX mutex; however, can I emulate the behaviour using different synchronisation object(s)?
You can use a mutex, a condition variable and a thread id to achieve that.
Before unlocking the mutex, the thread sets the target thread id, broadcasts the condition variable and releases the mutex. The waiting threads wake up, lock the mutex and check whether the target thread id equals their own thread id. If not, the thread goes back to waiting.
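A minimal sketch of that scheme ('target' is an illustrative name for the chosen thread's id):

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static pthread_t target;        /* which thread may enter next */

    void lock_when_chosen(void) {
        pthread_mutex_lock(&m);
        while (!pthread_equal(target, pthread_self()))
            pthread_cond_wait(&cv, &m);   /* not our turn: keep waiting */
        /* mutex held and we are the chosen thread */
    }

    void unlock_and_choose(pthread_t next) {
        target = next;                    /* pick the successor */
        pthread_cond_broadcast(&cv);      /* wake everyone to re-check */
        pthread_mutex_unlock(&m);
    }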
An optimization to this method to avoid waking up all waiting threads just to check the target thread id and then go back to wait would be to use a separate condition variable for each waiting thread. This way the signaling thread would notify the condition variable of the particular target thread.
Another option is to use signals sent to a particular thread. Let's say we use SIGRTMIN for this purpose. First, all threads block this signal at startup, so that the signal becomes pending and doesn't get lost when a thread isn't waiting for it. When a thread wants to lock the mutex, it first calls sigwait(), which atomically unblocks SIGRTMIN and waits for it, or consumes one that is already pending. Once the thread has received the signal it can proceed and lock the mutex. The signaling thread uses pthread_kill(target_thread_id, SIGRTMIN) to wake up a particular thread.
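Sketched out, with illustrative function names:

    #include <pthread.h>
    #include <signal.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void block_wakeup_signal(void) {  /* call in every thread at startup */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGRTMIN);
        pthread_sigmask(SIG_BLOCK, &set, NULL);  /* signal stays pending
                                                    instead of being lost */
    }

    void wait_turn_then_lock(void) {
        sigset_t set;
        int sig;
        sigemptyset(&set);
        sigaddset(&set, SIGRTMIN);
        sigwait(&set, &sig);          /* atomically waits for SIGRTMIN or
                                         consumes a pending one */
        pthread_mutex_lock(&m);
    }

    void wake_thread(pthread_t target_thread_id) {
        pthread_kill(target_thread_id, SIGRTMIN);
    }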

How can many threads wait on a condition variable if we place a mutex before it?

pthread_cond_broadcast is used to wake up several threads waiting on a condition variable. But at the same time, it is also said that we should guard the condition variable with a mutex to avoid race conditions.
So, if a mutex is there, and one thread already holds it and is thus waiting on the variable, how can any other thread acquire the same mutex (to get to the waiting part)?
When a thread calls pthread_cond_wait() it needs to hold the associated mutex. The implementation releases the mutex while the thread is blocked, and reacquires it before pthread_cond_wait() returns.
Holding the mutex is required because checking the condition then calling pthread_cond_wait() isn't an atomic operation. The mutex (and the proper use of the mutex) prevents the thread from missing the condition becoming true in the short period between checking it and calling the wait.
But the short answer to the specific question (how can another thread obtain the mutex) is that while the thread is blocked in pthread_cond_wait(), the mutex is not held.
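The canonical shape of this, where 'ready' stands in for whatever condition the threads coordinate on:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static bool ready = false;

    void *waiter(void *arg) {
        pthread_mutex_lock(&m);
        while (!ready)                  /* check under the mutex... */
            pthread_cond_wait(&cv, &m); /* ...which is released while we
                                           block and reacquired before
                                           this call returns */
        /* condition true and mutex held here */
        pthread_mutex_unlock(&m);
        return arg;
    }

    void *signaler(void *arg) {
        pthread_mutex_lock(&m);
        ready = true;
        pthread_cond_broadcast(&cv);    /* waiters wake and reacquire m
                                           one at a time */
        pthread_mutex_unlock(&m);
        return arg;
    }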

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence, if a thread (T1) fails to get a lock because some other thread (T2) holds it, T1 is put to sleep by the OS and is woken up only when T2 releases the lock and the OS gives T1 the lock. Thread T1 does not actually poll for the lock. From this description, it seems there is no performance benefit to using condition variables: in either case there is no polling involved, and the OS anyway provides the benefit that the condition-variable paradigm offers.
Can you please explain what actually happens.
A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
Using a condition variable lets the consumer find that when the queue is empty it can just wait on the condition variable indicating that the queue has had something put into it. No polling - that thread does nothing until a producer puts something in the queue, then signals the condition that the queue has a new item.
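A sketch of such a queue (a fixed ring buffer stands in for the real internals, and the producer is assumed never to fill it):

    #include <pthread.h>

    #define CAP 16
    static int buf[CAP];
    static int head, tail, count;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    void put(int item) {                  /* producer */
        pthread_mutex_lock(&m);
        buf[tail] = item;
        tail = (tail + 1) % CAP;
        count++;
        pthread_cond_signal(&nonempty);   /* a new item is available */
        pthread_mutex_unlock(&m);
    }

    int get(void) {                       /* consumer */
        pthread_mutex_lock(&m);
        while (count == 0)                /* no polling: sleep until
                                             a producer signals */
            pthread_cond_wait(&nonempty, &m);
        int item = buf[head];
        head = (head + 1) % CAP;
        count--;
        pthread_mutex_unlock(&m);
        return item;
    }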
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
When a thread holding the mutex exclusively signals the condition variable (assume for now a singular signal, which unleashes no more than one waiting thread, rather than a broadcast, which unleashes all of them), the first thread in the condition variable queue gets shunted back over to (usually) the front of the mutex queue. Once the thread currently holding the mutex, usually the thread that signaled the condition variable, relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will be the one that was at the head of the condition variable queue.
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.
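One of many possible toy models of this sketch (blocking is simulated with a per-waiter POSIX semaphore, and the internal flag and queues are guarded by an ordinary pthread mutex; a real implementation parks threads in the kernel instead):

    #include <pthread.h>
    #include <semaphore.h>
    #include <stddef.h>

    typedef struct waiter { sem_t sem; struct waiter *next; } waiter_t;
    typedef struct { waiter_t *head, *tail; } queue_t;

    typedef struct {
        pthread_mutex_t guard;  /* protects the flag and both queues */
        int held;               /* the "flag": is the mutex owned? */
        queue_t mutex_q;        /* threads waiting to acquire */
    } toy_mutex_t;

    typedef struct { queue_t cond_q; } toy_cond_t;

    static void enqueue(queue_t *q, waiter_t *w) {
        w->next = NULL;
        if (q->tail) q->tail->next = w; else q->head = w;
        q->tail = w;
    }

    static waiter_t *dequeue(queue_t *q) {
        waiter_t *w = q->head;
        if (w) { q->head = w->next; if (!q->head) q->tail = NULL; }
        return w;
    }

    /* Hand the mutex to the next queued waiter, or mark it free. */
    static void pass_on(toy_mutex_t *m) {
        waiter_t *w = dequeue(&m->mutex_q);
        if (w) sem_post(&w->sem);   /* ownership transfers; held stays 1 */
        else m->held = 0;
    }

    void toy_lock(toy_mutex_t *m) {
        pthread_mutex_lock(&m->guard);
        if (!m->held) { m->held = 1; pthread_mutex_unlock(&m->guard); return; }
        waiter_t self; sem_init(&self.sem, 0, 0);
        enqueue(&m->mutex_q, &self);
        pthread_mutex_unlock(&m->guard);
        sem_wait(&self.sem);        /* wakes up owning the mutex */
        sem_destroy(&self.sem);
    }

    void toy_unlock(toy_mutex_t *m) {
        pthread_mutex_lock(&m->guard);
        pass_on(m);
        pthread_mutex_unlock(&m->guard);
    }

    void toy_wait(toy_cond_t *c, toy_mutex_t *m) {
        waiter_t self; sem_init(&self.sem, 0, 0);
        pthread_mutex_lock(&m->guard);
        enqueue(&c->cond_q, &self); /* join the condition queue and give
                                       up the mutex in one step, both
                                       under the internal guard */
        pass_on(m);
        pthread_mutex_unlock(&m->guard);
        sem_wait(&self.sem);        /* resumes owning the mutex again */
        sem_destroy(&self.sem);
    }

    void toy_signal(toy_cond_t *c, toy_mutex_t *m) {
        pthread_mutex_lock(&m->guard);
        waiter_t *w = dequeue(&c->cond_q);
        if (w) enqueue(&m->mutex_q, w); /* shunt one waiter over to the
                                           mutex queue; it runs when the
                                           mutex is next relinquished */
        pthread_mutex_unlock(&m->guard);
    }

Note that because a semaphore remembers a post, the race from the first question above (the wake-up arriving before the thread has actually blocked) is harmless here: a sem_post that happens first simply makes the later sem_wait return immediately.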
If you are looking for performance, then start reading about "non-blocking / non-locking" thread synchronization algorithms. They are based upon atomic operations, which gcc is kind enough to provide. Look up gcc atomic operations. Our tests showed we could increment a global value with multiple threads using atomic operations orders of magnitude faster than locking with a mutex. Here is some sample code that shows how to add items to and remove them from a linked list from multiple threads at the same time without locking.
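As a minimal illustration of that comparison (the linked-list sample referenced above is not reproduced here), a shared counter can be bumped either under a mutex or with one gcc __atomic builtin:

    #include <pthread.h>

    static long counter;        /* shared global */
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void increment_with_mutex(void) {
        pthread_mutex_lock(&m); /* lock, bump, unlock: three steps and
                                   possibly a trip through the kernel */
        counter++;
        pthread_mutex_unlock(&m);
    }

    void increment_lock_free(void) {
        /* one lock-free read-modify-write instruction, no blocking */
        __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
    }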
For sleeping and waking threads, signals are much faster than condition variables. You use pthread_kill to send the signal and sigwait to put the thread to sleep. We tested this too, with the same kind of performance benefits. Here is some example code.
