What is the Mutex acquire and release order? - linux

I know the functionality of Mutex. But now I am confused about its timing. I specially mean in the Linux kernel code.
For example, we have 3 threads (let's say they are on the same processor and are all normal tasks with the same priorities). Thread 1 ,2 and 3 try to acquire the Mutex and only Thread 1 gets it. Thread 2 and 3 are blocked and go to sleep. Then Thread 1 has done his job and unlock the Mutex.
So here is my question: At this very moment, what will happen? Will Thread 1 continue to execute because its scheduled time slice is not used up? Or will Thread 2 acquire the lock immediately and start to execute because it is the second thread who wants to acquire the lock? Or will Thread 3 acquire the lock immediately and start to execute because it is assumed to run next from the task scheduler(Let's assume this)? What will happen?

Once Thread 1 releases the lock, what happens next is non-deterministic. Any of the scenarios you outlined above are possible.
If your application requires a very specific order among threads, then you might want to try having the threads communicate more explicitly among themselves. In C, you can do this with a pipe().
Generally though, the performance is best if you embrace the chaos and let the scheduler choose.

Once Thread 1 has done his job, he gives the MUTEX back to others, and goes to sleep.

Related

Understanding Condition Variables - Isn't there always spin-waiting "somewhere"?

I am going to explain my understanding of this OS construct and appreciate some polite correction.
I understand thread-safety clearly and simply.
If there is some setup where
X: some condition
Y: do something
and
if X
do Y
is atomic, meaning that if at the exact moment in time
doing Y
not X
there is some problem.
By my understanding, the lowest-level solution of this is to use shared objects (mutexes). As an example, in the solution to the "Too Much Milk" Problem
Thead A | Thread B
-------------------------------------
leave Note A | leave Note B
while Note B | if no Note A
do nothing | if no milk
if no milk | buy milk
buy milk | remove Note B
remove Note A |
Note A and Note B would be the shared objects, i.e. some piece of memory accessible by both threads A and B.
This is can be generalized (beyond milk) for 2-thread case like
Thead A | Thread B
-------------------------------------
leave Note A | leave Note B
while Note B | if no Note A
do nothing | if X
if X | do Y
do Y | remove Note B
remove Note A |
and there is some way to generalize it for the N-thread case (so I'll continue referring to the 2-thread case for simplicity).
Possibly incorrect assumption #1: This is the lowest-level solution known (possible?).
Now one of the defficiencies of this solution is the spinning or busy-wait
while Note B
do nothing
because if the do Y is an expensive task then the thread scheduler will keep switching to Thread A to perform this check, i.e. the thread is still "awake" and using processing power even when we "know" its processing is to perform a check that will fail for some time.
The question then becomes: Is there some way we could make Thread A "sleep", so that it isn't scheduled to run until Note B is gone, and then "wake up"???
The Condition Variable design pattern provides a solution and it built on top of mutexes.
Possibly incorrect assumption #2: Then, isn't there still some spinning under the hood? Is the average amount of spinning somehow reduced?
I could use a logical explanation like only S.O. can provide ;)
Isn't there still some spinning under the hood.
No. That's the whole point of condition variables: It's to avoid the need for spinning.
An operating system scheduler creates a private object to represent each thread, and it keeps these objects in containers which, for purpose of this discussion, we will call queues.
Simplistic explanation:
When a thread calls condition.await(), that invokes a system call. The scheduler handles it by removing the calling thread from whatever CPU it was running on, and by putting its proxy object into a queue. Specifically, it puts it into the queue of threads that are waiting to be notified about that particular condition.
There usually is a separate queue for every different thing that a thread could wait for. If you create a mutex, the OS creates a queue of threads that are waiting to acquire the mutex. If you create a condition variable, the OS creates a queue of threads that are waiting to be notified.
Once the thread's proxy object is in that queue, nothing will wake it up until some other thread notifies the condition variable. That notification also is a system call. The OS handles it (simplest case) by moving all of the threads that were in the condition variable's queue into the global run queue. The run queue holds all of the threads that are waiting for a CPU to run on.
On some future timer tick, the OS will pick the formerly waiting thread from the run queue and set it up on a CPU.
Extra credit:
Bad News! the first thing the thread does after being awakened, while it's still inside the condition.await() call, is it tries to re-lock the mutex. But there's a chance that the thread that signalled the condition still has the mutex locked. Our victim is going to go right back to sleep again, this time, waiting in the queue for the mutex.
A more sophisticated system might be able to optimize the situation by moving the thread directly from the condition variable's queue to the mutex queue without ever needing to wake it up and then put it back to sleep.
yes, on the lowest, hardware level instructions like Compare-and-set, Compare-and-swap are used, which spin until the condition is met, and only then make set (assignment). This spin is required each time we put a thread in a queue, be it queue to a mutex, to condition or to processor.
Then, isn't there still some spinning under the hood? Is the average amount of spinning somehow reduced?
That's a decision for the implementation to make. If spinning works best on the platform, then spinning can be used. But almost no spinning is required.
Typically, there's a lock somewhere at the lowest level of the implementation that protects system state. That lock is only held by any thread for a tiny split second as it manipulates that system state. Typically, you do need to spin while waiting for that inner lock.
A block on a mutex might look like this:
Atomically try to acquire the mutex.
If that succeeds, stop, you are done. (This is the "fast path".)
Acquire the inner lock that no thread holds for more than a few instructions.
Mark yourself as waiting for that mutex to be acquired.
Atomically release the inner lock and set your thread as not ready-to-run.
Notice the only place that there is any spinning in here is in step 3. That's not in the fast path. No spinning is needed after the call in step 5 does not return to this thread until the lock is conveyed to this thread by the thread that held it.
When a thread releases the lock, it checks the count of threads waiting for the lock. If that's greater than zero, instead of releasing the lock, it acquires the inner lock protecting system state, picks one of the threads recorded as waiting for the lock, conveys the lock to that thread, and tells the scheduler to run that thread. That thread then sees step 5 return from its call with now holding the lock.
Again, the only waiting is on that inner lock that is used just to track what thread is waiting for what.

Java Thread Live Lock

I have an interesting problem related to Java thread live lock. Here it goes.
There are four global locks - L1,L2,L3,L4
There are four threads - T1, T2, T3, T4
T1 requires locks L1,L2,L3
T2 requires locks L2
T3 required locks L3,L4
T4 requires locks L1,L2
So, the pattern of the problem is - Any of the threads can run and acquire the locks in any order. If any of the thread detects that a lock which it needs is not available, it release all other locks it had previously acquired waits for a fixed time before retrying again. The cycle repeats giving rise to a live lock condition.
So, to solve this problem, I have two solutions in mind
1) Let each thread wait for a random period of time before retrying.
OR,
2) Let each thread acquire all the locks in a particular order ( even if a thread does not require all the
locks)
I am not convinced that these are the only two options available to me. Please advise.
Have all the threads enter a single mutex-protected state-machine whenever they require and release their set of locks. The threads should expose methods that return the set of locks they require to continue and also to signal/wait for a private semaphore signal. The SM should contain a bool for each lock and a 'Waiting' queue/array/vector/list/whatever container to store waiting threads.
If a thread enters the SM mutex to get locks and can immediately get its lock set, it can reset its bool set, exit the mutex and continue on.
If a thread enters the SM mutex and cannot immediately get its lock set, it should add itself to 'Waiting', exit the mutex and wait on its private semaphore.
If a thread enters the SM mutex to release its locks, it sets the lock bools to 'return' its locks and iterates 'Waiting' in an attempt to find a thread that can now run with the set of locks available. If it finds one, it resets the bools appropriately, removes the thread it found from 'Waiting' and signals the 'found' thread semaphore. It then exits the mutex.
You can twiddle with the algorithm that you use to match up the available set lock bools with waiting threads as you wish. Maybe you should release the thread that requires the largest set of matches, or perhaps you would like to 'rotate' the 'Waiting' container elements to reduce starvation. Up to you.
A solution like this requires no polling, (with its performance-sapping CPU use and latency), and no continual aquire/release of multiple locks.
It's much easier to develop such a scheme with an OO design. The methods/member functions to signal/wait the semaphore and return the set of locks needed can usually be stuffed somewhere in the thread class inheritance chain.
Unless there is a good reason (performance wise) not to do so,
I would unify all locks to one lock object.
This is similar to solution 2 you suggested, only more simple in my opinion.
And by the way, not only is this solution more simple and less bug proned,
The performance might be better than solution 1 you suggested.
Personally, I have never heard of Option 1, but I am by no means an expert on multithreading. After thinking about it, it sounds like it will work fine.
However, the standard way to deal with threads and resource locking is somewhat related to Option 2. To prevent deadlocks, resources need to always be acquired in the same order. For example, if you always lock the resources in the same order, you won't have any issues.
Go with 2a) Let each thread acquire all of the locks that it needs (NOT all of the locks) in a particular order; if a thread encounters a lock that isn't available then it releases all of its locks
As long as threads acquire their locks in the same order you can't have deadlock; however, you can still have starvation (a thread might run into a situation where it keeps releasing all of its locks without making forward progress). To ensure that progress is made you can assign priorities to threads (0 = lowest priority, MAX_INT = highest priority) - increase a thread's priority when it has to release its locks, and reduce it to 0 when it acquires all of its locks. Put your waiting threads in a queue, and don't start a lower-priority thread if it needs the same resources as a higher-priority thread - this way you guarantee that the higher-priority threads will eventually acquire all of their locks. Don't implement this thread queue unless you're actually having problems with thread starvation, though, because it's probably less efficient than just letting all of your threads run at once.
You can also simplify things by implementing omer schleifer's condense-all-locks-to-one solution; however, unless threads other than the four you've mentioned are contending for these resources (in which case you'll still need to lock the resources from the external threads), you can more efficiently implement this by removing all locks and putting your threads in a circular queue (so your threads just keep running in the same order).

Do mutexes block ALL threads at any point?

In Linux, say I have code with 100 threads. 5 of those threads compete over a shared resource protected by a mutex. I know that when the critical section is actually being run, only the 5 threads are subject to having their execution stopped if they try to obtain the lock and the other 95 threads will be running without issues.
My question is is there any point at which those other 95 threads' execution will be paused or affected, ie when the mutex/kernel/whatever is determining which threads are blocked on the mutex and which thread should get the lock, and which threads should be able to run because their not asking for the lock, etc
No, other threads are not affected.
The kernel doesn't ask which threads are affected by the lock. Each thread tells the kernel when it tries to acquire the lock.
When threads do that, they go to sleep and get into a special wake-up queue associated with the lock.
Threads that don't use the lock won't get into the same queue as those that do, so their blocking behavior is unrelated.

What happens to a thread when an up is done on its mutex?

Mutexes are used to protect critical sections. Let's say a down has been already done on a mutex, and while the thread that did that is in the CS, 10 other threads are right behind it and do a down on the mutex, putting themselves to sleep. When the first thread exits the critical section and does an up on the mutex, do all 10 threads wake up and just resume what they were about to do, namely, entering the critical section? Wouldn't that mean then that all 10 might end up in the critical section at the same time?
No, only one thread will wake up and take ownership of the mutex. The rest of them will remain asleep. Which thread is the one that wakes up is usually nondeterministic.
The above is a generalisation and the details of implementation will be different in each system. For example, in Java compare Object#notify() and Object#notifyAll().

Mutex lock: what does "blocking" mean?

I've been reading up on multithreading and shared resources access and one of the many (for me) new concepts is the mutex lock. What I can't seem to find out is what is actually happening to the thread that finds a "critical section" is locked. It says in many places that the thread gets "blocked", but what does that mean? Is it suspended, and will it resume when the lock is lifted? Or will it try again in the next iteration of the "run loop"?
The reason I ask, is because I want to have system supplied events (mouse, keyboard, etc.), which (apparantly) are delivered on the main thread, to be handled in a very specific part in the run loop of my secondary thread. So whatever event is delivered, I queue in my own datastructure. Obviously, the datastructure needs a mutex lock because it's being modified by both threads. The missing puzzle-piece is: what happens when an event gets delivered in a function on the main thread, I want to queue it, but the queue is locked? Will the main thread be suspended, or will it just jump over the locked section and go out of scope (losing the event)?
Blocked means execution gets stuck there; generally, the thread is put to sleep by the system and yields the processor to another thread. When a thread is blocked trying to acquire a mutex, execution resumes when the mutex is released, though the thread might block again if another thread grabs the mutex before it can.
There is generally a try-lock operation that grab the mutex if possible, and if not, will return an error. But you are eventually going to have to move the current event into that queue. Also, if you delay moving the events to the thread where they are handled, the application will become unresponsive regardless.
A queue is actually one case where you can get away with not using a mutex. For example, Mac OS X (and possibly also iOS) provides the OSAtomicEnqueue() and OSAtomicDequeue() functions (see man atomic or <libkern/OSAtomic.h>) that exploit processor-specific atomic operations to avoid using a lock.
But, why not just process the events on the main thread as part of the main run loop?
The simplest way to think of it is that the blocked thread is put in a wait ("sleeping") state until the mutex is released by the thread holding it. At that point the operating system will "wake up" one of the threads waiting on the mutex and let it acquire it and continue. It's as if the OS simply puts the blocked thread on a shelf until it has the thing it needs to continue. Until the OS takes the thread off the shelf, it's not doing anything. The exact implementation -- which thread gets to go next, whether they all get woken up or they're queued -- will depend on your OS and what language/framework you are using.
Too late to answer but I may facilitate the understanding. I am talking more from implementation perspective rather than theoretical texts.
The word "blocking" is kind of technical homonym. People may use it for sleeping or mere waiting. The term has to be understood in context of usage.
Blocking means Waiting - Assume on an SMP system a thread B wants to acquire a spinlock held by some other thread A. One of the mechanisms is to disable preemption and keep spinning on the processor unless B gets it. Another mechanism probably, an efficient one, is to allow other threads to use processor, in case B does not gets it in easy attempts. Therefore we schedule out thread B (as preemption is enabled) and give processor to some other thread C. In this case thread B just waits in the scheduler's queue and comes back with its turn. Understand that B is not sleeping just waiting rather passively instead of busy-wait and burning processor cycles. On BSD and Solaris systems there are data-structures like turnstiles to implement this situation.
Blocking means Sleeping - If the thread B had instead made system call like read() waiting data from network socket, it cannot proceed until it gets it. Therefore, some texts casually use term blocking as "... blocked for I/O" or "... in blocking system call". Actually, thread B is rather sleeping. There are specific data-structures known as sleep queues - much like luxury waiting rooms on air-ports :-). The thread will be woken up when OS detects availability of data, much like an attendant of the waiting room.
Blocking means just that. It is blocked. It will not proceed until able. You don't say which language you're using, but most languages/libraries have lock objects where you can "attempt" to take the lock and then carry on and do something different depending on whether you succeeded or not.
But in, for example, Java synchronized blocks, your thread will stall until it is able to acquire the monitor (mutex, lock). The java.util.concurrent.locks.Lock interface describes lock objects which have more flexibility in terms of lock acquisition.

Resources