How to detect and find out a program is in deadlock? - linux

This is an interview question.
How to detect and find out if a program is in deadlock? Are there some tools that can be used to do that on Linux/Unix systems?
My idea:
If a program makes no progress and its status is still running, it is deadlocked. But other causes can also produce this symptom. An open-source tool that can help is Valgrind (its Helgrind tool). Right?

If you suspect a deadlock, run ps aux | grep <exe name>; if the PROCESS STATE CODE in the output is D (uninterruptible sleep), the process may be deadlocked.
As @daijo explained: say you have two threads T1 and T2 and two critical sections, each protected by semaphores S1 and S2. If T1 acquires S1 and T2 acquires S2, and each then tries to acquire the other semaphore before relinquishing the one it already holds, the result is a deadlock, and on running ps aux | grep <exe name> the process state code will be D (i.e. uninterruptible sleep).
Tools:
Valgrind, Lockdep (linux kernel utility)
Check this link on types of deadlocks and how to avoid them :
http://cmdlinelinux.blogspot.com/2014/01/linux-kernel-deadlocks-and-how-to-avoid.html
Edit: a D in the ps aux output "could" mean the process is deadlocked, per this Red Hat doc:
Uninterruptible Sleep State
An uninterruptible sleep state is one that won't handle a signal right away. It will wake only as a result of a waited-upon resource becoming available or after a time-out occurs during that wait (if the time-out is specified when the process is put to sleep).
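In practice, a first pass at inspecting a stuck process might look like this ('myprog' is a placeholder executable name). Note that a pure user-space mutex deadlock usually shows state S (sleeping on a futex), while D typically indicates a kernel-level wait:

```shell
# Find the stuck process ('myprog' is a placeholder for your executable name)
pid=$(pgrep -o myprog)

# STAT column: R = running, S = sleeping, D = uninterruptible sleep
ps -o pid,stat,wchan:20,cmd -p "$pid"

# Per-thread states: a user-space mutex deadlock usually shows S, not D
ps -L -o tid,stat,wchan:20 -p "$pid"

# Same information from procfs
grep State /proc/"$pid"/status

# For user-space deadlocks, attach gdb and dump all thread backtraces:
# gdb -p "$pid" -ex 'thread apply all bt' -ex detach -ex quit
```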

I would suggest you look at Helgrind: a thread error detector.
The simplest example of such a problem is as follows.
Imagine some shared resource R, which, for whatever reason, is guarded by two locks, L1 and L2, which must both be held when R is accessed.
Suppose a thread acquires L1, then L2, and proceeds to access R. The implication of this is that all threads in the program must acquire the two locks in the order first L1 then L2. Not doing so risks deadlock.
The deadlock could happen if two threads -- call them T1 and T2 -- both want to access R. Suppose T1 acquires L1 first, and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries to acquire L1, but those locks are both already held. So T1 and T2 become deadlocked.

Related

Understanding Condition Variables - Isn't there always spin-waiting "somewhere"?

I am going to explain my understanding of this OS construct and appreciate some polite correction.
I understand thread-safety clearly and simply.
If there is some setup where
X: some condition
Y: do something
then the sequence
if X
    do Y
must be atomic, because if at the exact moment in time we are
doing Y
it is no longer the case that
X
there is some problem.
By my understanding, the lowest-level solution of this is to use shared objects (mutexes). As an example, in the solution to the "Too Much Milk" Problem
Thread A | Thread B
-------------------------------------
leave Note A | leave Note B
while Note B | if no Note A
do nothing | if no milk
if no milk | buy milk
buy milk | remove Note B
remove Note A |
Note A and Note B would be the shared objects, i.e. some piece of memory accessible by both threads A and B.
This can be generalized (beyond milk) for the 2-thread case like
Thread A | Thread B
-------------------------------------
leave Note A | leave Note B
while Note B | if no Note A
do nothing | if X
if X | do Y
do Y | remove Note B
remove Note A |
and there is some way to generalize it for the N-thread case (so I'll continue referring to the 2-thread case for simplicity).
Possibly incorrect assumption #1: This is the lowest-level solution known (possible?).
Now one of the deficiencies of this solution is the spinning or busy-wait
while Note B
do nothing
because if do Y is an expensive task, the thread scheduler will keep switching to Thread A to perform this check; i.e., the thread is still "awake" and using processing power even though we "know" its only work for a while is a check that will keep failing.
The question then becomes: Is there some way we could make Thread A "sleep", so that it isn't scheduled to run until Note B is gone, and then "wake up"???
The Condition Variable design pattern provides a solution, and it is built on top of mutexes.
Possibly incorrect assumption #2: Then, isn't there still some spinning under the hood? Is the average amount of spinning somehow reduced?
I could use a logical explanation like only S.O. can provide ;)
Isn't there still some spinning under the hood?
No. That's the whole point of condition variables: It's to avoid the need for spinning.
An operating system scheduler creates a private object to represent each thread, and it keeps these objects in containers which, for purpose of this discussion, we will call queues.
Simplistic explanation:
When a thread calls condition.await(), that invokes a system call. The scheduler handles it by removing the calling thread from whatever CPU it was running on, and by putting its proxy object into a queue. Specifically, it puts it into the queue of threads that are waiting to be notified about that particular condition.
There usually is a separate queue for every different thing that a thread could wait for. If you create a mutex, the OS creates a queue of threads that are waiting to acquire the mutex. If you create a condition variable, the OS creates a queue of threads that are waiting to be notified.
Once the thread's proxy object is in that queue, nothing will wake it up until some other thread notifies the condition variable. That notification also is a system call. The OS handles it (simplest case) by moving all of the threads that were in the condition variable's queue into the global run queue. The run queue holds all of the threads that are waiting for a CPU to run on.
On some future timer tick, the OS will pick the formerly waiting thread from the run queue and set it up on a CPU.
Extra credit:
Bad News! The first thing the thread does after being awakened, while it's still inside the condition.await() call, is try to re-lock the mutex. But there's a chance that the thread that signalled the condition still has the mutex locked. Our victim is going to go right back to sleep again, this time waiting in the queue for the mutex.
A more sophisticated system might be able to optimize the situation by moving the thread directly from the condition variable's queue to the mutex queue without ever needing to wake it up and then put it back to sleep.
Yes: at the lowest, hardware level, instructions like compare-and-set and compare-and-swap are used, and the code spins in a retry loop until the condition is met, only then performing the set (assignment). This short spin is required each time we put a thread in a queue, be it the queue for a mutex, for a condition, or for a processor.
Then, isn't there still some spinning under the hood? Is the average amount of spinning somehow reduced?
That's a decision for the implementation to make. If spinning works best on the platform, then spinning can be used. But almost no spinning is required.
Typically, there's a lock somewhere at the lowest level of the implementation that protects system state. That lock is only held by any thread for a tiny split second as it manipulates that system state. Typically, you do need to spin while waiting for that inner lock.
A block on a mutex might look like this:
1. Atomically try to acquire the mutex.
2. If that succeeds, stop, you are done. (This is the "fast path".)
3. Acquire the inner lock that no thread holds for more than a few instructions.
4. Mark yourself as waiting for that mutex to be acquired.
5. Atomically release the inner lock and set your thread as not ready-to-run.
Notice that the only place there is any spinning in here is in step 3, and that's not in the fast path. No spinning is needed after step 5: that call does not return to this thread until the lock is conveyed to it by the thread that held it.
When a thread releases the lock, it checks the count of threads waiting for the lock. If that's greater than zero, instead of releasing the lock, it acquires the inner lock protecting system state, picks one of the threads recorded as waiting for the lock, conveys the lock to that thread, and tells the scheduler to run that thread. That thread then sees the call in step 5 return, now holding the lock.
Again, the only waiting is on that inner lock that is used just to track what thread is waiting for what.

spin lock acquiring in linux

I was just wondering: suppose a PC has multiple cores, and three threads are running on three different cores. Thread T1, running on core C1, has acquired spin lock S. Meanwhile threads T2 and T3, running on cores C2 and C3, try to acquire the lock and wait for it to be released. Once T1 releases the lock, which thread will acquire it, T2 or T3? Assume T2 and T3 have the same priority and started waiting at the same time on different cores.
The linux kernel uses MCS spin locks. The gist is that waiters end up adding themselves to a queue. However, if there are 2 threads doing this, there are no guarantees as to who is going to succeed first.
Similarly for more simplistic spin locks where the code just tries to flip the "taken" bit, there are no guarantees whatsoever. However, certain hardware characteristics can make it so certain cores have an easier time than others (if they share the same socket).
You want to read https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
I repeat: if 2 different threads compete for a lock, there is no guaranteed order in which they will take it and looking for one is wrong in the first place.

Is this an example of a livelock or deadlock or starvation?

Scheduling Scheme : Preemptive Priority Scheduling
Situation :
A process L (low priority) acquires a spinlock on a resource R. While still in the critical section, L gets preempted because of the arrival of another process, H (higher priority), into the ready queue.
However, H also needs to access resource R, so it tries to acquire the spin lock, which results in H busy-waiting. Because spinlocks are used, H never actually goes into the Wait state; it is always in the Running state or the Ready state (Ready in case an even higher-priority process arrives in the ready queue), preventing L, or any process with a priority lower than H's, from ever executing.
A) All processes with priority less than H can be considered to be under Starvation
B) All processes with priority less than H as well as the process H, can be considered to be in a deadlock. [But, then don't the processes have to be in Wait state for the system to be considered to be in a deadlock?]
C) All processes with priority less than H as well as the process H, can be considered to be in a livelock.[But, then only the state of H changes constantly, all the low priority process remain in just the Ready state. Don't the state of all processes need to change (as part of a spin lock) continuously if the system in livelock?]
D) H alone can be considered to be in livelock, all lower priority processes are just under starvation, not in livelock.
E) H will not progress, but cannot be considered to be in livelock. All lower priority processes are just under starvation, not in livelock.
Which of the above statements are correct? Can you explain?
This is not a livelock, because definition of livelock requires that "states of the processes involved in the livelock constantly change with regard to one another", and here states effectively do not change.
The first process can be considered to be in processor starvation, because if there were an additional processor, it could run on it, eventually release the lock, and let the second process run.
The situation also can be considered as a deadlock, with 2 resources in the resource graph, and 2 processes attempting to acquire that resources in opposite directions: first process owns the lock and needs the processor to proceed, while the second process owns the processor and needs the lock to proceed.

Semaphore minimization

I stumbled upon a problem in a multi-threading book in the library and I was wondering what I would need to do in order to minimize the number of semaphores.
What would you do in this situation?
Semaphores
Assume a process P0's execution depends on other k processes: P1,...,Pk.
You need one semaphore to synchronize the processes and satisfy this single constraint.
The semaphore S0 is initialized to 0, while P0 will wait k times on S0 (in other words, it will try to acquire k resources).
Each of k processes P1, ..., Pk will release S0 upon their ends of executions.
This will guarantee that P0 will start execution only after all the other k processes complete their execution (in any order and asynchronously).
In the link you provided, you need 4 semaphores, T1 does not need any semaphore because its execution depends on nobody else.

Regarding Mutexes and Semaphors

Suppose there are 4 Threads (T1 to T4) that need to run concurrently and 3 structs (Struct1 to Struct3) as resources
T1 to T2 share struct1 (by T1 writing to struct1 and T2 reading from it)
T2 to T3 share struct2 (by T2 writing to struct2 and T3 reading from it)
T3 to T4 share struct3 (by T3 writing to struct3 and T4 reading from it)
Because of this statement from § 41.2.4 of The C++ Programming Language (4th edition) by Bjarne Stroustrup :
"Two threads have a data race if both can access a memory location
simultaneously and at least one of their accesses is a write. Note
that defining “simultaneously” precisely is not trivial. If two
threads have a data race, no language guarantees hold: the behavior is
undefined."
It becomes clear there is a need for synchronization.
1 - Which of these primitives is suitable for this application: just mutexes, or semaphores?
2 - If mutexes are the choice, we would need 3 mutexes, one for each structure, right?
3 - Would using a mutex around a non-atomic operation block CPU time of other threads?
Your use case is kind of abstract, so better solutions might be available. But based just on the information you provided:
1) Use mutexes. I do not see how semaphores could help except to be used as a mutex. A semaphore can be useful when you share more resources, but in your case it is only one at a time.
If all four threads accessed the first free struct, or if your struct were a queue, a semaphore could help.
2) Right, one mutex per structure.
3) Yes, it could; that is the idea: you do not want T1 to write while T2 is reading struct1, and vice versa. The worst case could be T1 blocking T2, which has already blocked T3, which has blocked T4.
1 - Three semaphores for each queue; see the Producer–consumer problem.
2 - One of the semaphores could be a mutex; binary semaphores are much like mutexes.
3 - If you have to wait on a semaphore or mutex, you will be placed in the not-ready queue of the OS, waiting for the release, and so you use no CPU (except for the thousands of cycles the context switch costs).