A deadlock normally means that thread (or process) A is waiting for thread B, and at the same time thread B is waiting for thread A.
Recently I encountered a similar situation in our application. Thread A is waiting for an event to be set by thread B. However, thread B is not waiting for thread A; it just won't set the event (no matter for what reason). I am wondering whether this situation can also be called a "deadlock", or whether there is another term for it.
I'd call it a bug or bad design. But it is not a deadlock if one thread is still running.
Strictly speaking, no, that's not deadlock, which is what you initially said (except that in general there could be a whole cycle of threads, each waiting for the next one's lock: A->B->...->Z->A).
I think you could call it resource starvation, but that's quite a general term that also covers deadlock.
I would call it starvation (the resource being the CPU), not a deadlock.
Yes - I would call this a deadlock, too.
However, only one thread (thread A) is affected by it, not the entire application.
Here is my point of view:
A deadlock is a situation where the global state of the program does not progress anymore.
If A is blocked but the program can still terminate because B may find a solution, it is not a deadlock.
Threads and parallel programming are really confusing the heck outta me. In this book, at page 9, the problem stated is that though a thread might be scheduled and put in the ready state, it does not necessarily mean that it has acquired a lock.
Briefly put, a thread (say t1) waiting on a lock is notified via a condition_variable and the thread is put in the ready state, but not executed. But just before it can run anything, another thread is scheduled (say t2) and executed. This means that the condition under which t1 assumes it is woken up no longer holds.
Does this imply that merely scheduling a thread or putting it in the ready state does not mean that it acquired a lock? If this is the case, must I always put the precondition in a while loop? Is this another possible meaning of a spurious wakeup? Also, what other cases like this must I be aware of?
I was always under the assumption that if a thread is woken up from a wait (which is not a spurious wakeup), it immediately acquires the lock (wakeup = lock acquired, under this circumstance), as the kernel keeps track of this.
This question is in close relation to my other question posted here.
Thanks.
Where can I ask these noob questions, in sort of an interactive format with follow-up questions? These seem too dumb for stackoverflow.
must I always put the condition in a while loop?
It's good practice to do so. Even if you know that on some particular hardware platform and OS it's impossible for wait() to return unless the condition is true, it could behave differently after the OS has been updated, if your code gets moved to a different platform, or after some change is made to your code.
If you ever work developing "enterprise" software, then changes like that can and will happen. Might as well start learning good habits that can help to avert future disasters.
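For example, in C++ (a minimal sketch; the shared queue and the names are assumptions for illustration):

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    std::mutex m;
    std::condition_variable cv;
    std::queue<int> items;   // hypothetical shared work queue

    int consume() {
        std::unique_lock<std::mutex> lock(m);
        // Re-check the condition in a loop: wait() may return spuriously,
        // and another consumer may have emptied the queue between the
        // notify and this thread reacquiring the lock.
        while (items.empty())
            cv.wait(lock);
        int item = items.front();
        items.pop();
        return item;
    }

(The overload cv.wait(lock, predicate) performs exactly this loop for you.)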
I was always under the assumption that if a thread is woken up from a wait (which is not a spurious wakeup), it immediately acquires the lock
You can safely assume that wait() will not, under any circumstances, ever return until the mutex has been re-locked. The whole wait()/notify() paradigm depends on it behaving in that way.
What are the ways to detect deadlocks in a live multi-threaded application?
If we find there is a deadlock, are there any ways to resolve it without taking down/restarting the application?
There are two popular ways to detect deadlocks.
One is to have threads set checkpoints. For example, if you have a thread that has a work loop, you set a timer at the beginning of doing work that's set for longer than you think the work could possibly take. If the timer fires, you assume the thread is deadlocked. When the work is done, you cancel the timer.
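A minimal sketch of that checkpoint idea in C++ (all names are hypothetical, and it assumes the worker loops continuously rather than going idle):

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <thread>

    std::atomic<std::int64_t> heartbeat_ms{0};

    std::int64_t now_ms() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
    }

    // The worker calls this at the top of each unit of work.
    void checkpoint() { heartbeat_ms = now_ms(); }

    // The monitor thread: if the heartbeat is older than the assumed
    // worst case for one unit of work, presume the worker is deadlocked.
    void monitor(std::int64_t limit_ms) {
        for (;;) {
            std::this_thread::sleep_for(std::chrono::milliseconds(limit_ms));
            if (now_ms() - heartbeat_ms > limit_ms)
                std::cerr << "worker presumed deadlocked\n";
        }
    }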
Another (sometimes used in combination) is to have the things a thread can block on track which other resources the thread already holds. This can directly detect an attempt to acquire one lock while holding another when other threads have acquired those locks in the opposite order.
This can even detect deadlock risk without the deadlock actually occurring. If one thread acquires lock A then B and another acquires lock B then A, there is no deadlock unless they overlap. But this method can detect it.
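As a sketch of how such tracking could look in C++ (a hypothetical wrapper, not a production design; it catches pairwise inversions directly, while a full detector would search the whole graph for longer cycles):

    #include <algorithm>
    #include <cassert>
    #include <map>
    #include <mutex>
    #include <set>
    #include <thread>
    #include <utility>
    #include <vector>

    // Called from instrumented lock/unlock wrappers. Records every
    // "held A while acquiring B" pair ever observed; if the reversed
    // pair was seen earlier from any thread, the two locks can be taken
    // in opposite orders and a deadlock is possible, even if the two
    // critical sections never actually overlapped at runtime.
    class LockOrderTracker {
    public:
        void acquiring(const void* next) {
            std::lock_guard<std::mutex> g(m_);
            auto& held = held_[std::this_thread::get_id()];
            for (const void* h : held)
                assert(!edges_.count({next, h}) && "opposite lock order seen");
            for (const void* h : held)
                edges_.insert({h, next});
            held.push_back(next);
        }
        void released(const void* lock) {
            std::lock_guard<std::mutex> g(m_);
            auto& held = held_[std::this_thread::get_id()];
            held.erase(std::find(held.begin(), held.end(), lock));
        }
    private:
        std::mutex m_;
        std::map<std::thread::id, std::vector<const void*>> held_;
        std::set<std::pair<const void*, const void*>> edges_;
    };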
Advanced deadlock detection is typically only used during debugging. Other than coding the application to check each blocking lock for a possible deadlock and knowing what to do if it happens, the only thing you can do after a deadlock is tear the application down. You can't release locks blindly because the resources they protect may be in an inconsistent state.
Sometimes you deliberately write code that you know can deadlock and specifically code it to avoid the problem. For example, if you know lots of threads take lock A and then try to acquire lock B, and some other thread needs to take them in the reverse order, you can code that thread to make a non-blocking attempt on its second lock and release its first lock if the attempt fails.
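A sketch of that back-off pattern in C++ (the mutex names are assumed; note that std::lock and std::scoped_lock implement a similar deadlock-avoidance algorithm for you):

    #include <mutex>
    #include <thread>

    std::mutex a, b;   // everyone else locks a before b

    // The odd thread out needs the locks in the other order. It never
    // blocks on its second lock: if the non-blocking attempt fails, it
    // backs all the way off so the a-then-b threads can finish.
    void reverse_order_path() {
        for (;;) {
            b.lock();
            if (a.try_lock())
                break;                  // got both without blocking
            b.unlock();                 // back off to avoid the deadlock
            std::this_thread::yield();
        }
        // ... use the shared state ...
        a.unlock();
        b.unlock();
    }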
Typically, it's more useful to spend your effort making deadlocks impossible rather than making the code detect and work around deadlocks.
Python has a module called faulthandler that's very useful for dealing with deadlocks:

    import faulthandler
    import signal

    faulthandler.register(signal.SIGUSR1)

With that registered, sending the process SIGUSR1 makes it dump the traceback of every thread, so you can see where everything is stuck.
If you're using C or C++ with glibc, you can use the backtrace() functions from execinfo.h to print a stack trace and exit gracefully when you get a signal. You can take a deadlocked program, send it a signal, and get a list of all the threads.
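A sketch of that in C++ (glibc-specific; this dumps the stack of whichever thread receives the signal, so to see every thread you would signal each one in turn, e.g. with pthread_kill):

    #include <csignal>
    #include <unistd.h>
    #include <execinfo.h>   // glibc's backtrace functions

    // backtrace_symbols_fd() writes straight to a file descriptor,
    // which avoids calling malloc() inside the signal handler.
    extern "C" void dump_stack(int) {
        void* frames[64];
        int n = backtrace(frames, 64);
        backtrace_symbols_fd(frames, n, STDERR_FILENO);
    }

    int main() {
        std::signal(SIGUSR1, dump_stack);
        // ... the program runs; `kill -USR1 <pid>` prints a trace ...
    }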
In Java, use jstack <pid> on the stuck process.
A process has some 10 threads, and all 10 threads have entered a deadlocked state (assume all are waiting on a mutex variable).
How can you free the process (threads) from the deadlocked state?
Is there any way to kill a lower-priority thread? (In the multi-process case we can kill a lower-priority process when all processes are in a deadlocked state.)
Can we attach the deadlocked process to a debugger and assign a proper value to the mutex variable? (Assume all the threads are waiting on a mutex variable MUT whose value is 0; can we set MUT to 1 through the debugger?)
If every thread in the app is waiting on every other, and none are set to time out, you're rather screwed. You might be able to run the app in a debugger or something, but locks are generally acquired for a reason -- and manually forcing a mutex to be owned by a thread that didn't legitimately acquire it can cause big problems. The thread that previously owned it is still going to try to release it, and the results of having the mutex unexpectedly yanked away are unpredictable: it could cause an unexpected exception, or leave the mutex unlocked while the resource it protects is still in use. In any case it defeats the whole purpose of mutexes, so you'd just be covering up a much bigger problem.
There are two common solutions:
Instead of having threads wait forever, set a timeout (see the timed-lock sketch after these two points). This is slightly harder to do in languages like Java that embed mutexes into the language via synchronized or lock blocks, but it's almost always possible. If you time out waiting on the lock, release all the locks/mutexes you hold and try again later.
Better, but potentially much more complex, is to figure out why everything's fighting for the resource and remove that contention. If you must lock, lock consistently. But if there are 10 threads blocking on a single mutex, that could be a clue either that your operations are badly chunked (i.e., that your threads are doing too much or too little at once before trying to acquire a lock), or that there's unnecessary locking going on. Don't lock unless you have to. Some synchronization can be obviated by using collections and algorithms specifically designed to be "lock-free" while still offering thread safety.
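Here's what the timeout idea can look like in C++ (a sketch; the mutex names and the 100 ms deadline are arbitrary):

    #include <chrono>
    #include <mutex>

    std::timed_mutex a, b;

    // If the second lock cannot be acquired within the deadline, give
    // everything up and let the caller retry later instead of blocking
    // forever.
    bool with_both_locks() {
        std::unique_lock<std::timed_mutex> la(a);
        std::unique_lock<std::timed_mutex> lb(b, std::chrono::milliseconds(100));
        if (!lb.owns_lock())
            return false;   // timed out; la is released on return
        // ... use both protected resources ...
        return true;
    }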
Adding another answer because I don't agree with the solutions proposed by cHao earlier - the analysis is fine.
First, why I disagree with the two solutions offered:
Reduce contention
Contention doesn't lead to deadlocks. It just causes poor performance. Deadlock means no performance whatsoever. Therefore, reducing contention does not solve deadlocks.
Timeout on mutex
A mutex protects a resource, and a thread locks the mutex because it needs the resource. With a timeout, you won't be able to acquire the resource, and your thread fails. Does it solve the deadlock problem? Only if the failing thread releases another resource that was blocking the other threads.
But in that case, there's a much better solution. Mutexes should have a partial ordering. If there is at least one thread that can hold both mutex A and mutex B, you should decide whether A or B is acquired first, and then stick with that. This must be a transitive order: if you lock A before B, and B before C, then obviously you must lock A before C.
This is a perfect solution to deadlocks. Look back at the timeout example: it only works if the thread that times out waiting on A then releases its lock on B, to release another thread that was waiting on B. In the most simple case, that other thread was itself directly locking A. Thus, the mutexes A and B are not properly ordered. You should have consistently locked either A or B first.
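One way to enforce such an ordering at runtime, sketched in C++ (the class and the levels are hypothetical; it assumes locks are released in reverse order of acquisition):

    #include <cassert>
    #include <mutex>

    // Each mutex gets a rank, and a thread may only lock in strictly
    // increasing rank order. Any violation of the chosen partial order
    // asserts immediately, without waiting for an actual deadlock.
    class LeveledMutex {
    public:
        explicit LeveledMutex(int level) : level_(level) {}
        void lock() {
            assert(level_ > current_level_ && "lock ordering violated");
            m_.lock();
            saved_ = current_level_;
            current_level_ = level_;
        }
        void unlock() {
            current_level_ = saved_;
            m_.unlock();
        }
    private:
        static thread_local int current_level_;
        std::mutex m_;
        const int level_;
        int saved_ = 0;
    };
    thread_local int LeveledMutex::current_level_ = 0;

    LeveledMutex A(1), B(2);   // the decided order: A before B, everywhere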
The timeout case could also be the result of a cyclic order problem: one thread locks A then B, another B then C, and a third C then A, with the deadlock happening when each thread owns one lock. The solution again is the same: order the locks.
Put another way, mutex lock orders can be described by a directed graph: if a thread locks A before B, there's an arc from A to B. Deadlocks can appear only if this graph contains a cycle, and the arcs of that cycle correspond to the deadlocked threads.
This theory can be a bit complex, but there are some simple insights to be found. For instance, from the graph theory, we know that trees are acyclic graphs. Hence, neither "leaf mutexes" (those that are always locked last) nor "root mutexes" (those that are always locked first) can cause deadlocks. Leaf mutexes are excluded because no thread ever blocks holding them, and root mutexes are excluded because the thread that holds them will be able to lock all subsequent mutexes in due time.
I've been reading up on multithreading and shared resource access, and one of the many (for me) new concepts is the mutex lock. What I can't seem to find out is what actually happens to a thread that finds a "critical section" locked. It says in many places that the thread gets "blocked", but what does that mean? Is it suspended, and will it resume when the lock is lifted? Or will it try again in the next iteration of the "run loop"?
The reason I ask is that I want to have system-supplied events (mouse, keyboard, etc.), which (apparently) are delivered on the main thread, handled in a very specific part of the run loop of my secondary thread. So whatever event is delivered, I queue it in my own data structure. Obviously, the data structure needs a mutex lock because it's being modified by both threads. The missing puzzle piece is: what happens when an event gets delivered in a function on the main thread, I want to queue it, but the queue is locked? Will the main thread be suspended, or will it just jump over the locked section and go out of scope (losing the event)?
Blocked means execution gets stuck there; generally, the thread is put to sleep by the system and yields the processor to another thread. When a thread is blocked trying to acquire a mutex, execution resumes when the mutex is released, though the thread might block again if another thread grabs the mutex before it can.
There is generally a try-lock operation that grabs the mutex if possible and returns an error if not. But you are eventually going to have to move the current event into that queue. Also, if you delay moving the events to the thread where they are handled, the application will become unresponsive regardless.
A queue is actually one case where you can get away with not using a mutex. For example, Mac OS X (and possibly also iOS) provides the OSAtomicEnqueue() and OSAtomicDequeue() functions (see man atomic or <libkern/OSAtomic.h>) that exploit processor-specific atomic operations to avoid using a lock.
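From memory of the man page, usage looks roughly like this (macOS-specific, and these calls are deprecated in recent SDKs; note that this queue is LIFO):

    #include <libkern/OSAtomic.h>
    #include <cstddef>

    struct Event {
        Event* next;   // link field used internally by the queue
        int    type;   // hypothetical payload
    };

    OSQueueHead gEvents = OS_ATOMIC_QUEUE_INIT;

    void post(Event* e) {
        OSAtomicEnqueue(&gEvents, e, offsetof(Event, next));
    }

    Event* take() {
        // Returns nullptr when the queue is empty.
        return static_cast<Event*>(OSAtomicDequeue(&gEvents, offsetof(Event, next)));
    }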
But, why not just process the events on the main thread as part of the main run loop?
The simplest way to think of it is that the blocked thread is put in a wait ("sleeping") state until the mutex is released by the thread holding it. At that point the operating system will "wake up" one of the threads waiting on the mutex and let it acquire it and continue. It's as if the OS simply puts the blocked thread on a shelf until it has the thing it needs to continue. Until the OS takes the thread off the shelf, it's not doing anything. The exact implementation -- which thread gets to go next, whether they all get woken up or they're queued -- will depend on your OS and what language/framework you are using.
Too late to answer, but it may facilitate understanding. I am talking more from an implementation perspective than from theoretical texts.
The word "blocking" is kind of a technical homonym. People may use it for sleeping or for mere waiting. The term has to be understood in the context of usage.
Blocking means waiting - Assume on an SMP system that thread B wants to acquire a spinlock held by some other thread A. One mechanism is to disable preemption and keep spinning on the processor until B gets the lock. Another, probably more efficient, mechanism is to allow other threads to use the processor if B does not get the lock after a few attempts. In that case thread B is scheduled out (as preemption is enabled) and the processor is given to some other thread C. Thread B simply waits in the scheduler's queue and comes back when its turn arrives. Understand that B is not sleeping; it is just waiting passively rather than busy-waiting and burning processor cycles. On BSD and Solaris systems there are data structures like turnstiles to implement this.
Blocking means sleeping - If thread B had instead made a system call like read(), waiting for data from a network socket, it cannot proceed until it gets the data. Therefore, some texts casually use the term blocking, as in "... blocked for I/O" or "... in a blocking system call". Actually, thread B is sleeping. There are specific data structures known as sleep queues - much like luxury waiting rooms at airports :-). The thread will be woken up when the OS detects availability of data, much like an attendant of the waiting room.
Blocking means just that. It is blocked. It will not proceed until able. You don't say which language you're using, but most languages/libraries have lock objects where you can "attempt" to take the lock and then carry on and do something different depending on whether you succeeded or not.
But in, for example, Java synchronized blocks, your thread will stall until it is able to acquire the monitor (mutex, lock). The java.util.concurrent.locks.Lock interface describes lock objects which have more flexibility in terms of lock acquisition.
How do I determine if a detached pthread is still alive ?
I have a communication channel with the thread (a uni-directional queue pointing outwards from the thread) but what happens if the thread dies without a gasp?
Should I resign myself to using process signals, or can I probe for thread liveness somehow?
For a joinable (i.e. NOT detached) pthread you could use pthread_kill like this:
    int ret = pthread_kill(YOUR_PTHREAD_ID, 0);  // signal 0: no signal is sent, only error checking is performed
If you get an ESRCH value back, it might be the case that your thread is dead.
However, this doesn't apply to detached pthreads, because after a detached thread has ended its thread ID can be reused for another thread.
From the comments:
The answer is wrong, because if the thread is detached and is not alive, the pthread_t is invalid. You can't pass it to pthread_kill. It could, for example, be a pointer to a structure that was freed, causing your program to crash. POSIX says, "A conforming implementation is free to reuse a thread ID after its lifetime has ended. If an application attempts to use a thread ID whose lifetime has ended, the behavior is undefined." – Thanks @DavidSchwartz
This question assumes a design with an unavoidable race condition.
Presumably, you plan to do something like this:
1. Check to see if the thread is alive
2. Wait for a message from the thread
The problem is that this sequence is not atomic and cannot be fixed. Specifically, what if the thread you are checking dies between step (1) and step (2)?
Race conditions are evil; rare race conditions doubly so. Papering over something 90% reliable with something 99.999% reliable is one of the worst decisions you can make.
The right answer to your question is "don't do that". Instead, fix your application so that threads do not die randomly.
If that is impossible, and some thread is prone to crashing, and you need to recover from that... Then your design is fundamentally flawed and you should not be using a thread. Put that unreliable thing in a different process and use a pipe to communicate with it instead. Process death closes file descriptors, and reading a pipe whose other end has been closed has well-defined, easily detected, race-free behavior.
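A minimal sketch of that approach (POSIX calls from C++; the one-line message protocol is just for illustration):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int fds[2];
        pipe(fds);
        pid_t pid = fork();
        if (pid == 0) {                 // child: the unreliable worker
            close(fds[0]);
            const char msg[] = "result\n";
            write(fds[1], msg, sizeof msg - 1);
            _exit(0);                   // or crash -- the parent sees EOF either way
        }
        close(fds[1]);                  // parent must close its copy of the write end
        char buf[256];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            fwrite(buf, 1, n, stdout);
        // n == 0: the peer is gone (exited or crashed), detected without races
        waitpid(pid, nullptr, 0);
        std::puts("worker terminated");
    }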
It is probably undefined behaviour to send a signal to an already dead thread. Your application might crash.
see http://sourceware.org/bugzilla/show_bug.cgi?id=4509 and http://udrepper.livejournal.com/16844.html