What are the actions a system can take when a deadlock is detected?

I'm having a little bit of trouble understanding how to handle deadlocks. First of all, what are some of the actions one can take? Additionally, what action is usually taken, and which is "best"? Thank you.

Well, you can't always detect deadlocks in the first place due to the Halting Problem.
But assuming you have reasonable suspicion that it has occurred, then you don't have much choice. You can:
Interrupt (i.e. send a signal/exception to) all the threads holding the lock. They will have to be able to handle the resulting interrupt, though.
Kill all the threads/processes involved. This is drastic: it saves the rest of the system at the cost of probably losing whatever data those threads or processes were working on.

You are asking how to handle deadlocks. This is not the right question: You should avoid them. Make sure they don't happen because, realistically, your program cannot recover from them.

You can kill some of the deadlocked tasks, and hope the others can then proceed, and will not remain in, or immediately fall back into, deadlock. This is not particularly reliable.
You can kill all the deadlocked tasks. That will free up resources that would otherwise never be used without outside intervention. However, your tasks are now dead -- and if you start them all up again, there's no reason why they can't deadlock again.
As @usr says, the right thing to do is to avoid deadlocks in the first place. Any potential deadlock indicates a serious flaw in your system, and should probably cause you to rethink your design.
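The simplest design fix is to impose a single global order in which locks may be acquired. Here is a rough Python sketch of the idea (the lock names and tasks are made up for illustration, not taken from any answer above): if every thread that needs both locks takes them in the same order, the circular wait a deadlock requires can never form.

import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# Agreed global order: lock_a is always taken before lock_b.
def transfer():
    with lock_a:        # first
        with lock_b:    # second
            pass        # ...work that needs both resources...

def audit():
    # Even if this task conceptually wants B "first", it still acquires
    # in the global order, so it cannot deadlock with transfer().
    with lock_a:
        with lock_b:
            pass        # ...work...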

Temporarily preempt resources from deadlocked processes.
Roll a process back to some checkpoint, allowing a needed resource to be preempted, and restart the process from the checkpoint later.
Successively kill processes until the system is deadlock free.

Ways to detect deadlock in a live application

What are the ways to detect deadlocks in a live multi-threaded application?
If we find there is a deadlock, are there any ways to resolve it without taking down/restarting the application?
There are two popular ways to detect deadlocks.
One is to have threads set checkpoints. For example, if a thread has a work loop, you start a timer at the beginning of each unit of work, set for longer than you think the work could possibly take. If the timer fires, you assume the thread is deadlocked. When the work is done, you cancel the timer.
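A minimal sketch of that checkpoint idea using Python's threading.Timer (the 30-second budget and the names here are assumptions for illustration only):

import threading

def suspect_deadlock():
    # Fires only if do_work() did not finish within its budget.
    print("worker exceeded its time budget -- possible deadlock")

def worker_loop(do_work):
    while True:
        watchdog = threading.Timer(30.0, suspect_deadlock)
        watchdog.start()      # set the checkpoint before starting the work
        do_work()             # the work we believe should finish in time
        watchdog.cancel()     # work finished in time, so cancel the alarm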
Another (sometimes used in combination) is to have the things a thread might block on track which other resources the thread already holds. This can directly detect an attempt to acquire one lock while holding another when other threads have acquired those locks in the opposite order.
This can even detect deadlock risk without the deadlock actually occurring. If one thread acquires lock A then B and another acquires lock B then A, there is no deadlock unless they overlap. But this method can detect it.
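That bookkeeping can be sketched as a lock wrapper in Python (the class and variable names are invented for illustration, and the record-keeping is deliberately simplified rather than production-grade):

import threading

_held = threading.local()   # locks currently held by this thread
_order = {}                 # lock name -> names of locks ever held while taking it

class OrderCheckedLock:
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def acquire(self):
        held = getattr(_held, "locks", set())
        # If a lock we already hold was previously taken *after* this one,
        # the two locks have been taken in both orders: a deadlock risk,
        # even if no deadlock has actually happened yet.
        for other in held:
            if self.name in _order.get(other, set()):
                print(f"lock order violation: {other} -> {self.name}")
        _order.setdefault(self.name, set()).update(held)
        self._lock.acquire()
        _held.locks = held | {self.name}

    def release(self):
        self._lock.release()
        _held.locks = getattr(_held, "locks", set()) - {self.name}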
Advanced deadlock detection is typically only used during debugging. Other than coding the application to check each blocking lock for a possible deadlock and knowing what to do if it happens, the only thing you can do after a deadlock is tear the application down. You can't release locks blindly because the resources they protect may be in an inconsistent state.
Sometimes you deliberately write code that you know can deadlock and specifically code it to avoid the problem. For example, if you know lots of threads take lock A and then try to acquire lock B, and some other thread needs to do the reverse, you can code it to make a non-blocking attempt to lock B and release lock A if that attempt fails.
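That try-and-back-off pattern looks roughly like this in Python (the lock names, the work callback, and the retry delay are placeholders, not anything from the answer above):

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def careful_task(work):
    # This path takes A then B, but another thread takes B then A, so
    # blocking on B while holding A could deadlock. Instead, try B without
    # blocking and back off (release A) if it isn't available.
    while True:
        with lock_a:
            if lock_b.acquire(blocking=False):   # non-blocking attempt on B
                try:
                    work()        # safely holding both A and B
                    return
                finally:
                    lock_b.release()
        # B was busy: the 'with' has released A; pause briefly and retry.
        time.sleep(0.01)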
Typically, it's more useful to spend your effort making deadlocks impossible rather than making the code detect and work around deadlocks.
Python has a module called faulthandler that's very useful for dealing with deadlocks:
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1)  # dump every thread's traceback on SIGUSR1
If you're using C or C++ with glibc, you can use the backtrace() functions in execinfo.h to print a stack trace and exit gracefully when you get a signal. You can take a deadlocked program, send it a signal, and get a list of all the threads.
In Java, use jstack <pid> on the stuck process.

Benefits of a suspended state in threads?

I'm trying to answer a question a professor posed to us.
Threads usually have the states Running, Ready, and Blocked. Suppose we wanted to add a Suspended state in order to maximize processor utilization by admitting a larger number of threads, which together require more memory than is available in the process's address space. Does the above make sense? If it does, explain why and what benefit(s) we obtain. If it does not, explain why not.
The suspended state seems pretty pointless to me, because synchronization would be a terrible experience. In any case where you might want to suspend, going into a blocked state is probably a 10x better idea for that reason. On top of that, isn't the processor already utilized as well as it can be, since when one thread blocks, another gets scheduled? By adding a suspended state that you explicitly enter, you are pretty much manually controlling the scheduling. I'm really confused as to what benefits it would provide. Any ideas?
I completely agree with you. With a suspended state, about the only synchronization you can express is start-point synchronization, where you create a thread in the suspended state and allow it to continue only when the parent raises a flag. Apart from that, synchronization is not really possible with the suspended-thread model.
I also think blocking is better than parking a thread in a suspended queue. The processor is already fully utilized, and it is not really beneficial to put a thread in a suspended state unless you are using it for some special purpose. Debuggers use the suspended thread state so that they can alter, break into, or trace a thread's state; that shows the kind of situation where a suspended state is actually useful.
You are right: you would effectively be controlling the thread scheduling manually, and that makes it a bad idea.

pthread_rwlock across processes: Repair after crash?

I'm working on Linux and I'm using a pthread_rwlock, which is stored in shared memory and shared across multiple processes. This mostly works fine, but when I kill a process (SIGKILL) while it is holding a lock, it appears that the lock is still held (regardless of whether it's a read- or write-lock).
Is there any way to recognize such a state, and possibly even repair it?
The real answer is to find a decent way to stop a process. Killing it with SIGKILL is not a decent way to do it.
This feature is specified for mutexes, where it's called robustness (PTHREAD_MUTEX_ROBUST), but not for rwlocks. The standard doesn't provide it, and kernel.org doesn't even have a page on rwlocks. So, like I said:
Find another way to stop the process (perhaps another signal that can be handled?)
Release the lock when you exit
@cnicutar - that "real answer" is pretty dubious. It's the kernel's job to handle cross-process responsibilities such as freeing resources and making sure things are marked consistent - userspace can't effectively do that job when things go wrong.
Granted, if everybody plays nice the robust features will not be needed, but for a robust system you want to make sure it doesn't go down because of some buggy client process.

How to detect a hung thread?

Is it possible to detect a hung thread? This thread is not part of any thread pool; it's just a system thread. Since the thread is hung, it may not process any events.
Thanks,
In theory, it is impossible. If you are on Windows and suspect that the thread might be deadlocked, I guess you could use GetThreadContext a few times and check if it is always the same, but I don't know how reliable it will be.
Not in theory, but in practice it may be possible, depending on your workload. For example, if the thread is supposed to respond to events, you could post a thread message (on Windows) and see if it responds. You could set an event or flag that would cause it to do something, and then wait a "reasonable" amount of time to see if it has responded. The question then arises what you would do with the "hung" thread, even if it has really hung and isn't just taking a long time to respond. The thread cannot generally be killed safely, and you cannot generally interrupt an arbitrary thread. It is safe enough to log a message to that effect, but who will care? Probably the best thing to do is to note it and figure out the bug that is causing it to hang.
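A rough Python sketch of that "post it something and see if it answers" check, assuming the thread already services a work queue (all names here are made up, and a timeout only gives you a suspicion, not proof of a hang):

import queue
import threading

requests = queue.Queue()

def worker():
    while True:
        item = requests.get()            # the thread's normal event loop
        if isinstance(item, threading.Event):
            item.set()                   # answer a liveness ping
        else:
            pass                         # ...handle a real work item...

threading.Thread(target=worker, daemon=True).start()

def looks_hung(timeout=5.0):
    ping = threading.Event()
    requests.put(ping)                   # post a "message" to the thread
    # If the loop is alive it will set the event; otherwise we time out.
    return not ping.wait(timeout)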
Depending on the workload, the kinds of processing done, and other details, it may be possible to detect a hung thread. In some cases, modern VMs can detect a lock deadlock where two threads are hung waiting for each other to release a lock. (But don't rely on this, because it isn't always possible, only sometimes.)
We need a lot more information before we can give a specific answer to your question.

Which is the better method? Allowing the thread to sleep for a while or deleting it and recreating it later?

We have a process that needs to run every two hours. It's a process that needs to run on its own thread so as to not interrupt normal processing.
When it runs, it will download 100k records and verify them against a database. The framework to run this has a lot of objects managing this process. These objects only need to be around when the process is running.
What's a better standard?
Keep the thread in wait mode by letting it sleep until I need it again. Or,
Delete it when it is done and create it the next time I need it? (System Timer Events.)
There is not that much difference between the two solutions. I tend to prefer the one where the thread is created each time.
Having a thread lying around consumes resources (memory at least). In a garbage-collected language, it is easy for some object to stay reachable from this thread, using even more memory. If you don't keep the thread around, all of those resources are freed and made available to the main process for the two hours between runs.
When you want to stop your whole process, your thread may be executing or not, and you need to interrupt it cleanly. It is always difficult to interrupt a thread, or to know whether it is sleeping or working; you may have race conditions there. Starting the thread on demand relieves you of those potential problems: you know whether you started the thread, and in that case calling thread_join makes you wait until the thread is done.
For those reasons, I would go for the thread-on-demand solution, even though the other one has no insurmountable problems.
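One way to sketch that on-demand approach in Python is with threading.Timer, which runs its callback on a fresh thread each time (the names and the interval constant are placeholders; re-arming the timer only after the work finishes also keeps runs from overlapping):

import threading

TWO_HOURS = 2 * 60 * 60

def run_verification():
    # ...download the records and verify them against the database...
    # All the helper objects live and die inside this call, so nothing
    # is retained between runs.
    pass

def scheduled_run():
    run_verification()
    # Re-arm only after the work is done, so runs can never overlap.
    threading.Timer(TWO_HOURS, scheduled_run).start()

threading.Timer(TWO_HOURS, scheduled_run).start()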
Starting one thread every two hours is very cheap, so I would go with that.
However, if there is a chance that at some time in the future the processing could take more than the run interval, you probably want to keep the thread alive. That way, you won't be creating a second thread that will start processing the records while the first is still running, possibly corrupting data or processing records twice.
Either should be fine but I would lean towards keeping the thread around for cases where the verification takes longer than expected (ex: slow network links or slow database response).
How would you remember to start a new thread when the two hours are up? With a timer? (That's on another thread!) With another thread that sleeps until the specified time? Shutting down the thread and restarting it based on something running somewhere else does you no good if that something else is either on its own separate thread, or blocks the main app while it waits to create the worker thread when the two hours are up, no?
Just let the Thread sleep...
I agree with Vilx that it's mostly a matter of taste. There is processing and memory overhead of both methods, but probably not enough for either to matter.
If you are using Java you could check the Timer class. It allows you to schedule tasks for a given time.
Also, if you need more control you can use the Quartz library.
I guess actually putting the thread to sleep is most efficient; ending it and recreating it would "cost" some resources, while putting it to sleep just takes a little space in the scheduler, and its data can be paged out by the operating system if needed.
But anyway it's probably not a very big difference, and the difference would probably depend on how good the OS's scheduler is, etc.
It really depends on one thing, as far as I can tell: state.
If the thread creates a lot of state (allocates memory) that is useful to have during the next iteration of the thread run, then I would keep it around. That way, your process can potentially optimize its run by only performing certain operations if certain things changed since the last running.
However, if the state that the process creates is significant compared with the amount of work to be done, and you are short on resources on the machine, then it may not be worth the cost of keeping the state around in between executions. If that's the case, then you should recreate the thread from scratch each time.
I think it's just a matter of taste. Both are good. Use the one which you find easier to implement. :)
I would create the thread a single time, and use events/condition variables to let it sleep until signaled to wake up again. That way, if the amount of time needed ever has to change, you only need to change the timing of firing the event, and your code will still be pretty clean.
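A small Python sketch of that single long-lived thread woken by an event (the names and the two-hour fallback timeout are assumptions; a second event is used so the thread can be shut down and joined cleanly):

import threading

wake_up = threading.Event()
stop = threading.Event()

def worker():
    while not stop.is_set():
        wake_up.wait(timeout=2 * 60 * 60)   # sleep until signaled (or the interval elapses)
        wake_up.clear()
        if stop.is_set():
            break
        # ...download and verify the records...

thread = threading.Thread(target=worker)
thread.start()
# Set wake_up to trigger a run early; set stop and join() the thread to shut down cleanly.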
I wouldn't think it's very important, but the best approach is very platform dependent.
A .NET System.Threading.Timer costs nothing while it's waiting, and will invoke your code on a pool thread. In theory, that would be the best of both your suggestions.
Another important thing to consider if you are on a garbage collected system like Java is that anything strongly referenced by a sleeping thread is not garbage. In that respect, it's better to kill idle threads, and let them, and any objects they reference, get cleaned up.
It all depends, of course. But by default I would go with a separate process (not thread) started on demand.
