How to detect a hung thread? - multithreading

Is it possible to detect a hung thread? This thread is not part of any thread pool; it's just a system thread. Since the thread is hung, it may not process any events.

In theory, it is impossible. If you are on Windows and suspect that the thread might be deadlocked, I guess you could use GetThreadContext a few times and check if it is always the same, but I don't know how reliable it will be.
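A rough analogue of the "sample it a few times and see if it moved" idea, sketched in Python rather than with GetThreadContext. The helper name `looks_stuck` and its parameters are made up for illustration. It relies on `sys._current_frames()`, which is a CPython implementation detail, and it has the same weakness mentioned above: a thread that is legitimately sleeping or blocked on I/O looks identical to a deadlocked one.

```python
import sys
import threading
import time

def looks_stuck(thread, samples=3, interval=0.05):
    """Return True if the thread's top frame sat on the same line in every sample.

    Hypothetical helper: this is only a hint, not proof of a hang.
    """
    seen = set()
    for _ in range(samples):
        frame = sys._current_frames().get(thread.ident)
        if frame is None:
            # The thread has finished (or never started), so it is not hung.
            return False
        seen.add((frame.f_code, frame.f_lineno))
        time.sleep(interval)
    return len(seen) == 1
```

A thread spinning in a one-line loop would also report as stuck, so treat a True result as a reason to look closer, nothing more.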

Not in theory, but in practice it may be possible, depending on your workload. For example, if the thread is supposed to respond to events, you could post a thread message (on Windows) and see if it responds. You could set an event or flag that causes it to do something; you then have to wait a "reasonable" amount of time to see whether it has responded. The question then arises of what you would do with the "hung" thread, even if it really has hung and isn't just taking a long time to respond. A thread generally cannot be killed safely, and you generally cannot interrupt an arbitrary thread. It is safe enough to log a message to that effect, but who will care? Probably the best thing to do is to note it and track down the bug that is causing the hang.
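One way to apply the "see if it responds within a reasonable time" idea is a heartbeat: the worker refreshes a timestamp whenever it makes progress, and a monitor checks how stale that timestamp is. This is a minimal sketch, assuming the worker can be modified to call `beat()`. The names `Heartbeat` and `is_suspect` are invented for the example.

```python
import threading
import time

class Heartbeat:
    """Timestamp that a worker refreshes each time it makes progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self):
        # Called by the worker, e.g. once per loop iteration.
        with self._lock:
            self._last = time.monotonic()

    def seconds_since_beat(self):
        with self._lock:
            return time.monotonic() - self._last

def is_suspect(heartbeat, max_silence):
    """True if the worker has been silent for longer than max_silence seconds."""
    return heartbeat.seconds_since_beat() > max_silence
```

The monitor can only say "suspect", never "hung": a worker that is slow but alive looks the same until its next beat, so choosing `max_silence` depends on the workload.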

Depending on the workload and the kinds of processing done and other details, it may be possible to detect a hung thread. In some cases, modern VMs can detect a lock deadlock where two threads are hung waiting for the other to release a lock. (But don't rely on this, because it isn't always possible, only sometimes.)
We need a lot more information before we can give a specific answer to your question.

Related

Benefits of a suspended state in threads?

I'm trying to answer a question a professor posed to us.
Threads usually have states Running, Ready, and Blocked. Suppose we wanted to add a Suspended state to maximize processor utilization through admitting a larger number of threads requiring more memory than available in the process' address space. Does the above make sense? If it does, explain why and explain what benefit(s) we obtain. If it does not, explain why not.
The suspended state seems pretty stupid to me because synchronization would just be a terrible experience. In any case where you might want to suspend, going into a blocked state is probably a 10x better idea because of this. And on top of that, isn't the processor already utilized as well as it can be, because when one thread gets blocked, another gets scheduled? By putting in a suspend state that you explicitly go into, you are pretty much manually controlling the scheduling. I'm really confused as to what benefits it would provide. Any ideas?
I completely agree with you. Synchronization is not practical with suspended threads unless you limit yourself to synchronizing at a thread's start point, where you create a thread in the suspended state and allow it to continue only when the parent process raises a flag. Apart from this, synchronization is not possible with the suspended thread model.
I also think blocking is better than parking a thread in a suspended queue. The processor is already fully utilized, and there is no real benefit in putting a thread in a suspended state unless you are using it for some special purpose. Debuggers, for example, suspend threads so that they can alter, break into, or trace a thread's state. That shows exactly where a suspended state is useful.
You are right: you are somewhat manually controlling the thread scheduling process, and that makes it a terrible idea.

Necessity of gracefully ending a thread

If I am building a multithreaded application, all its threads would automatically get killed when I abort the application.
If I want a thread to have a lifetime equal to that of the main thread, do I really need to gracefully end the thread, or let the application abort take care of killing it?
Edit: As threading rules depend on the OS, I'd like to hear opinions for the following too:
Android
Linux
iOS
It depends on what the thread is doing.
When a thread is killed, its execution stops at an arbitrary point in the code, meaning some operations may not be finished, like
writing a file
sending network messages
But the OS will
close all handles the application owns
release any locks
free all memory
close any open file
etc...
So, as long as you can make sure that all your files etc. are in a consistent state, you don't have to worry about the system resources.
I know this is true for Windows, and I would be very surprised if it were different on other OSes. The time when an application that didn't release all its resources could affect the entire system is long gone, fortunately.
No. With most non-trivial OSes, you do not need to explicitly/gracefully terminate app-lifetime threads unless there is a specific and overriding need to do so.
One reason is that you cannot always actually do it with user code. User-level code cannot stop a thread that is running on a different core from the thread requesting the stop. The OS can, and does.
Your Linux/Windows OS is very good indeed at stopping threads in any state on any core and releasing resources like thread stacks, heaps, OS object handles/fds etc. at process termination. It's had millions of hours of testing on systems world-wide, something that your own user code is very unlikely to ever experience. If you can do so, you should let the OS do what it's good at.
In other posts, several cases have been made where user-level termination of a thread may be unavoidable. Inter-process comms is one area, as are DB connections/transactions. If you are forced into it by your requirements, then fine, go for it but, otherwise, don't try - it's a waste of time and effort writing/testing/debugging thread-stop code to do what the OS can do effectively on its own.
Beware of premature stoptimization.

What are the actions a system can take when a deadlock is detected?

I'm having a little bit of trouble understanding how to handle deadlocks. First of all, what are some of the actions one can take? Additionally, what action is usually taken, and which is "best"? Thank you.
Well, you can't always detect deadlocks in the first place due to the Halting Problem.
But assuming you have a reasonable suspicion that one has occurred, you don't have much choice. You can:
Interrupt (i.e. send a signal/exception to) all the threads holding the lock. They will have to be able to handle the resulting interrupt, though.
Kill all the threads/processes involved. This is a drastic action: it saves the rest of the system at the risk that the program loses some data.
You are asking how to handle deadlocks. This is not the right question: You should avoid them. Make sure they don't happen because, realistically, your program cannot recover from them.
You can kill some of the deadlocked tasks, and hope the others can then proceed, and will not remain in, or immediately fall back into, deadlock. This is not particularly reliable.
You can kill all the deadlocked tasks. That will free up resources that would otherwise never be used without outside intervention. However, your tasks are now dead -- and if you start them all up again, there's no reason why they can't deadlock again.
As @usr says, the right thing to do is to avoid deadlocks in the first place. Any potential deadlock indicates a serious flaw in your system, and should probably cause you to rethink your design.
Temporarily take resources away from the deadlocked processes (preemption).
Back off a process to some check point allowing preemption of a needed resource and restarting the process at the checkpoint later.
Successively kill processes until the system is deadlock free.
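The "back off and retry" idea can be sketched at the level of two locks: instead of waiting forever for the second lock, give up after a timeout, release the first, and try again. This is only an illustration, not what an OS does internally. The function name `acquire_both` and the `timeout` and `attempts` values are arbitrary assumptions.

```python
import threading

def acquire_both(first, second, timeout=0.1, attempts=5):
    """Try to take two locks; on timeout, release what we hold and retry.

    Returns True with both locks held, or False with neither held.
    """
    for _ in range(attempts):
        if first.acquire(timeout=timeout):
            if second.acquire(timeout=timeout):
                return True
            # Back off so whoever holds `second` can take `first` and finish.
            first.release()
    return False
```

This turns a potential deadlock into a failure the caller has to handle, which is usually easier to deal with than a hang but does not remove the underlying design problem.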

Which is the better method? Allowing the thread to sleep for a while or deleting it and recreating it later?

We have a process that needs to run every two hours. It needs to run on its own thread so as not to interrupt normal processing.
When it runs, it will download 100k records and verify them against a database. The framework to run this has a lot of objects managing this process. These objects only need to be around when the process is running.
What's a better standard?
Keep the thread in wait mode by letting it sleep until I need it again. Or,
Delete it when it is done and create it the next time I need it? (System Timer Events.)
There is not that much difference between the two solutions. I tend to prefer the one where the thread is created each time.
Having a thread lying around consumes resources (memory at least). In a garbage-collected language, it is easy for some object to stay retained by this thread, using even more memory. If the thread is not lying around, all its resources are freed and made available to the main process for those two hours.
When you want to stop your whole process, whether or not your thread is executing at that moment, you need to interrupt the thread cleanly. It is always difficult to interrupt a thread or to know whether it is sleeping or working, and you may have race conditions there. Starting the thread on demand relieves you of those potential problems: you know whether you started the thread, and if you did, calling thread_join makes you wait until the thread is done.
For those reasons, I would go for the thread-on-demand solution, even though the other one has no insurmountable problems.
Starting one thread every two hours is very cheap, so I would go with that.
However, if there is a chance that at some time in the future the processing could take more than the run interval, you probably want to keep the thread alive. That way, you won't be creating a second thread that will start processing the records while the first is still running, possibly corrupting data or processing records twice.
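If you do start a thread per run, one way to guard against the overlap described above is a non-blocking lock: a run that finds the previous one still active simply skips. This is a sketch with made-up names (`_run_guard`, `run_if_idle`), and skipping is only one possible policy; queueing the run would be another.

```python
import threading

_run_guard = threading.Lock()

def run_if_idle(job):
    """Run job() unless a previous run is still in progress.

    Returns True if the job ran, False if it was skipped.
    """
    if not _run_guard.acquire(blocking=False):
        return False  # previous run still active, do not start a second one
    try:
        job()
        return True
    finally:
        _run_guard.release()
```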
Either should be fine but I would lean towards keeping the thread around for cases where the verification takes longer than expected (ex: slow network links or slow database response).
How would you remember to start a new thread when the two hours are up? With a timer? (That's on another thread!) With another thread that sleeps until the specified time? Shutting down the thread and restarting it based on something running somewhere else does you no good if that something else is either on its own separate thread or blocks the main app while it waits to "create" the worker thread when the two hours are up, no?
Just let the Thread sleep...
I agree with Vilx that it's mostly a matter of taste. There is processing and memory overhead of both methods, but probably not enough for either to matter.
If you are using Java, you could look at the Timer class. It allows you to schedule tasks at a given time.
Also, if you need more control, you can use the Quartz library.
I guess actually putting the thread to sleep is most efficient. Ending it and recreating it would "cost" some resources, while putting it to sleep would just take up a little space in the scheduler, and its data could be paged out by the operating system if needed.
But anyway it's probably not a very big difference, and the difference would probably depend on how good the OS's scheduler is, etc...
It really depends on one thing as far as I can tell... state.
If the thread creates a lot of state (allocates memory) that is useful to have during the next iteration of the thread run, then I would keep it around. That way, your process can potentially optimize its run by only performing certain operations if certain things changed since the last running.
However, if the state that the process creates is significant compared with the amount of work to be done, and you are short on resources on the machine, then it may not be worth the cost of keeping the state around between executions. If that's the case, then you should recreate the thread from scratch each time.
I think it's just a matter of taste. Both are good. Use the one which you find easier to implement. :)
I would create the thread a single time, and use events/condition variables to let it sleep until signaled to wake up again. That way if the amount of time needed ever has to change, you only need change the timing in firing the event and your code will still be pretty clean.
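A minimal sketch of that single long-lived thread, using Python's `threading.Event` in place of a platform event or condition variable. The class name `PeriodicWorker` and its methods are invented for the example. The thread sleeps in `wait()` until either the interval elapses or someone calls `trigger()`.

```python
import threading

class PeriodicWorker:
    """One long-lived thread that runs job() every `interval` seconds or on demand."""

    def __init__(self, job, interval):
        self._job = job
        self._interval = interval
        self._wake = threading.Event()
        self._quit = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def trigger(self):
        # Wake the thread early instead of waiting out the interval.
        self._wake.set()

    def stop(self):
        # Ask the thread to exit, wake it, and wait for it to finish.
        self._quit.set()
        self._wake.set()
        self._thread.join()

    def _loop(self):
        while True:
            self._wake.wait(timeout=self._interval)
            self._wake.clear()
            if self._quit.is_set():
                return
            self._job()
```

Changing the schedule then means changing `interval` or who calls `trigger()`, without touching the job itself.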
I wouldn't think it's very important, but the best approach is very platform dependent.
A .NET System.Threading.Timer costs nothing while it's waiting, and will invoke your code on a pool thread. In theory, that would be the best of both your suggestions.
Another important thing to consider if you are on a garbage collected system like Java is that anything strongly referenced by a sleeping thread is not garbage. In that respect, it's better to kill idle threads, and let them, and any objects they reference, get cleaned up.
It all depends, of course. But by default I would go with a separate process (not thread) started on demand.

How independent are threads inside the same process?

Now, this might be a very newbie question, but I don't really have experience with multithreaded programming and I haven't fully understood how threads work compared to processes.
When a process on my machine hangs, say it's waiting for some IO that never comes or something similar, I can kill and restart it because other processes aren't affected and can, for example, still operate my terminal. This is very obvious, of course.
I'm not sure whether it is the same with threads inside a process: If one hangs, are the others unaffected? In other words, can I run a "watchdog" thread which supervises the other threads and, for example kill and recreate hanging threads? For example, if I have a threadpool that I don't want to be drained by occasional hangups.
Threads are independent, but there's a difference between a process and a thread, and that is that in the case of processes, the operating system does more than just "kill" it. It also cleans up after it.
If you start killing threads that seem to be hung, you will most likely leave resources locked and the like, which the operating system would release for you if you did the same to a process.
So for instance, if you open a file for writing, and start producing data and write it to the file, and this thread now hangs, for whatever reason, killing the thread will leave the file still open, and most likely locked, up until you close the entire program.
So the real answer to your question is: No, you can not kill threads the hard way.
If you simply ask a thread to close, that's different because then the thread is still in control and can clean up and close resources before terminating, but calling an API function like "KillThread" or similar is bad.
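"Asking a thread to close" usually means a flag the thread checks between units of work. Here is a minimal sketch with illustrative names (`worker`, `stop_requested`, `log`), where the `finally` block stands in for whatever cleanup the thread owns, such as closing files or releasing locks.

```python
import threading

def worker(stop_requested, log):
    """Do small units of work until asked to stop, then clean up."""
    try:
        # wait() doubles as the sleep between units of work and returns
        # True as soon as the stop event is set.
        while not stop_requested.wait(timeout=0.01):
            log.append("tick")
    finally:
        log.append("cleaned up")  # placeholder for real cleanup
```

Typical use is to create a `threading.Event`, pass it to the thread, then call `set()` followed by `join()` at shutdown. This only works if the thread gets back to the check, which is exactly what a truly hung thread does not do.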
If a thread hangs, the others will continue executing. However, if the hung thread has locked a semaphore, critical section or other kind of synchronization object, and another thread attempts to lock the same synchronization object, you now have a deadlock with two dead threads.
It is possible to monitor other threads from a thread. Depending on your platform, there are applicable APIs; I refer you to those, as you haven't stated which OS you are writing for.
You didn't mention the platform, but as far as I know, the NT kernel schedules threads, not processes, and treats them independently in that respect. This is not true on all platforms (some, like Windows 3.1, do not use preemptive multithreading, and if one thread goes into an infinite loop, everything is affected).
The simple answer is yes.
Typically, though, code in a thread will handle this likelihood itself. Most commonly, APIs that perform operations that may hang have timeout features of their own.
Alternatively, a thread will wait not just on the operation that might hang but also on a timer. If the timer signals first, it's assumed the operation has hung.
Since a watchdog thread would need some cooperation from the code in the other threads to be useful in this scenario, having the threads themselves set timeouts makes more sense than a watchdog.
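The "wait on the operation and on a timer" pattern can be sketched with `concurrent.futures`, where `Future.result(timeout=...)` plays the role of the timer. The helper name `call_with_timeout` is hypothetical. Note the assumption built into this approach: on timeout the caller stops waiting, but the operation itself keeps running in its worker thread.

```python
import concurrent.futures
import time

def call_with_timeout(executor, func, timeout):
    """Run func on the executor and wait at most `timeout` seconds.

    Returns (True, result) on success or (False, None) if the timer won.
    """
    future = executor.submit(func)
    try:
        return True, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The operation is presumed hung; it is NOT cancelled or killed here.
        return False, None
```

With a `ThreadPoolExecutor`, a call that really never returns will still occupy one pool thread, so this protects the caller from blocking but not the pool from draining.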
Threads get scheduled independent of each other. So you could indeed stop and restart hanging threads. Threads do not run in a separate address-space so a misbehaving thread can still overwrite memory or take locks needed by other threads in the same process.
There's a pretty good overview of some of the pitfalls of killing and suspending threads in the Java documentation explaining why the methods that do it are deprecated. Basically, if you expect to be able to kill a thread, you have to be very, very careful to make it work without some sort of corruption. If a thread is hung it's probably because of a bug...in which case killing it will probably result in corruption.
http://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
If you need to be able to kill things, use processes.
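As a sketch of that advice: run the risky work in a child process and let the timeout kill it. With `subprocess.run`, a child that exceeds `timeout` is killed and reaped before `TimeoutExpired` is raised, and the OS reclaims its handles and memory. The function name `run_killable` and the idea of passing the work as a code string are just for illustration.

```python
import subprocess
import sys

def run_killable(code, timeout):
    """Run a Python snippet in a child process, killing it if it overruns.

    Returns the child's exit code, or None if it had to be killed.
    """
    try:
        done = subprocess.run([sys.executable, "-c", code], timeout=timeout)
        return done.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and waited for the child.
        return None
```

Anything the child wrote to disk before being killed may still be half-finished, so the earlier points about keeping files in a consistent state apply here too.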

Resources