Now, this might be a very newbie question, but I don't really have experience with multithreaded programming and I haven't fully understood how threads work compared to processes.
When a process on my machine hangs, say it's waiting for some IO that never comes or something similar, I can kill and restart it because other processes aren't affected and can, for example, still operate my terminal. This is very obvious, of course.
I'm not sure whether it is the same with threads inside a process: If one hangs, are the others unaffected? In other words, can I run a "watchdog" thread which supervises the other threads and, for example kill and recreate hanging threads? For example, if I have a threadpool that I don't want to be drained by occasional hangups.
Threads are independent, but there's a difference between a process and a thread, and that is that in the case of processes, the operating system does more than just "kill" it. It also cleans up after it.
If you start killing threads that seems to be hung, most likely you'll leave resources locked and similar, something that the operating system would close for you if you did the same to a process.
So for instance, if you open a file for writing, and start producing data and write it to the file, and this thread now hangs, for whatever reason, killing the thread will leave the file still open, and most likely locked, up until you close the entire program.
So the real answer to your question is: No, you can not kill threads the hard way.
If you simply ask a thread to close, that's different because then the thread is still in control and can clean up and close resources before terminating, but calling an API function like "KillThread" or similar is bad.
If a thread hangs, the others will continue executing. However, if the hung thread has locked a semaphore, critical section or other kind of synchronization object, and another thread attempts to lock the same synchronization object, you now have a deadlock with two dead threads.
It is possible to monitor other threads from a thread. Depending on your platform, there are appliable API's: I refer you to those as you haven't stated what OS you are writing for.
You didn't mention about the platform, but as far as I'm concerned, NT kernel schedules threads, not processes and threats them independently in that manner. This might not be and is not true on other platforms (some platforms, like Windows 3.1, do not use preemptive multithreading and if one thread goes in infinite loop, everything is affected).
The simple answer is yes.
Typically though code in a thread will handle this likely hood itself. Most commonly many APIs that perform operations that may hang will have timeout features of their own.
Alternatively a thread will wait on not just an the operation that might hang but also a timer. If the timer signals first its assummed the operation has hung.
Since for a watch dog thread to be useful in this scenario would need some co-operation from code in the other threads having the threads themselves set timeouts makes more sense than a watchdog.
Threads get scheduled independent of each other. So you could indeed stop and restart hanging threads. Threads do not run in a separate address-space so a misbehaving thread can still overwrite memory or take locks needed by other threads in the same process.
There's a pretty good overview of some of the pitfalls of killing and suspending threads in the Java documentation explaining why the methods that do it are deprecated. Basically, if you expect to be able to kill a thread, you have to be very, very careful to make it work without some sort of corruption. If a thread is hung it's probably because of a bug...in which case killing it will probably result in corruption.
http://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
If you need to be able to kill things, use processes.
Related
I have a question about multi sockets.
I know that I have to use select() for multi sockets. select() waits for a fd ...
But why we need to use select() when we can create a thread for each socket and perform accept() on each one seperatly ? Is it even a bad idea ? Is it just about "too many sockets, too many threads so" or what ??
It's true, you can avoid multiplexing sockets by instead spawning one thread for each socket, and then using blocking I/O on each thread.
That saves you from having to deal with select() (or poll() or etc); but now you have to deal with multiple threads instead, which is often worse.
Whether threads will be more of a hassle to manage than socket-multiplexing in your particular program depends a lot on what your program is trying to do. For example, if the threads in your program don't need to communicate/co-operate with each other or share any resources, then a multithreaded design can work well (as would a multiprocess design). On the other hand, if your threads all need to access a shared data structure or other resource, or if they need to interact with each other, then you've got a bit of a programming challenge on your hands, which you'll need to solve 100% perfectly or you'll end up with a program that "seems to work most of the time" but then occasionally deadlocks, crashes, or gives incorrect results due to incorrect/insufficient synchronization. This phenomenon of "meta-stability" is much more common/severe amongst buggy multithreaded programs than in buggy single-threaded programs, since a multithreaded program's exact flow of execution will be different every time you run it (due to the asynchronous nature of the threads with respect to each other).
Stability and code-correctness issues aside, there are a couple of other problems particular to multithreading that you avoid by using a single-threaded design:
Most OS's don't scale well above a few dozen threads. So if you're thinking one-thread-per-client, and you want to support hundreds or thousands of simultaneous clients, you're in for some performance problems.
It's hard to control a thread that is blocked in a blocking-socket call. Say the user has pressed Command-Q (or whatever the appropriate equivalent is) so it's now time for your program to quit. If you have one or more threads blocked inside a blocking-socket call, there's no straightforward way to do that:
You can't just call unilaterally call exit(), because while the main thread is tearing down process-global resources, one or more threads might still be using them, leading to an occasional crash
You can't ask the threads to exit (via atomic-boolean or whatever) and then call join() to wait for them, because they are blocking inside I/O calls and thus might take minutes/hours/days before they respond
You can't send a signal to the threads and have them react in a signal-handler, because signals are per-process, and you can't control which thread will receive the signal.
You can't just unilaterally kill the threads, because they might be holding resources (like mutexes or file handles) that would then remain unreleased forever, potentially causing deadlocks or other problems
You can't close down the threads' sockets for them, and hope that this will cause the threads to error out and terminate, as this leads to race condition if the threads also try to close down those same resources.
So even in a multithreaded design, if you want a clean shutdown (or any other sort of local control of a network-thread) you usually end up having to use non-blocking I/O and/or socket multiplexing inside each thread anyway, so now you've got the worst of both worlds, complexity-wise.
In a multi-threaded application, if a thread calls fork(), it will copy the state of only that thread. So the child process created would be a single-thread process. If some other thread were to hold a lock required by the thread which called the fork(), that lock would never be released in the child process. This is a problem.
To counter this, we can modify the fork() in two ways. Either we can copy all the threads instead of only that single one. Or we can make sure that any lock held by the (other) non-copied threads will be released. So what will be the modified fork() system call in both these cases. And which of these two would be better, or what would be the advantages and disadvantages of either option?
This is a thorny question.
POSIX has pthread_atfork() to work through the mess of mixing forks and thread creation. The NOTES section of that man page discusses mutexes etc. However, it acknowledges that getting it right is hard.
The function isn't so much an alternative to fork() as it is a way to explain to the pthread library how your program needs to be prepared for the use of fork().
In general not trying to launch a thread from the child of fork but either exiting that child or calling exec asap, will minimize problems.
This post has a good discussion of pthread_atfork().
...Or we can make sure that any lock held by the (other) non-copied threads will be released.
That's going to be harder than you realize because a program can implement "locks" entirely in user-mode code, in which case, the OS would have no knowledge of them.
Even if you were careful only to use locks that were known to the OS you still have a more general problem: Creating a new process with just the one thread would effectively be no different from creating a new process with all of the threads and then immediately killing all but one of them.
Read about why we don't kill threads. In a nutshell: Locks aren't the only state that needs to be cleaned up. Any of the threads that existed in the parent but not in the child could, at the moment of the fork call, been in the middle of making a mess that needs to be cleaned up. If that thread doesn't exist in the child, then you've lost the knowledge of what needs to be cleaned up.
we can copy all the threads instead of only that single one...
That also is a potential problem. The one thread that calls fork() would know when and why fork() was called, and it would be prepared for the fork call. None of the other threads would have any warning. And, if any of those threads is interacting with something outside of the process (e.g., talking to a remote service) then,where you previously had one client talking to the service, you suddenly have two clients, talking to the same service, and they both think that they are the only one. That's not going to end well.
Don't call fork() from multi-threaded programs.
In one project I worked on: We had a big multi-threaded program that needed to spawn other processes. How we did it is, we had it spawn a simple, single-threaded "helper" program before it created any new threads. Then, whenever it needed to spawn another process, it sent a message to the helper, and the helper did it.
If I am building a multithreaded application, all its threads would automatically get killed when I abort the application.
If I want a thread to have a lifetime equal to that of the main thread, do I really need to gracefully end the thread, or let the application abort take care of killing it?
Edit: As threading rules depend on the OS, I'd like to hear opinions for the following too:
Android
Linux
iOS
It depends on what the thread is doing.
When a thread is killed, it's execution stops at any point in the code, meaning some operations may not be finished, like
writing a file
sending network messages
But the OS will
close all handles the application owns
release any locks
free all memory
close any open file
etc...
So, as long as you can make sure that all your files etc. are in a consistent state, you don't have to worry about the system resources.
I know this is true for Windows, and I would be very surprised if it was different on other OSes. The time when a application that didn't release all resources could affect the entire system is long gone, fortunately.
No. With most non-trivial OS, you do not need to explicitly/gracefully terminate app-lifetime threads unless there is a specific and overriding need to do so.
Just one reason is that you cannot always actually do it with user code. User-level code cannot stop a thread that is running on another core than the thread requesting the stop. The OS can, and does.
Your linux/Windows OS is very good indeed at stopping threads in any state on an core and releasing resources like thread stacks, heaps, OS object handles/fd's etc. at process-termination. It's had millions of hours of testing on systems world-wide, something that your own user code is very unlikely to ever experience. If you can do so, you should let the OS do what it's good at.
In other posts, several cases have been made where user-level termination of a thread may be unavoidable. Inter-process comms is one area, as are DB connections/transactions. If you are forced into it by your requirements, then fine, go for it but, otherwise, don't try - it's a waste of time and effort writing/testing/debugging thread-stop code to do what the OS can do effectively on its own.
Beware of premature stoptimization.
I've learned that a process has running, ready, blocked, and suspended states. Threads also have these states except for suspended because it lives in the process's address space.
A process blocks most of the time when it is doing a blocking i/o or waiting for an event.
I can easily picture out a process getting blocked if its single-threaded or if it follows a one-to-many model, but how does it work if the process is multi-threaded?
For example:
I have a process with two threads in a system that follows a one-to-one model. One handles the gui and the other handles the blocking i/o. I know the process remains responsive because the other thread handles the i/o.
So is there by any chance the process gets blocked or should I just rule it out in this case?
I'm just getting into these stuff so forgive me If I haven't understand some of the important details yet.
Let's say you have a work queue where the UI thread schedules work to be done and the I\O thread looks there for work to do. The work queue itself is data that is read and modified from both threads, therefor you must synchronize access somehow or race conditions result.
The naive approach is to synchronize access to the queue using a lock (aka critical section). If the I\O thread acquires the lock and then blocks, the UI thread will only remain responsive until it decides it needs to schedule work and tries to acquire the lock. A better approach is to use a lock-free queue about which much has been written and you can easily search for more info.
But to answer your question, yes, it is still much easier than you might think to cause UI to stutter / hang even when using multiple threads. There are various libraries that make it easier or harder to solve this problem, so depending on your OS and language of choice, there may be something better than just OS primitives. Win32 (from what I remember) doesn't it make it very easy at all despite having all sorts of synchronization primitives. Pthreads and Boost never seemed very straightforward to me either. Apple's GCD makes it semantically much easier to express what you want (in my opinion), though there are still pitfalls one must be aware of (such as scheduling too many blocking operations on a single work queue to be done in parallel and causing the processor to thrash when they all wake up at the same time).
My advice is to just dive in and write lots of multithreaded code. It can be tough to debug but you will learn a lot and eventually it becomes second nature.
Is it possible to detect a hung thread? This thread is not part of any thread pool, its just a system thread. Since thread is hung, it may not process any events.
Thanks,
In theory, it is impossible. If you are on Windows and suspect that the thread might be deadlocked, I guess you could use GetThreadContext a few times and check if it is always the same, but I don't know how reliable it will be.
Not in theory, but in practice it may be possible, depending on your workload. For example if it is supposed to respond to events, you could post a thread message (in windows) and see if it responds. You could set an event or flag that would cause it to do something - you then have to wait for a "reasonable" amount of time to see if it has responded. The question then arises what you would do with the "hung" thread, even if it has really hung and isn't just taking a long time to respond. The thread cannot generally safely be killed and you cannot generally interrupt an arbitrary thread. It is safe enough to log a message to the effect, but who will care? Probably the best thing to do is to note it and figure out the bug that is causing it to hang.
Depending on the workload and the kinds of processing done and other details, it may be possible to detect a hung thread. In some cases, modern VMs can detect a lock deadlock where two threads are hung waiting for the other to release a lock. (But don't rely on this, because it isn't always possible, only sometimes.)
We need a lot more information before we can give a specific answer to your question.