Intervening threads that waited for too long - multithreading

Is there anyway in F# that I can detect if a current waiting thread is waiting for too long without being contacted?
I have a case where threads must be actively contacting other waiting threads to pass their work to once they're finished. My solution is having a bug somewhere that sometimes one or more threads just wait for too long and eventually the program got deadlocked because other threads don't contact them.
I think by detecting if a waiting thread is simply waiting for too long, it will just actively go looking for available work, rather than keeping waiting for other threads to pass to it.

It's probably better to try and understand why your threads are getting stuck than just terminating them. If you can reproduce this with the Visual Studio debugger attached, you can click the Pause button and use the Threads window to see what code all threads are in.
That said; if you still have the need to do this, the solution will depend on how you're managing your threads. To monitor them from the outside, you'll need some process that has a list of threads and the ability to tell whether they're dead.
The Thread class doesn't appear have any built-in mechanism for sharing state between the thread and its control except for Name. You could possibly abuse name, but I would probably have a thread-safe collection (eg. a ConcurrentDictionary<Thread, DateTime>) to store all of the threads and the timestamp of their last communication, and pass an Action into each thread when it's started that allows it to "Ping" by calling the action periodically. The action would simply update the DateTime stored against that thread.
The controlling process then simply scans through the dictionary periodically for anything with a timestamp that is too old, declares that thread dead and Aborts() it.
It's hard to give a code sample without knowing exactly how you're spawning your threads and describe what a thread "being contacted" means in more detail.

Related

How to detect if a linux thread is crashed

I've this problem, I need to understand if a Linux thread is running or not due to crash and not for normal exit. The reason to do that is try to restart the thread without reset\restart all system.
The pthread_join() seems not a good option because I've several thread to monitoring and the function return on specific thread, It doesn't work in "parallel". At moment I've a keeep live signal from thread to main but I'm looking for some system call or thread attribute to understand the state
Any suggestion?
P
Thread "crashes"
How to detect if a linux thread is crashed
if (0) //...
That is, the only way that a pthreads thread can terminate abnormally while other threads in the process continue to run is via thread cancellation,* which is not well described as a "crash". In particular, if a signal is received whose effect is abnormal termination then the whole process terminates, not just the thread that handled the signal. Other kinds of errors do not cause threads to terminate.
On the other hand, if by "crash" you mean normal termination in response to the thread detecting an error condition, then you have no limitation on what the thread can do prior to terminating to communicate about its state. For example,
it could update a shared object that tracks information about your threads
it could write to a pipe designated for the purpose
it could raise a signal
If you like, you can use pthread_cleanup_push() to register thread cleanup handlers to help with that.
On the third hand, if you're asking about detecting live threads that are failing to make progress -- because they are deadlocked, for example -- then your best bet is probably to implement some form of heartbeat monitor. That would involve each thread you want to monitor periodically updating a shared object that tracks the time of each thread's last update. If a thread goes too long between beats then you can guess that it may be stalled. This requires you to instrument all the threads you want to monitor.
Thread cancellation
You should not use thread cancellation. But if you did, and if you include termination because of cancellation in your definition of "crash", then you still have all the options above available to you, but you must engage them by registering one or more cleanup handlers.
GNU-specific options
The main issues with using pthread_join() to check thread state are
it doesn't work for daemon threads, and
pthread_join() blocks until the specified thread terminates.
For daemon threads, you need one of the approaches already discussed, but for ordinary threads on GNU/Linux, Glibc provides non-standard pthread_tryjoin_np(), which performs a non-blocking attempt to join a thread, and also pthread_timedjoin_np(), which performs a join attempt with a timeout. If you are willing to rely on Glibc-specific functions then one of these might serve your purpose.
Linux-specific options
The Linux kernel makes per-process thread status information available via the /proc filesystem. See How to check the state of Linux threads?, for example. Do be aware, however, that the details vary a bit from one kernel version to another. And if you're planning to do this a lot, then also be aware that even though /proc is a virtual filesystem (so no physical disk is involved), you still access it via slow-ish I/O interfaces.
Any of the other alternatives is probably better than reading files in /proc. I mention it only for completeness.
Overall
I'm looking for some system call or thread attribute to understand the state
The pthreads API does not provide a "have you terminated?" function or any other such state-inquiry function, unless you count pthread_join(). If you want that then you need to roll your own, which you can do by means of some of the facilities already discussed.
*Do not use thread cancellation.

Ending a thread that might be joined or dereferenced

I'm having a problem deciding on what to do in this situation, I want to have a detached thread, but still be able to join it in case I want to abort it early, presumably before starting a new instance of it, to make sure I don't have the thread still accessing things when it shouldn't.
This means I shouldn't detach the thread right after calling it, so then I have a few options:
Self-detach the thread when it's reaching the end of its execution, but then wouldn't this cause problems if I try to join it from the main thread? This would be my prefered solution if the problem of trying to join it after it's self-detached could be solved. I could dereference the thread handle that the main thread has access to from the self-detaching thread before self-detaching it, however in case the main thread tries to join right before the handle is dereferenced and the thread self-detached this could cause problems, so I'd have to protect the dereferencing in the thread and however (I don't know how, I might need to create a variable to indicate this) I would check if I should join in the main thread with a mutex, which complicates things. Somehow I have a feeling that this isn't the right way to do it.
Leave the thread hanging until eventually I join it, which could take a long time to happen, depending on how I organise things it could be not before I get rid of what it made (e.g. joining the thread right before freeing an image that was loaded/processed by the thread when I don't need it anymore)
Have the main thread poll periodically to know when the thread has done its job, then join it (or detach it actually) and indicate not to try joining it again?
Or should I just call pthread_exit() from the thread, but then what if I try to join it?
If I sound a bit confused it's because I am. I'm writing in C99 using TinyCThread, a simple wrapper to pthread and Win32 API threading. I'm not even sure how to dereference my thread handles, on Windows the thread handle is HANDLE, and setting a handle to NULL seems to do it, I'm not sure that's the right way to do it with the pthread_t type.
Epilogue: Based on John Bollinger's answer I chose to go with detaching the thread, putting most of that thread's code in a mutex, this way if any other thread wants to block until the thread is practically done it can use that mutex.
The price of using an abstraction layer such as TinyCThreads is that you can rely only on the defined characteristics of the abstraction. Both Windows and POSIX provide features and details that are not necessarily reflected by TinyCThreads. On the other hand, this may force you to rely on a firmer foundation than you might otherwise hack together with the help of implementation-specific features.
Anyway, you say,
I want to have a detached thread, but still be able to join it in case I want to abort it early,
but that's inconsistent. Once you detach a thread, you cannot join it. I suspect you meant something more like, "I want a thread that I can join as long as it is running, but that I don't have to join when it terminates." That's at least consistent, but it focuses on mechanism.
What I think you actually want would be described better as a thread that you can cancel synchronously as long as it is running, but that you otherwise don't need to join when it terminates. I note, however, that the whole idea presupposes a way to make the thread terminate early, and it does not appear that TinyCThread provides any built-in facility for that. It will also require a mechanism to determine whether a given thread is still alive, and TinyCThread does not provide that, either.
First, then, you need some additional per-thread shared state that tracks thread status (running / abort requested / terminated). Because the state is shared, you'll need a mutex to protect it, and that will probably need to be per-thread, too. Furthermore, in order to enable one thread (e.g. the main one) to wait for that state to change when it cancels a thread, it will need a per-thread condition variable.
With that in place, the new thread can self-detach, but it must periodically check whether an abort has been requested. When the thread ends its work, whether because it discovers an abort has been requested or because it reaches the normal end of its work, it performs any needed cleanup, sets the status to "terminated", broadcasts to the CV, and exits.
Any thread that wants to cancel another locks the associated mutex, and checks whether the thread is already terminated. If not, it sets the thread status to "abort requested", and waits on the condition variable until the status becomes "terminated". If desired, you could use a timed wait to allow the cancellation request to time out. After successfully canceling the thread, it may be possible to clean up the mutex, cv, and shared variable.
I note that all of that hinges on my interpretation of your request, and in particular, on the prospect that what you're after is aborting / canceling threads. None of the alternatives you floated seem to address that; for the most part they abandon the unwanted thread, which does not serve your expressed interest in preventing it from making unwanted changes to shared state.
It's not clear to me what you want, but you can use a condition variable to implement basically arbitrary joining semantics for threads. The POSIX Rationale contains an example of this, showing how to implement pthread_join with a timeout (search for timed_thread).

Reasons for not using SendMessage() to access UI controls from other threads?

I have read that SendMessage() should not be used to access UI controls from other threads, but I'm not sure I know why, the only reason that I can think of is since SendMessage() is a blocking call, then it could cause a deadlock in certain situations.
But is this the only reason not to use it?
Edit: This article talks about the reasons not to use SendMessage() but I don't find it to be very clear (it is intended for .NET).
It is best to keep in mind that the odds that you will write correct code are not very good. And the generic advice is don't do it! It is never necessary, the UI thread of a GUI program in Windows was entirely structured to make it simple to allow code that runs on another thread or inside a process affect the UI of the program. The point of the message loop, the universal solution to the producer-consumer problem. PostMessage() is your weapon to take advantage of it.
Before you forge ahead anyway, start by thinking about a simple problem that's very hard to solve when you use SendMessage. How do you close a window safely and correctly?
Given is that the exact moment in time that you need to close the window is entirely unpredictable and completely out of sync with the execution of your worker thread. It is the user that closes it, or asks the UI thread to terminate, you need to make sure that the thread has exited and stops calling SendMessage before you can actually close the window.
The intuitive way to do this is to signal an event in your WM_CLOSE message handler, asking the thread to stop. And wait for it to complete, then the window can close. Intuitive, but it does not work, it will deadlock your program. Sometimes, not always, very hard to debug. Goes wrong when the thread cannot check the event because it is stuck in the SendMessage call. Which cannot complete since the UI thread is waiting for the thread to exit. The worker thread cannot continue and the UI thread cannot continue. A "deadly embrace", your program will hang and needs to be killed forcibly. Deadlock is a standard threading bug.
You'll shout, "I'll use SendMessageTimeout!" But what do you pass for the uTimeout argument and how do you interpret an ERROR_TIMEOUT error? It is pretty common for a UI thread to go catatonic for a while, surely you've seen the "ghost window" before, the one that shows 'Not Responding` in the title bar. So an ERROR_TIMEOUT does not reliably indicate that the UI thread is trying to shut down unless you make uTimeout very large. At least 10 seconds. That kinda works, getting the occasional 10 second hang at exit is however not very pretty.
Solve this kind of problem for all the messages, not just WM_CLOSE. WM_PAINT ought to be next, another one that's very, very hard to solve cleanly. Your worker thread asks to update the display a millisecond before the UI thread calls EndPaint(). And thus never displays the update, it simply gets lost. A threading race, another standard threading bug.
The third classic threading bug is a fire-hose problem. Happens when your worker thread produces results faster than the UI thread can handle them. Very common, UI updates are pretty expensive. Easy to detect, very hard to solve and unpredictable when it occurs. Easy to detect because your UI will freeze, the UI thread burns 100% core trying to keep up with the message rate. It doesn't get around to its low-priority tasks anymore. Like painting. Goes wrong both when you use SendMessage or PostMessage. In the latter case you'll fill the message queue up to capacity. It starts failing after it contains 10000 unprocessed messages.
Long story short, yes, SendMessage() is thread-safe. But thread-safety is not a transitive property, it doesn't automatically make your own code thread-safe. You still suffer from all the things that can go wrong when you use threads. Deadlocks, races, fire-hosing. Fear the threading beast.

How to stop long executing threads gracefully?

I have a threading problem with Delphi. I guess this is common in other languages too. I have a long process which I do in a thread, that fills a list in main window. But if some parameters change in the mean time, then I should stop current executing thread and start from the beginning. Delphi suggests terminating a thread by setting Terminated:=true and checking for this variable's value in the thread. However my problem is this, the long executing part is buried in a library call and in this call I cannot check for the Terminated variable. Therefore I had to wait for this library call to finish, which affects the whole program.
What is the preferred way to do in this case? Can I kill the thread immediately?
The preferred way is to modify the code so that it doesn't block without checking for cancellation.
Since you can't modify the code, you can't do that; you either have to live with the background operation (but you can disassociate it from any UI, so that its completion will be ignored); or alternatively, you can try terminating it (TerminateThread API will rudely terminate any thread given its handle). Termination isn't clean, though, like Rob says, any locks held by the thread will be abandoned, and any cross-thread state protected by such locks may be in a corrupted state.
Can you consider calling the function in a separate executable? Perhaps using RPC (pipes, TCP, rather than shared memory owing to same lock problem), so that you can terminate a process rather than terminating a thread? Process isolation will give you a good deal more protection. So long as you aren't relying on cross-process named things like mutexes, it should be far safer than killing a thread.
The threads need to co-operate to achieve a graceful shutdown. I am not sure if Delphi offers a mechanism to abort another thread, but such mechanisms are available in .NET and Java, but should be considered an option of last resort, and the state of the application is indeterminate after they have been used.
If you can kill a thread at an arbitrary point, then you may kill it while it is holding a lock in the memory allocator (for example). This will leave your program open to hanging when your main thread next needs to access that lock.
If you can't modify the code to check for termination, then just set its priority really low, and ignore it when it returns.
I wrote this in reply to a similar question:
I use an exception-based technique
that's worked pretty well for me in a
number of Win32 applications.
To terminate a thread, I use
QueueUserAPC to queue a call to a
function which throws an exception.
However, the exception that's thrown
isn't derived from the type
"Exception", so will only be caught by
my thread's wrapper procedure.
I've used this with C++Builder apps very successfully. I'm not aware of all the subtleties of Delphi vs C++ exception handling, but I'd expect it could easily be modified to work.

How independent are threads inside the same process?

Now, this might be a very newbie question, but I don't really have experience with multithreaded programming and I haven't fully understood how threads work compared to processes.
When a process on my machine hangs, say it's waiting for some IO that never comes or something similar, I can kill and restart it because other processes aren't affected and can, for example, still operate my terminal. This is very obvious, of course.
I'm not sure whether it is the same with threads inside a process: If one hangs, are the others unaffected? In other words, can I run a "watchdog" thread which supervises the other threads and, for example kill and recreate hanging threads? For example, if I have a threadpool that I don't want to be drained by occasional hangups.
Threads are independent, but there's a difference between a process and a thread, and that is that in the case of processes, the operating system does more than just "kill" it. It also cleans up after it.
If you start killing threads that seems to be hung, most likely you'll leave resources locked and similar, something that the operating system would close for you if you did the same to a process.
So for instance, if you open a file for writing, and start producing data and write it to the file, and this thread now hangs, for whatever reason, killing the thread will leave the file still open, and most likely locked, up until you close the entire program.
So the real answer to your question is: No, you can not kill threads the hard way.
If you simply ask a thread to close, that's different because then the thread is still in control and can clean up and close resources before terminating, but calling an API function like "KillThread" or similar is bad.
If a thread hangs, the others will continue executing. However, if the hung thread has locked a semaphore, critical section or other kind of synchronization object, and another thread attempts to lock the same synchronization object, you now have a deadlock with two dead threads.
It is possible to monitor other threads from a thread. Depending on your platform, there are appliable API's: I refer you to those as you haven't stated what OS you are writing for.
You didn't mention about the platform, but as far as I'm concerned, NT kernel schedules threads, not processes and threats them independently in that manner. This might not be and is not true on other platforms (some platforms, like Windows 3.1, do not use preemptive multithreading and if one thread goes in infinite loop, everything is affected).
The simple answer is yes.
Typically though code in a thread will handle this likely hood itself. Most commonly many APIs that perform operations that may hang will have timeout features of their own.
Alternatively a thread will wait on not just an the operation that might hang but also a timer. If the timer signals first its assummed the operation has hung.
Since for a watch dog thread to be useful in this scenario would need some co-operation from code in the other threads having the threads themselves set timeouts makes more sense than a watchdog.
Threads get scheduled independent of each other. So you could indeed stop and restart hanging threads. Threads do not run in a separate address-space so a misbehaving thread can still overwrite memory or take locks needed by other threads in the same process.
There's a pretty good overview of some of the pitfalls of killing and suspending threads in the Java documentation explaining why the methods that do it are deprecated. Basically, if you expect to be able to kill a thread, you have to be very, very careful to make it work without some sort of corruption. If a thread is hung it's probably because of a bug...in which case killing it will probably result in corruption.
http://java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html
If you need to be able to kill things, use processes.

Resources