What's the best way to exit a Haskell program? - multithreading

I've got a program which uses several threads. As I understand it, when thread 0 exits, the entire program exits, regardless of any other threads which might still be running.
The thing is, these other threads may have files open. Naturally, this is wrapped in exception-handling code which cleanly closes the files in case of a problem. That also means that if I use killThread (which is implemented via throwTo), the file should also be closed before the thread exits.
My question is, if I just let thread 0 exit, without attempting to stop the other threads, will all the various file handles be closed nicely? Does any buffered output get flushed?
In short, can I just exit, or do I need to manually kill threads first?

You can use Control.Concurrent.MVar to achieve this. An MVar is essentially a flag which is either "empty" or "full"; a thread that tries to take an empty MVar blocks until it is filled. Wherever you have a thread which performs file IO, create an MVar for it and pass that MVar to it as an argument. Put all the MVars you create into a list:
main = do
  mvars <- sequence (replicate num_of_child_threads newEmptyMVar)
  returnVals <- sequence (zipWith (\m f -> f m)
                                  mvars
                                  (list_of_child_threads :: [MVar () -> IO a]))
Once a child thread has finished all the file operations you are worried about, have it fill its MVar. Instead of a bare killThread you can then do
mapM_ takeMVar mvars >> killThread tid
(where tid is the ThreadId you were about to kill), and wherever your thread would otherwise exit, just take all the MVars first.
See the documentation on GHC concurrency for more details.
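For concreteness, here is a minimal sketch of what one such child thread could look like; the file name and the work done are hypothetical stand-ins:

import Control.Concurrent.MVar
import Control.Exception (finally)
import System.IO

childThread :: MVar () -> IO ()
childThread done =
    work `finally` putMVar done ()          -- signal completion even if an exception hits
  where
    work = withFile "output.log" WriteMode $ \h ->
        hPutStrLn h "some buffered output"  -- withFile flushes and closes the handle

Thread 0 can then mapM_ takeMVar mvars and be confident every such file is closed before it exits.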

From my testing, I have discovered a few things:
exitFailure and friends only work in thread 0. (The documentation actually says so, if you go to the trouble of reading it. These functions just throw exceptions, which are silently ignored in other threads.)
If an exception kills your thread, or your whole program, any open handles are not flushed. This is excruciatingly annoying when you're desperately trying to figure out exactly where your program crashed!
So it appears that if you want your output flushed before the program exits, you have to implement this yourself. Just letting thread 0 die doesn't flush anything and doesn't throw any exception; it just silently terminates all threads without running exception handlers.
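One cheap partial measure (a sketch; it only covers handles the main thread itself knows about, such as stdout and stderr, and realMain is a stand-in for the real program body) is to flush explicitly before thread 0 returns:

import Control.Exception (finally)
import System.IO

main :: IO ()
main = realMain `finally` (hFlush stdout >> hFlush stderr)
  where
    realMain = putStrLn "program body goes here"

Handles opened inside worker threads still need the MVar-style hand-off from the first answer (or an explicit hFlush/hClose in the worker) to be flushed reliably.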

Related

How can I manually kill a thread?

I'm in ghci testing some functions that fork threads, kinda like this:
myForkingFunction = do
    tid <- forkIO (workerFunction)
    putStrLn ("Worker thread: " ++ show tid)
    putStrLn ("...lots of other actions...")
  where
    workerFunction = do
        putStrLn "In real life I'm listening for jobs...."
        workerFunction
This results in:
🐢 > myForkingFunction
Worker thread: ThreadId 216
...lots of other actions...
🐢 > In real life I'm listening for jobs....
In real life I'm listening for jobs....
In real life I'm listening for jobs....
In real life I'm listening for jobs....
(and so on)
Then as I'm :r-ing and iterating on the code, I notice that even when I reload, my workerFunction is still going.
So, I thought I could just look at the printed output and do killThread (ThreadId 234) or whatever.
But first, I need to import ThreadId(..) from GHC.Conc.Sync?
Then I get this error:
<interactive>:110:22: error:
    • Couldn't match a lifted type with an unlifted type
      When matching types
        Integer :: *
        GHC.Prim.ThreadId# :: TYPE 'GHC.Types.UnliftedRep
    • In the first argument of ‘ThreadId’, namely ‘805’
      In the first argument of ‘killThread’, namely ‘(ThreadId 805)’
      In the expression: killThread (ThreadId 805)
So I think I must be approaching this wrong?
I don't think there's any way to construct a ThreadId (e.g. from the result of show); see this related question. So you'll need to hold on to the ThreadId returned from forkIO, e.g. by changing your function to:
myForkingFunction = do
    tid <- forkIO (workerFunction)
    putStrLn ("Worker thread: " ++ show tid)
    putStrLn ("...lots of other actions...")
    return tid
and doing in ghci:
> tid <- myForkingFunction
> killThread tid
The error you get is a little confusing, and I believe it has to do with the fact that numeric literals (like 216) are polymorphic; the argument to ThreadId is a ThreadId#, which is an unlifted type, so you get the error you see before the compiler can even complain about "no instance Num ThreadId#".
ThreadId values are treated specially by the runtime. You can't just create one. You need access to a ThreadId either as produced by forkIO when creating the thread or by myThreadId inside the thread.
Just for extra fun, threads don't get garbage collected while any ThreadId from one is still reachable. So you can't just hold on to them in case you need them. You need to actually have a plan to manage threads.
I think you're probably best off using the async library and its various linking operations to ensure that when the parent thread is killed, it also kills all its child threads. It's a little (or a lot) awkward as a code style, but it's the best way to get correct behavior.
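A rough sketch of that style, using withAsync from the async package; the worker loop here is a hypothetical stand-in. When the parent leaves the withAsync block, whether normally or because it was killed, the child is cancelled as well:

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (withAsync)

parent :: IO ()
parent =
    withAsync worker $ \_ ->
        putStrLn "...lots of other actions..."   -- when this returns, worker is cancelled
  where
    worker = do
        putStrLn "In real life I'm listening for jobs...."
        threadDelay 1000000
        worker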
A ThreadId is not really a number, although it has a number associated with it for identification. ThreadId is a wrapper around ThreadId#, which is a pointer to a thread state object (a TSO, or in the RTS code, StgTSO). That object contains all the actual internal information about a thread. There is no way to turn an Int into a ThreadId (I'm pretty sure the runtime system doesn't even maintain a mapping from thread ID numbers to ThreadId#s, and there's definitely no primop supporting such a conversion), so you can give up on that now. The only way you can killThread or throwTo a thread is if you hold on to the ThreadId that forkIO gave you. Note that killing a thread isn't usually the way you want to stop it. Normally you want to do something more controlled by communicating with it through an MVar. Most people use the tools in the async library to manage concurrency.
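For the "more controlled" stop mentioned above, a minimal sketch (the work loop is a hypothetical stand-in) is an MVar used as a stop flag that the worker checks between jobs:

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Monad (unless)

startWorker :: IO (MVar ())
startWorker = do
    stop <- newEmptyMVar
    _ <- forkIO (loop stop)
    return stop                          -- later, putMVar stop () asks the worker to finish
  where
    loop stop = do
        stopping <- not <$> isEmptyMVar stop
        unless stopping $ do
            putStrLn "doing one unit of work"   -- stand-in for a real job
            threadDelay 500000
            loop stop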
The problem is that the argument to killThread is wrong. You need to write killThread 234 instead of killThread (ThreadId 234). Have you tried that?

What happens when an Async value is garbage-collected?

Well... – apparently, nothing! If I try
Prelude Control.Concurrent.Async Data.List> do {_ <- async $ return $! foldl' (+) 0 [0,0.1 .. 1e+8 :: Double]; print "Async is lost!"}
"Async is lost!"
one processor core starts going wild for a while, but the interface stays responsive as normal. Evidently the thread is started and simply runs as long as there is something to do.
But (efficiency aside), is that in principle ok, or must Asyncs always be either cancelled or waited for? Does something break because there just isn't a way to read the result anymore? And does the GC properly clean up everything? Will perhaps the thread in fact be stopped, and that just doesn't happen yet when I try it (for lack of memory pressure)? Does the thread even properly "end" at all, simply when the forkIOed action comes to an end?
I'm quite uncertain about this concurrency stuff. Perhaps I'm still thinking too much in a C++ way about this. RAII / deterministic garbage collection certainly make you feel a bit better cared for in such regards...
Internally, an Async is just a Haskell thread that writes to an STM TMVar when finished. A cancel is just sending the Haskell thread a kill signal. In Haskell, you don't need to explicitly kill threads. If the Async itself can be garbage collected, the thread will still run to its end, and then everything will be properly cleaned up. However, if the Async ends in an exception, then wait will propagate the exception to the waiting thread. If you don't wait, you'll never know that the exception happened.
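A quick way to see the difference wait makes (a sketch; the error call just stands in for a real failure):

import Control.Concurrent.Async (async, wait)
import Control.Exception (SomeException, try)

main :: IO ()
main = do
    a <- async (error "boom" :: IO ())                -- the exception happens in the Async's thread
    r <- try (wait a) :: IO (Either SomeException ())
    print r                                           -- wait re-raises it here; without wait it is never seen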

Haskell Thread Communication Pattern Scenario

You have two threads, a and b. Thread a is in a forever loop, listening on a blocking socket 1. Thread b is also in a forever loop, listening on blocking socket 2. Both socket 1 and socket 2 may return data at arbitrary times, so Thread a may be sleeping forever waiting for data whereas Thread b constantly gets data from the socket and goes on with its processing. That's the background.
Now suppose they need to share a dictionary. When Thread a gets some data (if ever) it adds a key value pair into the dictionary after some processing, and then continues to wait for more data. When Thread b receives data from its socket it first queries the dictionary to see if there is information related to the data it has received before going on with its processing. There are no deletions to the dictionary, only inserts and queries (I'd be interested if this makes a difference in the end solution).
In a standard imperative language like Python or C, this is pretty easy to do by making the dictionary available in both scopes and only querying it after a thread has acquired a lock, so Thread b always sees the most (well, almost) up-to-date dictionary.
In Haskell, I seem to be struggling to come up with a good implementation of this pattern. MVars can only hold one item at a time, so Thread a can't simply put the dictionary in one: a new update might occur, and it would not be able to push the new dictionary until Thread b had fetched the old one from the MVar. On the other hand, if Thread b uses an MVar to send a ready signal ("ok!") to Thread a, it may be the case that Thread a is sleeping on its read socket, so it would be unable to send back data until its read socket unblocked! There are also channels, but that seems messy, since I would have to keep sending new dictionaries and Thread b would discard all but the last one.
The alternative solution that would work is to simply send the updates down a channel, and have thread B construct the dictionary for itself. However I'm wondering if there are better alternative solutions.
Thanks for taking the time to read this very long question!
You can use an MVar in the following way:
When thread A gets new data, it tries to get the dictionary with takeMVar. When that succeeds, it updates the dictionary and puts it back into the MVar.
When thread B gets data, it tries to get the dictionary with takeMVar; in the above scenario where A seldom gets data, that should succeed rather quickly on average. Then it does the lookup and puts the dictionary back.
As hammar pointed out, it's probably better not to use takeMVar and putMVar directly, but rather to wrap them in modifyMVar_ (or modifyMVar), so that the MVar is not left empty if one thread gets an exception while using the dictionary.
In thread A, something like
modifyMVar_ mvar (\dict -> return (insert newStuff dict))
in thread B all you need is a simple readMVar (thanks to @hammar again for pointing that out).
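Putting the pieces together, a hedged end-to-end sketch of the pattern (the socket reads are replaced by placeholder actions, and Data.Map stands in for whatever dictionary type you use):

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import qualified Data.Map as Map

main :: IO ()
main = do
    dict <- newMVar (Map.empty :: Map.Map String String)
    _ <- forkIO (threadA dict)
    threadB dict
  where
    threadA dict = do
        (k, v) <- recvFromSocket1
        modifyMVar_ dict (return . Map.insert k v)   -- writer: take, update, put back atomically
        threadA dict
    threadB dict = do
        k <- recvFromSocket2
        m <- readMVar dict                           -- reader: never leaves the MVar empty
        print (Map.lookup k m)
        threadB dict
    recvFromSocket1 = return ("key", "value")        -- placeholder for the blocking read on socket 1
    recvFromSocket2 = return "key"                   -- placeholder for the blocking read on socket 2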

A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts whatever is happening in the thread, and I need to complete processing a job out of the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?
Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
   ... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
How about using STM versions, TChan and TVar, with the retry and orElse behavior?
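A sketch along those lines for the original scenario (the job type and its processing are placeholders): drain the channel until the stop TMVar is filled, using orElse so the thread blocks on whichever becomes available first.

import Control.Concurrent.STM

loop :: TMVar () -> TChan String -> IO ()
loop stop chan = do
    next <- atomically $
        (Left  <$> takeTMVar stop) `orElse`
        (Right <$> readTChan chan)
    case next of
        Left ()   -> putStrLn "stop requested"
        Right job -> do
            putStrLn ("processing " ++ job)   -- a job, once taken, is fully processed before stop is rechecked
            loop stop chan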
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call – except that orElse composes well, whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM

Removing the contents of a Chan or MVar in a single discrete step

I'm writing a discrete simulation where request values from multiple threads accumulate in a centralized queue. Every n milliseconds, a manager wakes up to process requests. When the manager wakes up, it should retrieve all of the contents of the central queue in a single discrete step. While processing these, any client threads attempting to submit to the queue should block. When processing completes, the queue reopens and the manager goes back to sleep.
What's the best way to do this? The retry behavior of STM isn't really what I want. If I use a Chan or MVar, there's no way to prevent clients from enqueuing additional requests during processing. One approach is to use an MVar as a mutex on a Chan holding the queue. Are there other ways to do this?
I'd have to benchmark under your expected contention levels to know exactly what the best solution is, but here's my guess.
Use an MVar containing [Item], whatever your item type is. Initialize the MVar with newMVar []. To add an element to the central list, apply modifyMVar_ to it with (return . (item :)), where item is what you are adding to the list. Use takeMVar at the start of the processing pass, and putMVar with [] at the end of it.
First, note that this isn't a queue, internally. If you want to handle things in the order they were added, reverse the list after extracting it.
Second, so long as these are the only operations you perform on the MVar, this is free of race conditions. That's because the MVar is initialized full, and each operation is "take the contents of the MVar, put something else in." Operations may block while waiting for the latter part of that, but this can't deadlock and there will be no lost updates.
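A minimal sketch of that design (the names are illustrative, not from any library):

import Control.Concurrent.MVar

type Queue a = MVar [a]

newQueue :: IO (Queue a)
newQueue = newMVar []

-- Clients: block only while the manager is holding the (emptied) MVar.
submit :: Queue a -> a -> IO ()
submit q item = modifyMVar_ q (return . (item :))

-- Manager: grab everything in one step; submitters block until the final putMVar.
drain :: (a -> IO ()) -> Queue a -> IO ()
drain process q = do
    items <- takeMVar q
    mapM_ process (reverse items)   -- reverse to recover arrival order
    putMVar q []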
Isn't there a problem with MVar [a] though? When there are no items to be read we want readers to be blocked, but since the MVar actually contains [] this doesn't happen.
