Clean up child threads when main thread terminates

I'm finding myself trying to write something like this:
    main = do t1 <- forkIO (forever io)
              t2 <- forkIO (forever io)
              forever io
      `finally` traverse_ killThread [t1,t2]
But t1 and t2 can't be accessed in finally because it's outside the do block, so they are not in scope there.
Since here the IO actions are run forever, my main concern is giving the threads a chance to exit cleanly in the event of a user interruption or an IOException in the last IO action.
I'm aware that packages like async and threads are great for this, but is it possible to do this easily with the basic concurrency primitives?
BTW, it'd be nice to have the runtime automatically send killThread to all child threads. When wouldn't you want that?

Just realized that there is no problem including the finally in the monadic code block.
    main = do t1 <- forkIO (forever io)
              t2 <- forkIO (forever io)
              forever io `finally` traverse_ killThread [t1,t2]
I'm not marking this question as answered in case someone spots something wrong with this.
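A minimal runnable sketch of the same idea, packaged as a helper. `withWorkers` and `exampleWorker` are hypothetical names, not library functions. One caveat: killThread returns once the exception has been raised in the target thread, not once that thread's handler has finished, so if a worker's cleanup must complete before the program exits, the main thread also has to wait for a signal from it (here, the `cleanedUp` MVar). Forking with plain forkIO outside a mask also leaves a small window in which an asynchronous exception could leak an already-forked thread; closing that window is part of what the async package does.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forever)
import Data.Foldable (traverse_)

-- | Run 'mainAction' while the given worker loops run in background
-- threads. However 'mainAction' ends (normal return, an IOException, or,
-- when this runs in the main thread, Ctrl-C arriving as UserInterrupt),
-- every worker is sent killThread.
withWorkers :: [IO ()] -> IO a -> IO a
withWorkers workers mainAction = do
  tids <- traverse forkIO workers
  mainAction `finally` traverse_ killThread tids

-- | Hypothetical worker in the style of the question's `forever io`.
-- Its own `finally` handler is where it would close files etc.; here it
-- just fills 'cleanedUp' so the caller can see that the handler ran.
exampleWorker :: MVar () -> IO ()
exampleWorker cleanedUp =
  forever (threadDelay 1000) `finally` putMVar cleanedUp ()
```

For example, after `dones <- mapM (const newEmptyMVar) [1 .. 3 :: Int]`, running `withWorkers (map exampleWorker dones) someAction` followed by `mapM_ takeMVar dones` only proceeds once every worker's `finally` handler has run.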

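You don't need to kill the child threads because the Haskell runtime stops them when the main thread terminates anyway. Note, though, that they are stopped without running their exception handlers or finally blocks, so if they hold resources that need cleaning up, you do have to stop them explicitly and give them time to finish. Normally, you write code in main to wait for the child threads to complete before main itself completes.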
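A sketch of the waiting approach, assuming each child is an `IO ()` action. It follows the forkFinally-plus-MVar idea suggested in the Control.Concurrent documentation; `forkJoinable` and `runAllAndWait` are hypothetical names.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar

-- | Fork an action; the returned MVar is filled when the thread ends,
-- whether it returned normally or died with an exception.
forkJoinable :: IO () -> IO (MVar ())
forkJoinable act = do
  done <- newEmptyMVar
  _ <- forkFinally act (\_ -> putMVar done ())
  return done

-- | Run the actions in separate threads and wait for all of them, so
-- main does not return (and stop them) while they are still working.
runAllAndWait :: [IO ()] -> IO ()
runAllAndWait acts = do
  dones <- mapM forkJoinable acts
  mapM_ takeMVar dones
```

In main, `runAllAndWait [child1, child2]` (with `child1` and `child2` standing in for your own actions) returns only after both children have finished, even if one of them died with an exception.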

Related

How can I manually kill a thread?

I'm in GHCi testing some functions that fork threads, kinda like this:
    myForkingFunction = do
        tid <- forkIO workerFunction
        putStrLn ("Worker thread: " ++ show tid)
        putStrLn "...lots of other actions..."
      where
        workerFunction = do
            putStrLn "In real life I'm listening for jobs...."
            workerFunction
This results in:
🐢 > myForkingFunction
Worker thread: ThreadId 216
...lots of other actions...
🐢 > In real life I'm listening for jobs....
In real life I'm listening for jobs....
In real life I'm listening for jobs....
In real life I'm listening for jobs....
(and so on)
Then as I'm :r-ing and iterating on the code, I notice that even when I reload, my workerFunction is still going.
So I thought I could just look at the printed output and do killThread (ThreadId 234) or whatever.
But first, I need to import ThreadId(..) from GHC.Conc.Sync?
Then I get this error:
    <interactive>:110:22: error:
        • Couldn't match a lifted type with an unlifted type
          When matching types
            Integer :: *
            GHC.Prim.ThreadId# :: TYPE 'GHC.Types.UnliftedRep
        • In the first argument of ‘ThreadId’, namely ‘805’
          In the first argument of ‘killThread’, namely ‘(ThreadId 805)’
          In the expression: killThread (ThreadId 805)
So I think I must be approaching this wrong?

I don't think there's any way to construct a ThreadId (e.g. from the result of show; see this question), so you'll need to hold on to the ThreadId returned from forkIO, e.g. by changing your function to:
    myForkingFunction = do
        tid <- forkIO workerFunction
        putStrLn ("Worker thread: " ++ show tid)
        putStrLn "...lots of other actions..."
        return tid
      -- (keep the same where clause defining workerFunction)
and doing in GHCi:
    > tid <- myForkingFunction
    > killThread tid
The error you get is a little confusing. I believe it has to do with the fact that numeric literals (like 216) are polymorphic: the argument to ThreadId is a ThreadId#, which is an unlifted type, so you get the error you see before the compiler can even complain of "no instance Num ThreadId#".

ThreadId values are treated specially by the runtime. You can't just create one. You need access to a ThreadId either as produced by forkIO when creating the thread or by myThreadId inside the thread.
Just for extra fun, threads don't get garbage collected while any ThreadId from one is still reachable. So you can't just hold on to them in case you need them. You need to actually have a plan to manage threads.
I think you're probably best off using the async library and its various linking operations to ensure that when the parent thread is killed, it also kills all its child threads. It's a little (or a lot) of an awkward code style, but it's the best way to get correct behavior.
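Without pulling in async, one way to "have a plan" for the GHCi workflow in the question is to record every ThreadId you fork and kill them all before reloading. This is a sketch with hypothetical names (`Registry`, `forkTracked`, `killAllTracked`, `exampleWorker`), not a library API; note that the registry itself keeps the ThreadIds reachable until `killAllTracked` clears it.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Concurrent.MVar
import Data.Foldable (traverse_)

-- | Hypothetical registry of the threads we have started.
newtype Registry = Registry (MVar [ThreadId])

newRegistry :: IO Registry
newRegistry = Registry <$> newMVar []

-- | Fork a thread and remember its ThreadId in the registry.
forkTracked :: Registry -> IO () -> IO ThreadId
forkTracked (Registry var) act = modifyMVar var $ \tids -> do
  tid <- forkIO act
  return (tid : tids, tid)

-- | Kill every tracked thread and forget the ThreadIds, so the registry
-- no longer keeps those threads reachable.
killAllTracked :: Registry -> IO ()
killAllTracked (Registry var) = do
  tids <- modifyMVar var (\tids -> return ([], tids))
  traverse_ killThread tids

-- | Stand-in for the question's workerFunction: loops until killed.
exampleWorker :: IO ()
exampleWorker = threadDelay 100000 >> exampleWorker
```

In GHCi: `reg <- newRegistry`, then `forkTracked reg workerFunction`, and run `killAllTracked reg` before `:r`, since bindings made at the GHCi prompt are discarded on reload.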

A ThreadId is not really a number, although it has a number associated with it for identification. ThreadId is a wrapper around ThreadId#, which is a pointer to a thread state object (a TSO, or in the RTS code, StgTSO). That object contains all the actual internal information about a thread. There is no way to turn an Int into a ThreadId (I'm pretty sure the runtime system doesn't even maintain a mapping from thread ID numbers to ThreadId#s, and there's definitely no primop supporting such a conversion), so you can give up on that now. The only way you can killThread or throwTo a thread is if you hold on to the ThreadId that forkIO gave you. Note that killing a thread isn't usually the way you want to stop it. Normally you want to do something more controlled by communicating with it through an MVar. Most people use the tools in the async library to manage concurrency.
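A sketch of the more controlled approach mentioned above: instead of being killed, the worker checks a stop MVar between jobs and leaves its loop on its own. The names and the "job" (incrementing a counter) are placeholders. Because the check happens only between iterations, a worker that is busy with a long job will not notice the request until that job finishes.

```haskell
import Control.Concurrent (forkFinally, threadDelay)
import Control.Concurrent.MVar

-- | Hypothetical job loop that stops cooperatively: before each
-- iteration it checks whether someone has put () into 'stop'.
workerLoop :: MVar () -> MVar Int -> IO ()
workerLoop stop jobsDone = loop
  where
    loop = do
      request <- tryTakeMVar stop
      case request of
        Just () -> return ()                     -- asked to stop: leave the loop normally
        Nothing -> do
          modifyMVar_ jobsDone (return . (+ 1))  -- stand-in for "handle one job"
          threadDelay 1000
          loop

-- | Start the worker and return an action that asks it to stop and then
-- waits until it has actually exited (normally or by an exception).
startWorker :: MVar Int -> IO (IO ())
startWorker jobsDone = do
  stop   <- newEmptyMVar
  exited <- newEmptyMVar
  _ <- forkFinally (workerLoop stop jobsDone) (\_ -> putMVar exited ())
  return (putMVar stop () >> takeMVar exited)
```

With `counter <- newMVar 0` and `stopWorker <- startWorker counter` at the GHCi prompt, running `stopWorker` later shuts the worker down cleanly, and no ThreadId ever has to be reconstructed.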

The problem is that the argument to killThread is wrong. You need to write killThread 234 instead of killThread (ThreadId 234). Have you tried that?

Behavior of async package

What would the following code using the async package do:
    action <- async $ mapM_ someFunc someList
    wait action
Will this merely spawn a single thread in which mapM_ occurs? (Implying that this has no benefit over just mapM_ someFunc someList)
Or will it perform the mapM_ action asynchronously (or is mapConcurrently the only way to get such behavior)?

Will this merely spawn a single thread in which mapM_ occurs?
Yes, it will fork a thread and immediately block waiting for the mapM_ to finish and return a () (or to throw an exception).
The async package is very simple; you might like to look at the source to see how it all works together and learn more about the underlying Haskell concurrency primitives.
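To illustrate the answer with only the base primitives, here is a much-simplified stand-in for async and wait. It is not the package's actual implementation (which uses STM and is careful about masking), and `MiniAsync`, `miniAsync`, `miniWait` and `miniMapConcurrently` are hypothetical names. Wrapping the whole `mapM_` in one `miniAsync` gives one thread that runs the list sequentially, whereas starting one per element gives roughly the concurrency of mapConcurrently.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar
import Control.Exception (SomeException, throwIO)

-- | Hypothetical, much-simplified stand-in for the async package's Async:
-- the forked thread's result (or exception) ends up in the MVar.
newtype MiniAsync a = MiniAsync (MVar (Either SomeException a))

-- | Start 'act' in a new thread (exactly one thread per call).
miniAsync :: IO a -> IO (MiniAsync a)
miniAsync act = do
  var <- newEmptyMVar
  _ <- forkFinally act (putMVar var)
  return (MiniAsync var)

-- | Block until the thread finishes; rethrow its exception if it failed.
miniWait :: MiniAsync a -> IO a
miniWait (MiniAsync var) = readMVar var >>= either throwIO return

-- | One thread per element, then wait for all results in order. Unlike
-- the real mapConcurrently, this does not cancel the other threads when
-- one of them fails.
miniMapConcurrently :: (a -> IO b) -> [a] -> IO [b]
miniMapConcurrently f xs = mapM (miniAsync . f) xs >>= mapM miniWait
```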

Why is chat server example on haskell.org thread safe?

I'm new to Haskell and I can't figure out what I'm not understanding about this example on the Haskell wiki: http://www.haskell.org/haskellwiki/Implement_a_chat_server
The specific code in question is this:
    runConn :: (Socket, SockAddr) -> Chan Msg -> IO ()
    runConn (sock, _) chan = do
        let broadcast msg = writeChan chan msg
        hdl <- socketToHandle sock ReadWriteMode
        hSetBuffering hdl NoBuffering
        chan' <- dupChan chan
        -- fork off thread for reading from the duplicated channel
        forkIO $ fix $ \loop -> do
            line <- readChan chan'
            hPutStrLn hdl line
            loop
        -- read lines from socket and echo them back to the user
        fix $ \loop -> do
            line <- liftM init (hGetLine hdl)
            broadcast line
            loop
The code above has one thread writing to the handle hdl at the same time (potentially) as another thread is reading from it. Is this safe?
I suspect the nature of forkIO (being internal to Haskell and not a system thread library or process) is what makes this work, but I'm not sure.
I checked the documentation of forkIO for any mention of IO handles but found nothing. I also checked the documentation of System.IO but couldn't find any mention of using handles between threads without using locking.
So can someone tell me how I should know when something like this is safe when the docs don't mention anything about thread safety?

It's not the nature of forkIO that makes this work but the nature of MVar, which is used to implement both Chan and Handle.
If you want to understand how Chan works, take a look at the section "MVar as building blocks: Unbounded Channels" in chapter 7 of the excellent book "Parallel and Concurrent Programming in Haskell" by Simon Marlow. In the same chapter there is a section about forkIO and MVar that will help you understand how Handle can be implemented in a thread-safe way.
Chapter 12 talks specifically about various ways to implement network servers, including a chat server that is implemented using STM instead of Chans.
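As a rough illustration of the "MVar as building blocks" idea, here is a sketch of an unbounded channel made of a linked list of MVars, in the spirit of the chapter cited above. The names are our own, and unlike Control.Concurrent.Chan it does not protect its MVars against asynchronous exceptions. `dupChannel` creates a second reader that sees everything written afterwards, which is how the chat server's `dupChan` lets every client see each broadcast.

```haskell
import Control.Concurrent.MVar

-- Each "hole" is an MVar that will eventually hold the next item.
type Stream a = MVar (Item a)
data Item a = Item a (Stream a)

-- Read end and write end, each guarded by its own MVar.
data Channel a = Channel (MVar (Stream a)) (MVar (Stream a))

newChannel :: IO (Channel a)
newChannel = do
  hole <- newEmptyMVar
  readVar <- newMVar hole
  writeVar <- newMVar hole
  return (Channel readVar writeVar)

-- Fill the current hole with the value and a fresh hole, then advance
-- the write end. Taking writeVar serializes concurrent writers.
writeChannel :: Channel a -> a -> IO ()
writeChannel (Channel _ writeVar) val = do
  newHole <- newEmptyMVar
  oldHole <- takeMVar writeVar
  putMVar oldHole (Item val newHole)
  putMVar writeVar newHole

-- Wait for the item at the read end (readMVar leaves it in place so a
-- duplicated reader can still see it), then advance the read end.
readChannel :: Channel a -> IO a
readChannel (Channel readVar _) = do
  stream <- takeMVar readVar
  Item val rest <- readMVar stream
  putMVar readVar rest
  return val

-- A new read end starting at the current write position.
dupChannel :: Channel a -> IO (Channel a)
dupChannel (Channel _ writeVar) = do
  hole <- readMVar writeVar
  newReadVar <- newMVar hole
  return (Channel newReadVar writeVar)
```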

If it wasn't safe, blocking sockets would be almost impossible to use. If your protocol is asynchronous and you're using blocking sockets, you need a thread blocking on read pretty much all the time. If you then needed to send a message to the other side, how could you do it? Wait for the other side to send you a message?

What's the best way to exit a Haskell program?

I've got a program which uses several threads. As I understand it, when thread 0 exits, the entire program exits, regardless of any other threads which might still be running.
The thing is, these other threads may have files open. Naturally, this is wrapped in exception-handling code which cleanly closes the files in case of a problem. That also means that if I use killThread (which is implemented via throwTo), the file should also be closed before the thread exits.
My question is, if I just let thread 0 exit, without attempting to stop the other threads, will all the various file handles be closed nicely? Does any buffered output get flushed?
In short, can I just exit, or do I need to manually kill threads first?

You can use Control.Concurrent.MVar to achieve this. An MVar is essentially a box that is either "empty" or "full", and a thread that tries to take from an empty MVar blocks until another thread fills it. Wherever you have a thread which performs file IO, create an MVar for it and pass it that MVar as an argument. Keep all the MVars you create in a list:
    main = do
        -- one empty MVar per child thread (replicateM is from Control.Monad)
        mvars <- replicateM num_of_child_threads newEmptyMVar
        -- start each child, handing it its own MVar
        mapM_ forkIO (zipWith (\m f -> f m)
                              mvars
                              (list_of_child_threads :: [MVar () -> IO ()]))
Once a child thread has finished all the file operations you are worried about, it should put () into its MVar. Instead of killing the threads, have the main thread wait for all of them:
    mapM_ takeMVar mvars
and wherever your main thread would otherwise exit, just take all the MVars first.
See the documentation on GHC concurrency for more details.

From my testing, I have discovered a few things:
exitFailure and friends only work in thread 0. (The documentation actually says so, if you go to the trouble of reading it. These functions just throw exceptions, which are silently ignored in other threads.)
If an exception kills your thread, or your whole program, any open handles are not flushed. This is excruciatingly annoying when you're desperately trying to figure out exactly where your program crashed!
So it appears that if you want your stuff flushed before the program exits, you have to implement this yourself. Just letting thread 0 die doesn't flush stuff and doesn't throw any exception; it silently terminates all threads without running exception handlers.
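Since exitWith only exits the process when called from the main thread, one way to combine the two answers is to let workers request an exit through an MVar and have thread 0 do the flushing and exiting. This is a sketch with hypothetical names (`runUntilExitRequested`, `failingWorker`); it assumes at least one worker eventually fills the MVar, otherwise the main thread blocks forever.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import System.Exit (ExitCode (..))

-- | Start workers that ask the program to exit by filling a shared MVar
-- (calling exitWith in a forked thread would not end the process).
-- Blocks until the first request arrives and returns its exit code.
runUntilExitRequested :: [MVar ExitCode -> IO ()] -> IO ExitCode
runUntilExitRequested workers = do
  exitVar <- newEmptyMVar
  mapM_ (\w -> forkIO (w exitVar)) workers
  takeMVar exitVar

-- | Hypothetical worker that hits a fatal error and requests exit code 3.
-- tryPutMVar (not putMVar) so a second request cannot block its sender.
failingWorker :: MVar ExitCode -> IO ()
failingWorker exitVar = do
  _ <- tryPutMVar exitVar (ExitFailure 3)
  return ()
```

main would then run `code <- runUntilExitRequested [...]`, flush or close its handles (after waiting for any workers that still write to them), and finally call `exitWith code` itself.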

Multithreading in a monad

I need to run multiple concurrent processes in a context of the same monad, say, Connection. I expected something like the following would work:
    main = runConnection connectionSettings $ do
        forkIO action1
        forkIO action2
        action3
but forkIO has to be run in an IO context, and its argument must be an IO action too.
Assuming those actions have type Connection (), what needs to be done to run them concurrently (which instances to implement, etc.)?
Currently I'm working around this as follows, but evidently this ain't right:
    main = do
        forkIO $ runConnection connectionSettings action1
        forkIO $ runConnection connectionSettings action2
        runConnection connectionSettings action3

I've found a beautiful "monad-parallel" library intended for very similar purposes, and an even more powerful fork of it - "classy-parallel".
For a monad to be usable in a parallelizable manner it needs to have an instance of a Parallel typeclass. And the "classy-parallel" library already provides instances for the most important types for such purposes: ResourceT IO and ReaderT.
Assuming the presence of appropriate instances, the code in question can be transformed to the following:
    import qualified Control.Monad.Parallel as Parallel
    main = runConnection connectionSettings $
        Parallel.sequence [action1, action2, action3]
For simply forking in ResourceT the resourceForkIO can be useful. There's also a "monad-fork" library which provides a neat and simple generalization over forking on top of forkIO and resourceForkIO.
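For comparison, if Connection is (or wraps) a reader over IO, forking needs no special library: grab the environment, fork an ordinary IO thread, and run the action against that environment there, which is roughly the idea behind forking in ReaderT. In this sketch `ConnEnv`, `Connection` and the helpers are hypothetical stand-ins for the real library's types. Unlike the workaround in the question, all threads share one connection environment, which is only safe if the underlying connection is thread-safe.

```haskell
import Control.Concurrent

-- | Hypothetical environment a real Connection would carry
-- (socket, settings, ...); here just a name.
newtype ConnEnv = ConnEnv { connName :: String }

-- | Hypothetical Connection monad, assumed to be a reader over IO.
newtype Connection a = Connection { runConnectionWith :: ConnEnv -> IO a }

instance Functor Connection where
  fmap f (Connection g) = Connection (fmap f . g)

instance Applicative Connection where
  pure x = Connection (\_ -> pure x)
  Connection f <*> Connection g = Connection (\env -> f env <*> g env)

instance Monad Connection where
  Connection g >>= k = Connection (\env -> g env >>= \x -> runConnectionWith (k x) env)

-- | Read the environment (like 'ask' for ReaderT).
askEnv :: Connection ConnEnv
askEnv = Connection return

-- | Run an IO action inside Connection (like 'liftIO').
liftConn :: IO a -> Connection a
liftConn io = Connection (const io)

-- | Fork a Connection action: capture the current environment and run
-- the action against it in a new IO thread.
forkConnection :: Connection () -> Connection ThreadId
forkConnection act = Connection (\env -> forkIO (runConnectionWith act env))
```

With these, the question's code becomes `runConnectionWith (forkConnection action1 >> forkConnection action2 >> action3) env`.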
