The documentation for forkIO says
GHC note: the new thread inherits the masked state of the parent (see mask).
The newly created thread has an exception handler that discards the exceptions
BlockedIndefinitelyOnMVar, BlockedIndefinitelyOnSTM, and ThreadKilled, and passes
all other exceptions to the uncaught exception handler.
Why does the child exception handler discard ThreadKilled? Is there some connection between the two threads after they're created?
What exactly happens when the parent thread dies? Does the child get any exception raised? Or is there any way at all to see from the child's perspective that the parent died? Is there anything else that happens except that the parent thread just stops running?
The reason why I'm asking this is that in many cases I'm forced to create a thread in a context where I can't access the parent's scope. Just imagine you're somewhere deep in a library and need to call forkIO, and have that thread die when the parent dies. Is it necessary to restructure the program and propagate the child's ThreadId up to the parent and explicitly kill it? Or is there any other workaround for this?
What exactly happens when the parent thread dies?
Nothing. That's actually also true for POSIX threads. Threads don't share the parent-child relationship you might know from fork in C or similar languages. There is, however, one main thread, and its termination will usually lead to the termination of the whole program:
Note that the thread in which main() was originally invoked differs from this. When it returns from main(), the effect is as if there was an implicit call to exit() using the return value of main() as the exit status.
Does the child get any exception raised? Or is there any way at all to see from the child's perspective that the parent died? Is there anything else that happens except that the parent thread just stops running?
No. No. And no. For the same reason as with usual OS threads. You can try this pretty easily:
import Control.Concurrent (forkIO, threadDelay)
delaySeconds n = threadDelay $ n * (10^6)
main = do
forkIO $ do
forkIO $ delaySeconds 1 >> print "Pseudo child 1"
forkIO $ delaySeconds 1 >> print "Pseudo child 2"
print "Pseudo parent says goodbye"
delaySeconds 10
print "Exiting main"
The "parent" will say goodbye, and the "children" will print a second later. Remember, there is no actual parent in thread programming. There are only siblings. One of them is a little bit special, yes, but that is just how it's been specified.
Is it necessary to restructure the program and propagate the child's ThreadId up to the parent and explicitly kill it?
At least a little bit, since forkIO doesn't provide this. Also, if there was a forkIOKillAutomatically, what type should it have? And why?
Or is there any other workaround for this?
Well, you could provide the rest of your parent as another action, and therefore use a helper:
forkRunDie :: IO () -> IO () -> IO ()
forkRunDie p s = forkIO p >>= \tid -> s >> killThread tid
The above example would then become
main = do
forkIO $ do
forkRunDie (delaySeconds 1 >> print "Pseudo child 1") $ do
forkRunDie (delaySeconds 1 >> print "Pseudo child 2") $ do
print "Pseudo parent says goodbye"
delaySeconds 10
print "Exiting main"
In this case the only output will be
"Pseudo parent says goodbye"
"Exiting main"
See also:
Smarter termination for thread racing by Conal Elliott (provides a very similar function to forkRunDie with finally).
In a standalone GHC program, only the main thread is required to terminate in order for the process to terminate. Thus all other forked threads will simply terminate at the same time as the main thread (the terminology for this kind of behaviour is "daemonic threads").
https://hackage.haskell.org/package/base-4.7.0.0/docs/Control-Concurrent.html#g:12
This is an answer inspired by Zeta's. It uses a free monad transformer to avoid explicit nesting of computations, and the withAsync function from the async package instead of forkRunDie.
module Main where
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Free (FreeT,liftF,iterT)
import Control.Concurrent
import Control.Concurrent.Async (withAsync)
import Control.Exception
type DaemonIO = FreeT ((,) (IO ())) IO
launch :: IO () -> DaemonIO ()
launch a = liftF (a,())
runDaemonIO :: DaemonIO a -> IO a
runDaemonIO = iterT $ \(action,rest) -> withAsync action $ \_ -> rest
main :: IO ()
main = do
let delaySeconds n = threadDelay $ n * (10^6)
runDaemonIO $ do
launch $ (forever $ delaySeconds 1 >> print "Pseudo child 1")
`finally` putStrLn "killed 1!"
launch $ (forever $ delaySeconds 1 >> print "Pseudo child 2")
`finally` putStrLn "killed 2!"
liftIO $ delaySeconds 10
liftIO $ putStrLn "done!!!"
Related
I’d like to play tricks with forkProcess, where I want to clone my Haskell process, and then let both clones talk to each other (maybe using Cloud Haskell to send even closures around).
But I wonder how well that works with the GHC runtime. Does anyone have experience here?
The documenation for forkProcess says that no other threads are copied, so I assume all data used by other threads will then be garbage collected in the fork, which sounds good. But that means that finalizers will run in both clone, which may or may not be the right thing to do…
I assume I can’t just use it without worry; but are there rules I can follow that will make sure its use is safe?
But that means that finalizers will run in both clone, which may or may not be the right thing to do…
Finalizers are very rarely used in Haskell, and even where they are used, I would expect them to only have in-process effects. For example, a finalizer calls hClose on garbage-collected Handles if you forgot to do it yourself. This is easy to demonstrate: the following program fails with openFile: resource exhausted (Too many open files), but if you uncomment the pure (), the Handles get garbage-collected and the program completes successfully.
import Control.Concurrent
import Control.Monad
import System.IO
import System.Mem
main :: IO ()
main = do
rs <- replicateM 1000 $ do
threadDelay 1000 -- not sure why did is needed; maybe to give control back
-- to the OS, so it can recycle the file descriptors?
performGC
openFile "input" ReadMode
--pure ()
print rs -- force all the Handles to still be alive by this point
File descriptors are process-owned and are copied by forkProcess, so it makes sense to have each clone close their copies.
The case which would be problematic is if a finalizer was cleaning up a system-owned resource, e.g. deleting a file. But I hope no library is relying on finalizers to delete such resources, because as the documentation explains, finalizers are not guaranteed to run. So it's better to use something like bracket to cleanup resources (although the cleanup is still not guaranteed, e.g. if bracket is used from a thread).
What the documentation for forkProcess is warning about is not finalizers, but the fact that other threads will appear to end abruptly inside the forked process. This is especially problematic if those threads are holding locks. Normally, two threads can use modifyMVar_ to ensure that only one thread at a time is running a critical section, and as long as each thread is only holding the lock for a finite amount of time, the other thread can simply wait for the MVar to become available. If you call forkProcess while one thread is in the middle of a modifyMVar_, however, that thread will not continue in the cloned process, and so the cloned process cannot simply call modifyMVar_ or it could get stuck forever while waiting for a non-existing thread to release the lock. Here is a program demonstrating the problem.
import Control.Concurrent
import Control.Monad
import System.Posix.Process
-- >>> main
-- (69216,"forkIO thread",0)
-- (69216,"main thread",1)
-- (69216,"forkIO thread",2)
-- (69216,"main thread",3)
-- (69216,"forkIO thread",4)
-- (69216,"main thread",5)
-- calling forkProcess
-- forkProcess main thread waiting for MVar...
-- (69216,"forkIO thread",6)
-- (69216,"original main thread",7)
-- (69216,"forkIO thread",8)
-- (69216,"original main thread",9)
-- (69216,"forkIO thread",10)
-- (69216,"original main thread",11)
main :: IO ()
main = do
mvar <- newMVar (0 :: Int)
_ <- forkIO $ replicateM_ 6 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "forkIO thread", i)
pure (i+1)
threadDelay 50000
replicateM_ 3 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "main thread", i)
pure (i+1)
putStrLn "calling forkProcess"
_ <- forkProcess $ do
threadDelay 25000
replicateM_ 3 $ do
putStrLn "forkProcess main thread waiting for MVar..."
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "forkProcess main thread", i)
pure (i+1)
replicateM_ 3 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "original main thread", i)
pure (i+1)
threadDelay 100000
As the output shows, the forkProcess main thread gets stuck waiting forever for the MVar, and never prints the forkProcess main thread line. If you move the threadDelays outside the modifyMVar_ critical section, the forkIO thread is a lot less likely to be in the middle of that critical section when forkProcess is called, so you'll see an output which looks like this instead:
(69369,"forkIO thread",0)
(69369,"main thread",1)
(69369,"forkIO thread",2)
(69369,"main thread",3)
(69369,"forkIO thread",4)
(69369,"main thread",5)
calling forkProcess
(69369,"forkIO thread",6)
(69369,"original main thread",7)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",6)
(69369,"forkIO thread",8)
(69369,"original main thread",9)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",7)
(69369,"forkIO thread",10)
(69369,"original main thread",11)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",8)
After the forkProcess call, there are now two MVars which both hold the value 5, and so in the original process, original main thread and forkIO thread are both incrementing one MVar, while in the other process forkProcess main thread is incrementing the other.
I'm using the async library in conjunction with stm in my program.
The main thread forks two threads which run until one of them (it could be either one) encounters a solution. The solution is returned via a TMVar. Neither of them ever waits on any TMVar except to call putTMVar when the solution is found and one of them is guaranteed to run forever unless killed. So how could I possibly be getting "thread blocked indefinitely in an STM transaction" (which seems to happen approximately one in every twenty times) given that at least one of the child threads doesn't execute any blocking STM transactions (or die) until storing a result.
Note the two child threads communicate somewhat with each other using TVars, but not with TMVars.
Simplified code:
main :: IO ()
main = do
output <- newEmptyTMVar
result <- withAsync (child1 output) $ \_ -> withAsync (child2 output) $ \_ ->
let go = do
result <- atomically $ takeTMVar output
if someCondition result
then return result
else go
in go
print result
child1 :: TMVar Result -> IO ()
child1 output = go 0
where
go i = do
case computation1 i of
Nothing -> return ()
Just x -> atomically $ putTMVar x
go (i + 1)
child2 :: TMVar Result -> IO ()
-- Does some other stuff, but also only interacts with its argument to
-- give back a result, same as child1.
I have to write a command line tool gluing together some components for an experiment and need help for a code design that meet my requirements.
At top level I have to work with samples each generated by a – in runtime as well as in memory consumption – expensive call to another program with the function "System.Process.readProcessWithExitCode". Therefore you can imagine to have a (expensive) function "genSample :: IO a" and you need n return values of that function.
My requirements are:
1. Let p be the number of processors, then at most p samples (i.e. calls to genSample) should be computed in parallel.
2. A timeout should be possible set which aborts the generation of the samples.
3. If the computation of all the samples times out, the started processes within a genSample-call should be stopped
My current solution meets requirements 1 and 2. For the third one I currently help myself by executing a killall-command. That seems to be a dirty hack to me. Perhaps someone has a better idea?
Here the central part of my current solution:
import qualified Control.Monad.Par.Class as ParIO
import qualified Control.Monad.Par.IO as ParIO
…
-- | #parRepeatM i n a# performs action #a# #n# times in parallel with timeout #t#
parRepeatM :: ParIO.NFData a =>
Integer -- ^ timeout in seconds
-> Integer -- ^ number of duplicates (here: number of req. samples)
-> IO a -- ^ action to perform (here: genSample)
-> IO (Maybe [a])
parRepeatM t n a = timeout t $ ParIO.runParIO $ do
let tasks = genericReplicate n $ liftIO a -- :: [ParIO a]
ivars <- mapM ParIO.spawn tasks
mapM ParIO.get ivars
A central problem at the moment is that after abortion due to a timeout the command called within genSample continues it's execution – in the worst case until the whole haskell-gluing-program ends.
In Haskell, cancellation is usually handled through asynchronous exceptions. That's what timeout seems to use.
So, we can try to install an exception handler in the code that executes the external process. The handler will invoke terminateProcess whenever an exception (asynchronous or not) crops up. Because terminateProcess requires a reference to the
process handle, we'll have to use createProcess instead of the higher-level readProcessWithExitCode.
First, some imports and auxiliary functions (I'm using the async package):
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Applicative
import Control.Exception
import Control.Concurrent (threadDelay, MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.Async (race_, Concurrently(..), waitEither, withAsync)
import System.Process
import System.Exit
import System.IO
import qualified Data.ByteString as B
-- Executes two actions concurrently and returns the one that finishes first.
-- If an asynchronous exception is thrown, the second action is terminated
-- first.
race' :: IO a -> IO a -> IO a
race' left right =
withAsync left $ \a ->
withAsync right $ \b ->
fmap (either id id) (waitEither a b)
-- terminate external process on exception, ignore if already dead.
terminateCarefully :: ProcessHandle -> IO ()
terminateCarefully pHandle =
catch (terminateProcess pHandle) (\(e::IOException) -> return ())
This function launches an external process and returns its stdout and exit code, terminating the process if the thread is cancelled:
safeExec :: CreateProcess -> IO (B.ByteString, ExitCode)
safeExec cp =
bracketOnError
(createProcess cp {std_out = CreatePipe})
(\(_,_ ,_,pHandle) -> terminateCarefully pHandle)
(\(_,Just hOut,_,pHandle) -> do
-- Workaround for a Windows issue.
latch <- newEmptyMVar
race'
(do -- IO actions are uninterruptible on Windows :(
takeMVar latch
contents <- B.hGetContents hOut
ec <- waitForProcess pHandle
pure (contents,ec))
-- Dummy interruptible action that
-- receives asynchronous exceptions first
-- and helps to end the other action.
(onException
(do
putMVar latch ()
-- runs forever unless interrupted
runConcurrently empty)
(terminateCarefully pHandle)))
About the implementation:
bracketOnError is used to ensure that the external process is killed if an exception happens.
In Windows, I/O operations like reading from a Handle are uninterruptible (see https://ghc.haskell.org/trac/ghc/ticket/7353). This means they are impervious to asynchronous exceptions. As a workaround to this, I create a "dummy" thread that waits forever (runConcurrently empty) and can be interrupted by exceptions. When it is interrupted, it terminates the external process, causing the reads in the companion thread to finish, making the companion thread vulnerable to asynchronous exceptions again.
The "latch" is used to prevent any uninterruptible operation on the handle until the inner exception handler is installed.
It's a bit convoluted but it seems to work, at least tested with:
main :: IO ()
main = do
race_ (safeExec $ proc "calc" [])
(threadDelay (3*10^6))
The calc app is killed after three seconds. Here's the whole gist.
Remember also that:
on Windows, if the process was a shell command created by createProcess with shell, or created by runCommand or runInteractiveCommand, then terminateProcess will only terminate the shell, not the command itself.
I'm writing something like a music player and get stuck with the playback progress bar.
In my program when the play button is clicked, I use forkIO to fork a thread which controls the progressbar. However, the forked thread now executes a loop. How can I inform that thread to terminate when I stop current song or change songs.
I've been trying to use IORef Var, for example
flag <- newIORef False
forkIO $ progressBarFunc flag
and in the function progreeBarFunc it checks whether flag is true and decides to exit loop or not.
But this does not work.
More generally, how can I tell the forked thread to stop when I use forkIO to fork threads?
In addition, if I have an IORef Var and pass it to the function in forkIO, do the main thread and the forked thread share the same IORef Var or the forked thread actually has a copy of it?
You can communicate between threads using IORefs. The IORef refers to the same thing in the forked thread as it did in the main thread.
There are a few things you should check:
Does the forked thread actually get a chance to test the IORef?
Can the UI interactions you are expecting actually happen from the forked thread? Many UI libraries, including both gtk and OpenGL, have restrictions on which threads can interact with the UI.
Is the flag set for long enough that the forked thread had a chance to see it? If the flag is set to True and then back to False before the forked thread calls readIORef, it won't detect the stop.
One way to address the final problem is to use an Integer instead of a Bool for a flag.
newFlag :: IO (IORef Integer)
newFlag = newIORef 0
An observer of the flag remembers the value of the flag when the observer was created, and stops when it becomes greater. This returns True when the thread can continue (the flag has not been raised).
testFlag :: IORef Integer -> IO (IO Bool)
testFlag flag = do
n <- readIORef flag
return (fmap (<=n) (readIORef flag))
To raise the flag, the signaler increments the value.
raiseFlag :: IORef Integer -> IO ()
raiseFlag ref = atomicModifyIORef ref (\x -> (x+1,()))
This little example program demonstrates an IORef sharing a flag with other threads. It forks new threads when given the input "f", signals the threads to stop when given the input "s", and quits when given the input "q".
main = do
flag <- newFlag
let go = do
command <- getLine
case command of
"f" -> do
continue <- testFlag flag
forkIO $ thread continue
go
"s" -> do
raiseFlag flag
go
"q" -> do
raiseFlag flag
return ()
go
The threads periodically do some "work", which takes half a second, and test for the continue condition before continuing.
thread :: IO Bool -> IO ()
thread continue = go
where
go = do
me <- myThreadId
putStrLn (show me ++ " Outputting")
threadDelay 500000
c <- continue
if c then go else putStrLn (show me ++ " Stopping") >> return ()
I have a worker thread which reads data repeatedly from an MVar and performs some useful work on that. After a while, the rest of the program forgets about that worker thread, which means that it will wait on an empty MVar and become very lonely. My question is:
Will the MVar be garbage collected if threads no longer write to it, for instance because they all wait for it?
Will garbage collection kill the waiting threads?
If neither, can I somehow indicate to the compiler that the MVar should be garbage collected and the thread be killed?
EDIT: I should probably clarify the purpose of my question. I don't desire general protection against deadlocks; instead, what I would like to do is to tie the life of the worker thread to life of a value (as in: dead values are claimed by garbage collection). In other words, the worker thread is a resource that I would like to free not by hand, but when a certain value (the MVar or a derivative) is garbage collected.
Here an example program that demonstrates what I have in mind
import Control.Concurrent
import Control.Concurrent.MVar
main = do
something
-- the thread forked in something can be killed here
-- because the MVar used for communication is no longer in scope
etc
something = do
v <- newEmptyMVar
forkIO $ forever $ work =<< takeMVar v
putMVar v "Haskell"
putMVar v "42"
In other words, I want the thread to be killed when I can no longer communicate with it, i.e. when the MVar used for communication is no longer in scope. How to do that?
It will just work: when the MVar is only reachable by the thread that is blocked on it, then the thread is sent the BlockedIndefinitelyOnMVar exception, which will normally cause it to die silently (the default exception handler for a thread ignores this exception).
BTW, for doing some cleanup when the thread dies, you'll want to use forkFinally (which I just added to Control.Concurrent).
If you're lucky, you'll get a "BlockedIndefinitelyOnMVar", indicating that you're waiting on an MVar that no thread will ever write to.
But, to quote Ed Yang,
GHC only knows that a thread can be considered garbage if there are no
references to the thread. Who is holding a reference to the thread?
The MVar, as the thread is blocking on this data structure and has
added itself to the blocking list of this. Who is keeping the MVar
alive? Why, our closure that contains a call to takeMVar. So the
thread stays.
without a bit of work (which would be, by the way, quite interesting to see), BlockedIndefinitelyOnMVar is not an obviously useful mechanism for giving your Haskell programs deadlock protection.
GHC just can't solve the problem in general of knowing whether your thread will make progress.
A better approach would be to explicitly terminate threads by sending them a Done message. E.g. just lift your message type into an optional value that also includes an end-of-message value:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import Control.Exception
import Prelude hiding (catch)
main = do
something
threadDelay (10 * 10^6)
print "Still here"
something = do
v <- newEmptyMVar
forkIO $
finally
(let go = do x <- takeMVar v
case x of
Nothing -> return ()
Just v -> print v >> go
in go)
(print "Done!")
putMVar v $ Just "Haskell"
putMVar v $ Just "42"
putMVar v Nothing
and we get the correct clean up:
$ ./A
"Haskell"
"42"
"Done!"
"Still here"
I tested the simple weak MVar and it did get finalized and killed. The code is:
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
import System.Mem(performGC)
import System.Mem.Weak
dologger :: MVar String -> IO ()
dologger mv = do
tid <- myThreadId
weak <- mkWeakPtr mv (Just (putStrLn "X" >> killThread tid))
logger weak
logger :: Weak (MVar String) -> IO ()
logger weak = act where
act = do
v <- deRefWeak weak
case v of
Just mv -> do
a <- try (takeMVar mv) :: IO (Either SomeException String)
print a
either (\_ -> return ()) (\_ -> act) a
Nothing -> return ()
play mv = act where
act = do
c <- getLine
if c=="quit" then return ()
else putMVar mv c >> act
doplay mv = do
forkIO (dologger mv)
play mv
main = do
putStrLn "Enter a string to escape, or quit to exit"
mv <- newEmptyMVar
doplay mv
putStrLn "*"
performGC
putStrLn "*"
yield
putStrLn "*"
threadDelay (10^6)
putStrLn "*"
The session with the program was:
(chrisk)-(/tmp)
(! 624)-> ghc -threaded -rtsopts --make weak2.hs
[1 of 1] Compiling Main ( weak2.hs, weak2.o )
Linking weak2 ...
(chrisk)-(/tmp)
(! 625)-> ./weak2 +RTS -N4 -RTS
Enter a string to escape, or quit to exit
This is a test
Right "This is a test"
Tab Tab
Right "Tab\tTab"
quit
*
*
X
*
Left thread killed
*
So blocking on takeMVar did not keep the MVar alive on ghc-7.4.1 despite expectations.
While BlockedIndefinitelyOnMVar should work, also consider using ForeignPointer finalizers. The normal role of those is to delete C structures that are no longer accessible in Haskell. However, you can attach any IO finalizer to them.