How dangerous is forkProcess? How can I use it safely?

How dangerous is forkProcess? How can I use it safely? - haskell

I’d like to play tricks with forkProcess, where I want to clone my Haskell process, and then let both clones talk to each other (maybe using Cloud Haskell to send even closures around).
But I wonder how well that works with the GHC runtime. Does anyone have experience here?
The documenation for forkProcess says that no other threads are copied, so I assume all data used by other threads will then be garbage collected in the fork, which sounds good. But that means that finalizers will run in both clone, which may or may not be the right thing to do…
I assume I can’t just use it without worry; but are there rules I can follow that will make sure its use is safe?

But that means that finalizers will run in both clone, which may or may not be the right thing to do…
Finalizers are very rarely used in Haskell, and even where they are used, I would expect them to only have in-process effects. For example, a finalizer calls hClose on garbage-collected Handles if you forgot to do it yourself. This is easy to demonstrate: the following program fails with openFile: resource exhausted (Too many open files), but if you uncomment the pure (), the Handles get garbage-collected and the program completes successfully.
import Control.Concurrent
import Control.Monad
import System.IO
import System.Mem
main :: IO ()
main = do
rs <- replicateM 1000 $ do
threadDelay 1000 -- not sure why did is needed; maybe to give control back
-- to the OS, so it can recycle the file descriptors?
performGC
openFile "input" ReadMode
--pure ()
print rs -- force all the Handles to still be alive by this point
File descriptors are process-owned and are copied by forkProcess, so it makes sense to have each clone close their copies.
The case which would be problematic is if a finalizer was cleaning up a system-owned resource, e.g. deleting a file. But I hope no library is relying on finalizers to delete such resources, because as the documentation explains, finalizers are not guaranteed to run. So it's better to use something like bracket to cleanup resources (although the cleanup is still not guaranteed, e.g. if bracket is used from a thread).
What the documentation for forkProcess is warning about is not finalizers, but the fact that other threads will appear to end abruptly inside the forked process. This is especially problematic if those threads are holding locks. Normally, two threads can use modifyMVar_ to ensure that only one thread at a time is running a critical section, and as long as each thread is only holding the lock for a finite amount of time, the other thread can simply wait for the MVar to become available. If you call forkProcess while one thread is in the middle of a modifyMVar_, however, that thread will not continue in the cloned process, and so the cloned process cannot simply call modifyMVar_ or it could get stuck forever while waiting for a non-existing thread to release the lock. Here is a program demonstrating the problem.
import Control.Concurrent
import Control.Monad
import System.Posix.Process
-- >>> main
-- (69216,"forkIO thread",0)
-- (69216,"main thread",1)
-- (69216,"forkIO thread",2)
-- (69216,"main thread",3)
-- (69216,"forkIO thread",4)
-- (69216,"main thread",5)
-- calling forkProcess
-- forkProcess main thread waiting for MVar...
-- (69216,"forkIO thread",6)
-- (69216,"original main thread",7)
-- (69216,"forkIO thread",8)
-- (69216,"original main thread",9)
-- (69216,"forkIO thread",10)
-- (69216,"original main thread",11)
main :: IO ()
main = do
mvar <- newMVar (0 :: Int)
_ <- forkIO $ replicateM_ 6 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "forkIO thread", i)
pure (i+1)
threadDelay 50000
replicateM_ 3 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "main thread", i)
pure (i+1)
putStrLn "calling forkProcess"
_ <- forkProcess $ do
threadDelay 25000
replicateM_ 3 $ do
putStrLn "forkProcess main thread waiting for MVar..."
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "forkProcess main thread", i)
pure (i+1)
replicateM_ 3 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "original main thread", i)
pure (i+1)
threadDelay 100000
As the output shows, the forkProcess main thread gets stuck waiting forever for the MVar, and never prints the forkProcess main thread line. If you move the threadDelays outside the modifyMVar_ critical section, the forkIO thread is a lot less likely to be in the middle of that critical section when forkProcess is called, so you'll see an output which looks like this instead:
(69369,"forkIO thread",0)
(69369,"main thread",1)
(69369,"forkIO thread",2)
(69369,"main thread",3)
(69369,"forkIO thread",4)
(69369,"main thread",5)
calling forkProcess
(69369,"forkIO thread",6)
(69369,"original main thread",7)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",6)
(69369,"forkIO thread",8)
(69369,"original main thread",9)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",7)
(69369,"forkIO thread",10)
(69369,"original main thread",11)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",8)
After the forkProcess call, there are now two MVars which both hold the value 5, and so in the original process, original main thread and forkIO thread are both incrementing one MVar, while in the other process forkProcess main thread is incrementing the other.

Related

Haskell. How to make my program to terminate?

I want to make an "asynchronous" reading of stdin for a few seconds. The way I am doing if is by forking getContents and writing to a Chan. After 5 seconds, I kill the thread and read the channel.
From my understading, the code below should just print whatever is in chan and terminate, but it keeps waiting for input and ^C must be pressed to finish. This is the behaviour you have when doing getContents >>= print on its own, so I have two guesses and no clue about each:
The thread is not killed and getContent keep running asking for more input
Something weird is happening with multithreading (see considerations below)
-- OS: Ubuntu 22.04.1 LTS
-- Gnome Terminal
-- ghc 9.2.5
import Control.Concurrent.Chan ( newChan, readChan, writeChan, Chan )
import Control.Concurrent (threadDelay, forkIO, killThread)
import System.IO (getContents, BufferMode (..), stdin, hSetBuffering)
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
chan <- newChan
putStrLn "start"
threadId <- forkIO $ getContents >>= writeChan chan
threadDelay 5000000
putStrLn "\nend"
killThread threadId
a <- readChan chan
print a
Some considerations:
Using getLine make it work only if Enter is pressed. But I'd like to just "stream" stdin into the channel
hSetBuffering stdin NoBuffering is neccessary, otherwise the programm hangs (I guess waiting for end of input?)
Using getContents' cause a thread blocked indefinitely in an MVar operation, which up to the documentation is due to the channel being empty. I guess getContents' actually never terminates.
Last but most importantly, the behaviour is different depending on the compilation parameters:
ghc -threaded main.hs && ./main +RTS -N2 wont print anything and will hang until ^C is pressed (same thing with -N1, -N3, etc...)
runghc main.hs will actually print whatever has being the stdin during the 5 seconds (i.e. whatever is in chan) and then hang.
Just to clarify. Here are the ouputs:
> runghc main.hs
start
abc # user input
end
"abc^C" # ^C is done manually, to termiante the program and the last \" is for formatting purpose
> ghc -threaded main.hs && ./main +RTS -N2
start
abc # user input
end
^C # ^C is done manually, to termiante the program
So the question is simple. How do I make my program to end?

The thread getContents >>= writeChan chan is not an infinite loop that constantly adds content to chan. getContents creates a thunk, which is put in chan, and the thread terminates near instantaneously. Then in the main thread readChan gets that thunk, and print a forces it. It's the forcing of the thunk which prompts reading stdin, hence your program just blocks for more input until EOF or it gets killed.
What you want to do is to explicitly take small bits of input and write them into the channel. However, in the main thread, the channel does not give you a way to tell when it's ended. A workaround is to use an IORef String instead as a channel. Write to it by explicitly appending to the stored string, and readIORef will give you whatever content was written so far.
import Control.Concurrent.Chan ( newChan, readChan, writeChan, Chan )
import Control.Concurrent (threadDelay, forkIO, killThread)
import Control.Monad (forever)
import Data.IORef
import System.IO (getContents, BufferMode (..), stdin, hSetBuffering)
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
buf <- newIORef []
putStrLn "start"
threadId <- forkIO $ forever $ do
c <- getChar
atomicModifyIORef' buf (\cs -> (c : cs, ()))
threadDelay 5000000
putStrLn "\nend"
killThread threadId
a <- reverse <$> readIORef buf
print a

What happens to child threads when their parent dies in GHC Haskell?

The documentation for forkIO says
GHC note: the new thread inherits the masked state of the parent (see mask).
The newly created thread has an exception handler that discards the exceptions
BlockedIndefinitelyOnMVar, BlockedIndefinitelyOnSTM, and ThreadKilled, and passes
all other exceptions to the uncaught exception handler.
Why does the child exception handler discard ThreadKilled? Is there some connection between the two threads after they're created?
What exactly happens when the parent thread dies? Does the child get any exception raised? Or is there any way at all to see from the child's perspective that the parent died? Is there anything else that happens except that the parent thread just stops running?
The reason why I'm asking this is that in many cases I'm forced to create a thread in a context where I can't access the parent's scope. Just imagine you're somewhere deep in a library and need to call forkIO, and have that thread die when the parent dies. Is it necessary to restructure the program and propagate the child's ThreadId up to the parent and explicitly kill it? Or is there any other workaround for this?

What exactly happens when the parent thread dies?
Nothing. That's actually also true for POSIX threads. Threads don't share the parent-child relationship you might know from fork in C or similar languages. There is, however, one main thread, and its termination will usually lead to the termination of the whole program:
Note that the thread in which main() was originally invoked differs from this. When it returns from main(), the effect is as if there was an implicit call to exit() using the return value of main() as the exit status.
Does the child get any exception raised? Or is there any way at all to see from the child's perspective that the parent died? Is there anything else that happens except that the parent thread just stops running?
No. No. And no. For the same reason as with usual OS threads. You can try this pretty easily:
import Control.Concurrent (forkIO, threadDelay)
delaySeconds n = threadDelay $ n * (10^6)
main = do
forkIO $ do
forkIO $ delaySeconds 1 >> print "Pseudo child 1"
forkIO $ delaySeconds 1 >> print "Pseudo child 2"
print "Pseudo parent says goodbye"
delaySeconds 10
print "Exiting main"
The "parent" will say goodbye, and the "children" will print a second later. Remember, there is no actual parent in thread programming. There are only siblings. One of them is a little bit special, yes, but that is just how it's been specified.
Is it necessary to restructure the program and propagate the child's ThreadId up to the parent and explicitly kill it?
At least a little bit, since forkIO doesn't provide this. Also, if there was a forkIOKillAutomatically, what type should it have? And why?
Or is there any other workaround for this?
Well, you could provide the rest of your parent as another action, and therefore use a helper:
forkRunDie :: IO () -> IO () -> IO ()
forkRunDie p s = forkIO p >>= \tid -> s >> killThread tid
The above example would then become
main = do
forkIO $ do
forkRunDie (delaySeconds 1 >> print "Pseudo child 1") $ do
forkRunDie (delaySeconds 1 >> print "Pseudo child 2") $ do
print "Pseudo parent says goodbye"
delaySeconds 10
print "Exiting main"
In this case the only output will be
"Pseudo parent says goodbye"
"Exiting main"
See also:
Smarter termination for thread racing by Conal Elliott (provides a very similar function to forkRunDie with finally).

In a standalone GHC program, only the main thread is required to terminate in order for the process to terminate. Thus all other forked threads will simply terminate at the same time as the main thread (the terminology for this kind of behaviour is "daemonic threads").
https://hackage.haskell.org/package/base-4.7.0.0/docs/Control-Concurrent.html#g:12

This is an answer inspired by Zeta's. It uses a free monad transformer to avoid explicit nesting of computations, and the withAsync function from the async package instead of forkRunDie.
module Main where
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Free (FreeT,liftF,iterT)
import Control.Concurrent
import Control.Concurrent.Async (withAsync)
import Control.Exception
type DaemonIO = FreeT ((,) (IO ())) IO
launch :: IO () -> DaemonIO ()
launch a = liftF (a,())
runDaemonIO :: DaemonIO a -> IO a
runDaemonIO = iterT $ \(action,rest) -> withAsync action $ \_ -> rest
main :: IO ()
main = do
let delaySeconds n = threadDelay $ n * (10^6)
runDaemonIO $ do
launch $ (forever $ delaySeconds 1 >> print "Pseudo child 1")
`finally` putStrLn "killed 1!"
launch $ (forever $ delaySeconds 1 >> print "Pseudo child 2")
`finally` putStrLn "killed 2!"
liftIO $ delaySeconds 10
liftIO $ putStrLn "done!!!"

Selective send on TChan?

AFAIK TChan acts as a hub, every message sent is seen by others right ?!
i want a TChan that acts as a switch to send a message to specific thread, and also support broadcasting.
is there such thing ?

Edit: I re-read your question. This answer doesn't quite address "selective send", though it clarifies what a TChan can do.
The "broadcast" approach described below will wake up all listeners (though on the bright side, it won't make 1000 copies of each item). To avoid this, use the Map approach as #Mikhail suggested. I did this in my chat server example.
A TChan is a FIFO queue:
writeTChan adds an item to the end.
readTChan reads an item from the beginning.
For example, the following example forks 10 threads which fight over a single channel:
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
main = do
chan <- newTChanIO
forM_ [1..10] $ \i ->
forkIO $
forever $ do
x <- atomically $ readTChan chan
putStrLn $ "Thread " ++ show i ++ ": " ++ show x
mapM_ (atomically . writeTChan chan) [1..1000]
-- Wait for channel to empty out
atomically $ do
empty <- isEmptyTChan chan
when (not empty) retry
Here, each item is read by exactly one thread.
In contrast, the following example "broadcasts" a stream of items to 10 threads, by making ten copies of the channel using dupTChan:
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
main = do
master <- newTChanIO
forM_ [1..10] $ \i -> do
chan <- atomically $ dupTChan master
forkIO $
forever $ do
x <- atomically $ readTChan chan
putStrLn $ "Thread " ++ show i ++ ": " ++ show x
mapM_ (atomically . writeTChan master) [1..100]
-- Give threads time to complete
threadDelay 1000000
Now each thread gets all of the items written to the channel.
A couple subtleties to note:
Items written to a channel prior to dupTChan will not appear in the new channel. If we called dupTChan from the child threads rather than the main thread, some writeTChans could happen first, meaning the children might not see all the items.
Since nobody is reading the master channel, items written to it will pile up and will likely not be garbage collected. To avoid this caveat, use newBroadcastTChan to create the master channel.

Killing a thread when MVar is garbage collected

I have a worker thread which reads data repeatedly from an MVar and performs some useful work on that. After a while, the rest of the program forgets about that worker thread, which means that it will wait on an empty MVar and become very lonely. My question is:
Will the MVar be garbage collected if threads no longer write to it, for instance because they all wait for it?
Will garbage collection kill the waiting threads?
If neither, can I somehow indicate to the compiler that the MVar should be garbage collected and the thread be killed?
EDIT: I should probably clarify the purpose of my question. I don't desire general protection against deadlocks; instead, what I would like to do is to tie the life of the worker thread to life of a value (as in: dead values are claimed by garbage collection). In other words, the worker thread is a resource that I would like to free not by hand, but when a certain value (the MVar or a derivative) is garbage collected.
Here an example program that demonstrates what I have in mind
import Control.Concurrent
import Control.Concurrent.MVar
main = do
something
-- the thread forked in something can be killed here
-- because the MVar used for communication is no longer in scope
etc
something = do
v <- newEmptyMVar
forkIO $ forever $ work =<< takeMVar v
putMVar v "Haskell"
putMVar v "42"
In other words, I want the thread to be killed when I can no longer communicate with it, i.e. when the MVar used for communication is no longer in scope. How to do that?

It will just work: when the MVar is only reachable by the thread that is blocked on it, then the thread is sent the BlockedIndefinitelyOnMVar exception, which will normally cause it to die silently (the default exception handler for a thread ignores this exception).
BTW, for doing some cleanup when the thread dies, you'll want to use forkFinally (which I just added to Control.Concurrent).

If you're lucky, you'll get a "BlockedIndefinitelyOnMVar", indicating that you're waiting on an MVar that no thread will ever write to.
But, to quote Ed Yang,
GHC only knows that a thread can be considered garbage if there are no
references to the thread. Who is holding a reference to the thread?
The MVar, as the thread is blocking on this data structure and has
added itself to the blocking list of this. Who is keeping the MVar
alive? Why, our closure that contains a call to takeMVar. So the
thread stays.
without a bit of work (which would be, by the way, quite interesting to see), BlockedIndefinitelyOnMVar is not an obviously useful mechanism for giving your Haskell programs deadlock protection.
GHC just can't solve the problem in general of knowing whether your thread will make progress.
A better approach would be to explicitly terminate threads by sending them a Done message. E.g. just lift your message type into an optional value that also includes an end-of-message value:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import Control.Exception
import Prelude hiding (catch)
main = do
something
threadDelay (10 * 10^6)
print "Still here"
something = do
v <- newEmptyMVar
forkIO $
finally
(let go = do x <- takeMVar v
case x of
Nothing -> return ()
Just v -> print v >> go
in go)
(print "Done!")
putMVar v $ Just "Haskell"
putMVar v $ Just "42"
putMVar v Nothing
and we get the correct clean up:
$ ./A
"Haskell"
"42"
"Done!"
"Still here"

I tested the simple weak MVar and it did get finalized and killed. The code is:
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
import System.Mem(performGC)
import System.Mem.Weak
dologger :: MVar String -> IO ()
dologger mv = do
tid <- myThreadId
weak <- mkWeakPtr mv (Just (putStrLn "X" >> killThread tid))
logger weak
logger :: Weak (MVar String) -> IO ()
logger weak = act where
act = do
v <- deRefWeak weak
case v of
Just mv -> do
a <- try (takeMVar mv) :: IO (Either SomeException String)
print a
either (\_ -> return ()) (\_ -> act) a
Nothing -> return ()
play mv = act where
act = do
c <- getLine
if c=="quit" then return ()
else putMVar mv c >> act
doplay mv = do
forkIO (dologger mv)
play mv
main = do
putStrLn "Enter a string to escape, or quit to exit"
mv <- newEmptyMVar
doplay mv
putStrLn "*"
performGC
putStrLn "*"
yield
putStrLn "*"
threadDelay (10^6)
putStrLn "*"
The session with the program was:
(chrisk)-(/tmp)
(! 624)-> ghc -threaded -rtsopts --make weak2.hs
[1 of 1] Compiling Main ( weak2.hs, weak2.o )
Linking weak2 ...
(chrisk)-(/tmp)
(! 625)-> ./weak2 +RTS -N4 -RTS
Enter a string to escape, or quit to exit
This is a test
Right "This is a test"
Tab Tab
Right "Tab\tTab"
quit
*
*
X
*
Left thread killed
*
So blocking on takeMVar did not keep the MVar alive on ghc-7.4.1 despite expectations.

While BlockedIndefinitelyOnMVar should work, also consider using ForeignPointer finalizers. The normal role of those is to delete C structures that are no longer accessible in Haskell. However, you can attach any IO finalizer to them.

Block until keypress or given time of day

How does one block until the earlier of (1) a keypress or (2) a previously input time of day in hh:mm format is reached. I am using Windows in case that matters. This DOS assembler program (which does run on Windows too) does what I want via something like batchman waittil 16:30 from the Windows console but I want to do it entirely in Haskell, (i.e. without making use of that program).

You can start two threads: one reads a character, the other waits until the specified time is reached; they both write to a single MVar to signal completion.
This is a little tricky, but mostly due to the details: we want to have stdin in unbuffered and non-echoing mode so that a single keypress stops the waiting without printing anything, and then restore the original state afterwards; and we also need to kill both threads after either finishes, so that we, for example, stop reading from stdin once the timeout expires. Additionally, we need to ensure things are cleaned up properly if an exception occurs. bracket simplifies the clean-up logic here, but it's still pretty ugly:
import Prelude hiding (catch)
import Control.Exception
import Control.Concurrent
import System.IO
withRawStdin :: IO a -> IO a
withRawStdin = bracket uncook restore . const
where
uncook = do
oldBuffering <- hGetBuffering stdin
oldEcho <- hGetEcho stdin
hSetBuffering stdin NoBuffering
hSetEcho stdin False
return (oldBuffering, oldEcho)
restore (oldBuffering, oldEcho) = do
hSetBuffering stdin oldBuffering
hSetEcho stdin oldEcho
waitFor :: Int -> IO ()
waitFor delay = do
done <- newEmptyMVar
withRawStdin . bracket (start done) cleanUp $ \_ -> takeMVar done
where
start done = do
t1 <- forkIO $ getChar >> putMVar done ()
t2 <- forkIO $ threadDelay delay >> putMVar done ()
return (t1, t2)
cleanUp (t1, t2) = do
killThread t1
killThread t2
Even after all that, this solution still doesn't handle waiting until a specific time — just waiting a certain number of microseconds. For turning a time of day into a number of microseconds to sleep, this previous SO question may help. If the sleeps are sufficiently long, then they might not fit into an Int of microseconds, so you might have to use threadDelay in a loop, or delay from the unbounded-delays package.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How dangerous is forkProcess? How can I use it safely? - haskell

Related

Haskell. How to make my program to terminate?

What happens to child threads when their parent dies in GHC Haskell?

Selective send on TChan?

Killing a thread when MVar is garbage collected

Block until keypress or given time of day

Categories

Resources