Haskell. How to make my program to terminate? - multithreading

I want to make an "asynchronous" reading of stdin for a few seconds. The way I am doing if is by forking getContents and writing to a Chan. After 5 seconds, I kill the thread and read the channel.
From my understading, the code below should just print whatever is in chan and terminate, but it keeps waiting for input and ^C must be pressed to finish. This is the behaviour you have when doing getContents >>= print on its own, so I have two guesses and no clue about each:
The thread is not killed and getContent keep running asking for more input
Something weird is happening with multithreading (see considerations below)
-- OS: Ubuntu 22.04.1 LTS
-- Gnome Terminal
-- ghc 9.2.5
import Control.Concurrent.Chan ( newChan, readChan, writeChan, Chan )
import Control.Concurrent (threadDelay, forkIO, killThread)
import System.IO (getContents, BufferMode (..), stdin, hSetBuffering)
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
chan <- newChan
putStrLn "start"
threadId <- forkIO $ getContents >>= writeChan chan
threadDelay 5000000
putStrLn "\nend"
killThread threadId
a <- readChan chan
print a
Some considerations:
Using getLine make it work only if Enter is pressed. But I'd like to just "stream" stdin into the channel
hSetBuffering stdin NoBuffering is neccessary, otherwise the programm hangs (I guess waiting for end of input?)
Using getContents' cause a thread blocked indefinitely in an MVar operation, which up to the documentation is due to the channel being empty. I guess getContents' actually never terminates.
Last but most importantly, the behaviour is different depending on the compilation parameters:
ghc -threaded main.hs && ./main +RTS -N2 wont print anything and will hang until ^C is pressed (same thing with -N1, -N3, etc...)
runghc main.hs will actually print whatever has being the stdin during the 5 seconds (i.e. whatever is in chan) and then hang.
Just to clarify. Here are the ouputs:
> runghc main.hs
start
abc # user input
end
"abc^C" # ^C is done manually, to termiante the program and the last \" is for formatting purpose
> ghc -threaded main.hs && ./main +RTS -N2
start
abc # user input
end
^C # ^C is done manually, to termiante the program
So the question is simple. How do I make my program to end?

The thread getContents >>= writeChan chan is not an infinite loop that constantly adds content to chan. getContents creates a thunk, which is put in chan, and the thread terminates near instantaneously. Then in the main thread readChan gets that thunk, and print a forces it. It's the forcing of the thunk which prompts reading stdin, hence your program just blocks for more input until EOF or it gets killed.
What you want to do is to explicitly take small bits of input and write them into the channel. However, in the main thread, the channel does not give you a way to tell when it's ended. A workaround is to use an IORef String instead as a channel. Write to it by explicitly appending to the stored string, and readIORef will give you whatever content was written so far.
import Control.Concurrent.Chan ( newChan, readChan, writeChan, Chan )
import Control.Concurrent (threadDelay, forkIO, killThread)
import Control.Monad (forever)
import Data.IORef
import System.IO (getContents, BufferMode (..), stdin, hSetBuffering)
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
buf <- newIORef []
putStrLn "start"
threadId <- forkIO $ forever $ do
c <- getChar
atomicModifyIORef' buf (\cs -> (c : cs, ()))
threadDelay 5000000
putStrLn "\nend"
killThread threadId
a <- reverse <$> readIORef buf
print a

Related

How dangerous is forkProcess? How can I use it safely?

I’d like to play tricks with forkProcess, where I want to clone my Haskell process, and then let both clones talk to each other (maybe using Cloud Haskell to send even closures around).
But I wonder how well that works with the GHC runtime. Does anyone have experience here?
The documenation for forkProcess says that no other threads are copied, so I assume all data used by other threads will then be garbage collected in the fork, which sounds good. But that means that finalizers will run in both clone, which may or may not be the right thing to do…
I assume I can’t just use it without worry; but are there rules I can follow that will make sure its use is safe?
But that means that finalizers will run in both clone, which may or may not be the right thing to do…
Finalizers are very rarely used in Haskell, and even where they are used, I would expect them to only have in-process effects. For example, a finalizer calls hClose on garbage-collected Handles if you forgot to do it yourself. This is easy to demonstrate: the following program fails with openFile: resource exhausted (Too many open files), but if you uncomment the pure (), the Handles get garbage-collected and the program completes successfully.
import Control.Concurrent
import Control.Monad
import System.IO
import System.Mem
main :: IO ()
main = do
rs <- replicateM 1000 $ do
threadDelay 1000 -- not sure why did is needed; maybe to give control back
-- to the OS, so it can recycle the file descriptors?
performGC
openFile "input" ReadMode
--pure ()
print rs -- force all the Handles to still be alive by this point
File descriptors are process-owned and are copied by forkProcess, so it makes sense to have each clone close their copies.
The case which would be problematic is if a finalizer was cleaning up a system-owned resource, e.g. deleting a file. But I hope no library is relying on finalizers to delete such resources, because as the documentation explains, finalizers are not guaranteed to run. So it's better to use something like bracket to cleanup resources (although the cleanup is still not guaranteed, e.g. if bracket is used from a thread).
What the documentation for forkProcess is warning about is not finalizers, but the fact that other threads will appear to end abruptly inside the forked process. This is especially problematic if those threads are holding locks. Normally, two threads can use modifyMVar_ to ensure that only one thread at a time is running a critical section, and as long as each thread is only holding the lock for a finite amount of time, the other thread can simply wait for the MVar to become available. If you call forkProcess while one thread is in the middle of a modifyMVar_, however, that thread will not continue in the cloned process, and so the cloned process cannot simply call modifyMVar_ or it could get stuck forever while waiting for a non-existing thread to release the lock. Here is a program demonstrating the problem.
import Control.Concurrent
import Control.Monad
import System.Posix.Process
-- >>> main
-- (69216,"forkIO thread",0)
-- (69216,"main thread",1)
-- (69216,"forkIO thread",2)
-- (69216,"main thread",3)
-- (69216,"forkIO thread",4)
-- (69216,"main thread",5)
-- calling forkProcess
-- forkProcess main thread waiting for MVar...
-- (69216,"forkIO thread",6)
-- (69216,"original main thread",7)
-- (69216,"forkIO thread",8)
-- (69216,"original main thread",9)
-- (69216,"forkIO thread",10)
-- (69216,"original main thread",11)
main :: IO ()
main = do
mvar <- newMVar (0 :: Int)
_ <- forkIO $ replicateM_ 6 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "forkIO thread", i)
pure (i+1)
threadDelay 50000
replicateM_ 3 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "main thread", i)
pure (i+1)
putStrLn "calling forkProcess"
_ <- forkProcess $ do
threadDelay 25000
replicateM_ 3 $ do
putStrLn "forkProcess main thread waiting for MVar..."
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "forkProcess main thread", i)
pure (i+1)
replicateM_ 3 $ do
modifyMVar_ mvar $ \i -> do
threadDelay 100000
processID <- getProcessID
print (processID, "original main thread", i)
pure (i+1)
threadDelay 100000
As the output shows, the forkProcess main thread gets stuck waiting forever for the MVar, and never prints the forkProcess main thread line. If you move the threadDelays outside the modifyMVar_ critical section, the forkIO thread is a lot less likely to be in the middle of that critical section when forkProcess is called, so you'll see an output which looks like this instead:
(69369,"forkIO thread",0)
(69369,"main thread",1)
(69369,"forkIO thread",2)
(69369,"main thread",3)
(69369,"forkIO thread",4)
(69369,"main thread",5)
calling forkProcess
(69369,"forkIO thread",6)
(69369,"original main thread",7)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",6)
(69369,"forkIO thread",8)
(69369,"original main thread",9)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",7)
(69369,"forkIO thread",10)
(69369,"original main thread",11)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",8)
After the forkProcess call, there are now two MVars which both hold the value 5, and so in the original process, original main thread and forkIO thread are both incrementing one MVar, while in the other process forkProcess main thread is incrementing the other.

Killing a thread when MVar is garbage collected

I have a worker thread which reads data repeatedly from an MVar and performs some useful work on that. After a while, the rest of the program forgets about that worker thread, which means that it will wait on an empty MVar and become very lonely. My question is:
Will the MVar be garbage collected if threads no longer write to it, for instance because they all wait for it?
Will garbage collection kill the waiting threads?
If neither, can I somehow indicate to the compiler that the MVar should be garbage collected and the thread be killed?
EDIT: I should probably clarify the purpose of my question. I don't desire general protection against deadlocks; instead, what I would like to do is to tie the life of the worker thread to life of a value (as in: dead values are claimed by garbage collection). In other words, the worker thread is a resource that I would like to free not by hand, but when a certain value (the MVar or a derivative) is garbage collected.
Here an example program that demonstrates what I have in mind
import Control.Concurrent
import Control.Concurrent.MVar
main = do
something
-- the thread forked in something can be killed here
-- because the MVar used for communication is no longer in scope
etc
something = do
v <- newEmptyMVar
forkIO $ forever $ work =<< takeMVar v
putMVar v "Haskell"
putMVar v "42"
In other words, I want the thread to be killed when I can no longer communicate with it, i.e. when the MVar used for communication is no longer in scope. How to do that?
It will just work: when the MVar is only reachable by the thread that is blocked on it, then the thread is sent the BlockedIndefinitelyOnMVar exception, which will normally cause it to die silently (the default exception handler for a thread ignores this exception).
BTW, for doing some cleanup when the thread dies, you'll want to use forkFinally (which I just added to Control.Concurrent).
If you're lucky, you'll get a "BlockedIndefinitelyOnMVar", indicating that you're waiting on an MVar that no thread will ever write to.
But, to quote Ed Yang,
GHC only knows that a thread can be considered garbage if there are no
references to the thread. Who is holding a reference to the thread?
The MVar, as the thread is blocking on this data structure and has
added itself to the blocking list of this. Who is keeping the MVar
alive? Why, our closure that contains a call to takeMVar. So the
thread stays.
without a bit of work (which would be, by the way, quite interesting to see), BlockedIndefinitelyOnMVar is not an obviously useful mechanism for giving your Haskell programs deadlock protection.
GHC just can't solve the problem in general of knowing whether your thread will make progress.
A better approach would be to explicitly terminate threads by sending them a Done message. E.g. just lift your message type into an optional value that also includes an end-of-message value:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import Control.Exception
import Prelude hiding (catch)
main = do
something
threadDelay (10 * 10^6)
print "Still here"
something = do
v <- newEmptyMVar
forkIO $
finally
(let go = do x <- takeMVar v
case x of
Nothing -> return ()
Just v -> print v >> go
in go)
(print "Done!")
putMVar v $ Just "Haskell"
putMVar v $ Just "42"
putMVar v Nothing
and we get the correct clean up:
$ ./A
"Haskell"
"42"
"Done!"
"Still here"
I tested the simple weak MVar and it did get finalized and killed. The code is:
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
import System.Mem(performGC)
import System.Mem.Weak
dologger :: MVar String -> IO ()
dologger mv = do
tid <- myThreadId
weak <- mkWeakPtr mv (Just (putStrLn "X" >> killThread tid))
logger weak
logger :: Weak (MVar String) -> IO ()
logger weak = act where
act = do
v <- deRefWeak weak
case v of
Just mv -> do
a <- try (takeMVar mv) :: IO (Either SomeException String)
print a
either (\_ -> return ()) (\_ -> act) a
Nothing -> return ()
play mv = act where
act = do
c <- getLine
if c=="quit" then return ()
else putMVar mv c >> act
doplay mv = do
forkIO (dologger mv)
play mv
main = do
putStrLn "Enter a string to escape, or quit to exit"
mv <- newEmptyMVar
doplay mv
putStrLn "*"
performGC
putStrLn "*"
yield
putStrLn "*"
threadDelay (10^6)
putStrLn "*"
The session with the program was:
(chrisk)-(/tmp)
(! 624)-> ghc -threaded -rtsopts --make weak2.hs
[1 of 1] Compiling Main ( weak2.hs, weak2.o )
Linking weak2 ...
(chrisk)-(/tmp)
(! 625)-> ./weak2 +RTS -N4 -RTS
Enter a string to escape, or quit to exit
This is a test
Right "This is a test"
Tab Tab
Right "Tab\tTab"
quit
*
*
X
*
Left thread killed
*
So blocking on takeMVar did not keep the MVar alive on ghc-7.4.1 despite expectations.
While BlockedIndefinitelyOnMVar should work, also consider using ForeignPointer finalizers. The normal role of those is to delete C structures that are no longer accessible in Haskell. However, you can attach any IO finalizer to them.

QSem doesn't seem to block threads

I'm writing a simple script to run bunch of tasks in parallel using the Shelly library but I want to limit the max number of tasks running at any one time. The script takes a file with an input on each line and runs a task for that input. There are a few hundred inputs in the file and I want to limit to around 16 processes at a time.
The current script actually limits to 1 (well tries to) using a QSem with an initial count of 1. I seem to be missing something though because when I run on a test file with 4 inputs I see this:
Starting
Starting
Starting
Starting
Done
Done
Done
Done
So the threads are not blocking on the QSem as I would expect, they're all running simultaneously. I've even gone so far as to implement my own semaphores both on MVar and TVar and neither worked the way I expected. I'm obviously missing something fundamental but what? I've also tried compiling the code and running it as a binary.
#!/usr/bin/env runhaskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, DeriveDataTypeable, OverloadedStrings #-}
import Shelly
import Prelude hiding (FilePath)
import Text.Shakespeare.Text (lt)
import qualified Data.Text.Lazy as LT
import Control.Monad (forM)
import System.Environment (getArgs)
import qualified Control.Concurrent.QSem as QSem
import Control.Concurrent (forkIO, MVar, putMVar, newEmptyMVar, takeMVar)
-- Define max number of simultaneous processes
maxProcesses :: IO QSem.QSem
maxProcesses = QSem.newQSem 1
bkGrnd :: ShIO a -&gt ShIO (MVar a)
bkGrnd proc = do
mvar &lt- liftIO newEmptyMVar
_ &lt- liftIO $ forkIO $ do
-- Block until there are free processes
sem &lt- maxProcesses
QSem.waitQSem sem
putStrLn "Starting"
-- Run the shell command
result &lt- shelly $ silently proc
liftIO $ putMVar mvar result
putStrLn "Done"
-- Signal that this process is done and another can run.
QSem.signalQSem sem
return mvar
main :: IO ()
main = shelly $ silently $ do
[img, file] &lt- liftIO $ getArgs
contents &lt- readfile $ fromText $ LT.pack file
-- Run a backgrounded process for each line of input.
results &lt- forM (LT.lines contents) $ \line -> bkGrnd $ do
runStdin &ltcommand> &ltarguments>
liftIO $ mapM_ takeMVar results
As I said in my comment, each call to bkGrnd creates its own semaphonre, allowing every thread to continue without waiting. I would try something like this instead, where the semaphore is created in the main and passed each time to bkGrnd.
bkGrnd :: QSem.QSem -> ShIO a -> ShIO (MVar a)
bkGrnd sem proc = do
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
-- Block until there are free processes
QSem.waitQSem sem
--
-- code continues as before
--
main :: IO ()
main = shelly $ silently $ do
[img, file] <- liftIO $ getArgs
contents <- readfile $ fromText $ LT.pack file
sem <- maxProcesses
-- Run a backgrounded process for each line of input.
results <- forM (LT.lines contents) $ \line -> bkGrnd sem $ do
runStdin <command> <arguments>
liftIO $ mapM_ takeMVar results
You have an answer, but I need to add: QSem and QSemN are not thread safe if killThread or asynchronous thread death is possible.
My bug report and patch are GHC trac ticket #3160. The fixed code is available as a new library called SafeSemaphore with module Control.Concurrent.MSem, MSemN, MSampleVar, and a bonus FairRWLock.
Isn't it better
bkGrnd sem proc = do
QSem.waitQSem sem
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
...
so not even forkIO until you get the semaphore?

Block until keypress or given time of day

How does one block until the earlier of (1) a keypress or (2) a previously input time of day in hh:mm format is reached. I am using Windows in case that matters. This DOS assembler program (which does run on Windows too) does what I want via something like batchman waittil 16:30 from the Windows console but I want to do it entirely in Haskell, (i.e. without making use of that program).
You can start two threads: one reads a character, the other waits until the specified time is reached; they both write to a single MVar to signal completion.
This is a little tricky, but mostly due to the details: we want to have stdin in unbuffered and non-echoing mode so that a single keypress stops the waiting without printing anything, and then restore the original state afterwards; and we also need to kill both threads after either finishes, so that we, for example, stop reading from stdin once the timeout expires. Additionally, we need to ensure things are cleaned up properly if an exception occurs. bracket simplifies the clean-up logic here, but it's still pretty ugly:
import Prelude hiding (catch)
import Control.Exception
import Control.Concurrent
import System.IO
withRawStdin :: IO a -> IO a
withRawStdin = bracket uncook restore . const
where
uncook = do
oldBuffering <- hGetBuffering stdin
oldEcho <- hGetEcho stdin
hSetBuffering stdin NoBuffering
hSetEcho stdin False
return (oldBuffering, oldEcho)
restore (oldBuffering, oldEcho) = do
hSetBuffering stdin oldBuffering
hSetEcho stdin oldEcho
waitFor :: Int -> IO ()
waitFor delay = do
done <- newEmptyMVar
withRawStdin . bracket (start done) cleanUp $ \_ -> takeMVar done
where
start done = do
t1 <- forkIO $ getChar >> putMVar done ()
t2 <- forkIO $ threadDelay delay >> putMVar done ()
return (t1, t2)
cleanUp (t1, t2) = do
killThread t1
killThread t2
Even after all that, this solution still doesn't handle waiting until a specific time — just waiting a certain number of microseconds. For turning a time of day into a number of microseconds to sleep, this previous SO question may help. If the sleeps are sufficiently long, then they might not fit into an Int of microseconds, so you might have to use threadDelay in a loop, or delay from the unbounded-delays package.

Parallel IO Causes Random Text Output in Terminal

I'm using
import Control.Concurrent.ParallelIO.Global
main = parallel_ (map processI [1..(sdNumber runParameters)]) >> stopGlobalPool
where
processI :: Int -> IO ()
is some function, which reads data from file, processes it and writes it to another file. No output to terminal. The problem is when I run the program with +RTS -N8 the terminal is flooded with random text like
piptufuht teata thtsieieo ocnsno e nscsdeoe qnqvuduee ernvnstetiirioasanlil lolwynya. .s
w
a s s uY Ysosopuuue's'nvpvdeeee n dpdp rerdodoub beada
bub lel y
What is happening? Without +RTS there is no clutter. I couldn't reproduce the behavior with a more simple (suitable to post here) program.
GHC 7.0.3 if that matters
Buffering is probably preventing you from constructing a simple test case. I was able to reproduce it with this (only when run with +RTS -Nsomething):
import Control.Concurrent
import System.IO
main :: IO ()
main = do
hSetBuffering stdout NoBuffering
forkIO $ putStrLn "foo"
forkIO $ putStrLn "bar"
forkIO $ putStrLn "baz"
threadDelay 1000 -- Allow things to print
As Thomas mentioned, you'll probably need to sequence this somehow, though I'm not sure how writing straight to files would change this. Here's a simple example how you can sequence this with a Chan. I'm sure there's a better way to do this, this is just an example of how I got this to not garble the output.
import Control.Concurrent
import Control.Concurrent.Chan
import System.IO
main :: IO ()
main = do
hSetBuffering stdout NoBuffering
ch <- newChan -- Things written here are picked up by stuffWriter
forkIO $ stuffWriter ch -- Fire up concurrent stuffWriter
forkIO $ writeChan ch "foo"
forkIO $ writeChan ch "bar"
forkIO $ writeChan ch "baz"
threadDelay 1000 -- Allow things to print
-- | Write all the things!
stuffWriter :: Chan String -> IO ()
stuffWriter ch = do
readChan ch >>= putStrLn -- Block, then write once I've got something
stuffWriter ch -- loop... looking for more things to write
Now your writes to somewhere are now synchronous (stuffWriter writes things, one at a time), and you should have no more garbling.

Resources