I'm writing a simple script to run bunch of tasks in parallel using the Shelly library but I want to limit the max number of tasks running at any one time. The script takes a file with an input on each line and runs a task for that input. There are a few hundred inputs in the file and I want to limit to around 16 processes at a time.
The current script actually limits to 1 (well tries to) using a QSem with an initial count of 1. I seem to be missing something though because when I run on a test file with 4 inputs I see this:
Starting
Starting
Starting
Starting
Done
Done
Done
Done
So the threads are not blocking on the QSem as I would expect, they're all running simultaneously. I've even gone so far as to implement my own semaphores both on MVar and TVar and neither worked the way I expected. I'm obviously missing something fundamental but what? I've also tried compiling the code and running it as a binary.
#!/usr/bin/env runhaskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, DeriveDataTypeable, OverloadedStrings #-}
import Shelly
import Prelude hiding (FilePath)
import Text.Shakespeare.Text (lt)
import qualified Data.Text.Lazy as LT
import Control.Monad (forM)
import System.Environment (getArgs)
import qualified Control.Concurrent.QSem as QSem
import Control.Concurrent (forkIO, MVar, putMVar, newEmptyMVar, takeMVar)
-- Define max number of simultaneous processes
maxProcesses :: IO QSem.QSem
maxProcesses = QSem.newQSem 1
bkGrnd :: ShIO a -> ShIO (MVar a)
bkGrnd proc = do
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
-- Block until there are free processes
sem <- maxProcesses
QSem.waitQSem sem
putStrLn "Starting"
-- Run the shell command
result <- shelly $ silently proc
liftIO $ putMVar mvar result
putStrLn "Done"
-- Signal that this process is done and another can run.
QSem.signalQSem sem
return mvar
main :: IO ()
main = shelly $ silently $ do
[img, file] <- liftIO $ getArgs
contents <- readfile $ fromText $ LT.pack file
-- Run a backgrounded process for each line of input.
results <- forM (LT.lines contents) $ \line -> bkGrnd $ do
runStdin <command> <arguments>
liftIO $ mapM_ takeMVar results
As I said in my comment, each call to bkGrnd creates its own semaphonre, allowing every thread to continue without waiting. I would try something like this instead, where the semaphore is created in the main and passed each time to bkGrnd.
bkGrnd :: QSem.QSem -> ShIO a -> ShIO (MVar a)
bkGrnd sem proc = do
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
-- Block until there are free processes
QSem.waitQSem sem
--
-- code continues as before
--
main :: IO ()
main = shelly $ silently $ do
[img, file] <- liftIO $ getArgs
contents <- readfile $ fromText $ LT.pack file
sem <- maxProcesses
-- Run a backgrounded process for each line of input.
results <- forM (LT.lines contents) $ \line -> bkGrnd sem $ do
runStdin <command> <arguments>
liftIO $ mapM_ takeMVar results
You have an answer, but I need to add: QSem and QSemN are not thread safe if killThread or asynchronous thread death is possible.
My bug report and patch are GHC trac ticket #3160. The fixed code is available as a new library called SafeSemaphore with module Control.Concurrent.MSem, MSemN, MSampleVar, and a bonus FairRWLock.
Isn't it better
bkGrnd sem proc = do
QSem.waitQSem sem
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
...
so not even forkIO until you get the semaphore?
Related
I want to make an "asynchronous" reading of stdin for a few seconds. The way I am doing if is by forking getContents and writing to a Chan. After 5 seconds, I kill the thread and read the channel.
From my understading, the code below should just print whatever is in chan and terminate, but it keeps waiting for input and ^C must be pressed to finish. This is the behaviour you have when doing getContents >>= print on its own, so I have two guesses and no clue about each:
The thread is not killed and getContent keep running asking for more input
Something weird is happening with multithreading (see considerations below)
-- OS: Ubuntu 22.04.1 LTS
-- Gnome Terminal
-- ghc 9.2.5
import Control.Concurrent.Chan ( newChan, readChan, writeChan, Chan )
import Control.Concurrent (threadDelay, forkIO, killThread)
import System.IO (getContents, BufferMode (..), stdin, hSetBuffering)
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
chan <- newChan
putStrLn "start"
threadId <- forkIO $ getContents >>= writeChan chan
threadDelay 5000000
putStrLn "\nend"
killThread threadId
a <- readChan chan
print a
Some considerations:
Using getLine make it work only if Enter is pressed. But I'd like to just "stream" stdin into the channel
hSetBuffering stdin NoBuffering is neccessary, otherwise the programm hangs (I guess waiting for end of input?)
Using getContents' cause a thread blocked indefinitely in an MVar operation, which up to the documentation is due to the channel being empty. I guess getContents' actually never terminates.
Last but most importantly, the behaviour is different depending on the compilation parameters:
ghc -threaded main.hs && ./main +RTS -N2 wont print anything and will hang until ^C is pressed (same thing with -N1, -N3, etc...)
runghc main.hs will actually print whatever has being the stdin during the 5 seconds (i.e. whatever is in chan) and then hang.
Just to clarify. Here are the ouputs:
> runghc main.hs
start
abc # user input
end
"abc^C" # ^C is done manually, to termiante the program and the last \" is for formatting purpose
> ghc -threaded main.hs && ./main +RTS -N2
start
abc # user input
end
^C # ^C is done manually, to termiante the program
So the question is simple. How do I make my program to end?
The thread getContents >>= writeChan chan is not an infinite loop that constantly adds content to chan. getContents creates a thunk, which is put in chan, and the thread terminates near instantaneously. Then in the main thread readChan gets that thunk, and print a forces it. It's the forcing of the thunk which prompts reading stdin, hence your program just blocks for more input until EOF or it gets killed.
What you want to do is to explicitly take small bits of input and write them into the channel. However, in the main thread, the channel does not give you a way to tell when it's ended. A workaround is to use an IORef String instead as a channel. Write to it by explicitly appending to the stored string, and readIORef will give you whatever content was written so far.
import Control.Concurrent.Chan ( newChan, readChan, writeChan, Chan )
import Control.Concurrent (threadDelay, forkIO, killThread)
import Control.Monad (forever)
import Data.IORef
import System.IO (getContents, BufferMode (..), stdin, hSetBuffering)
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
buf <- newIORef []
putStrLn "start"
threadId <- forkIO $ forever $ do
c <- getChar
atomicModifyIORef' buf (\cs -> (c : cs, ()))
threadDelay 5000000
putStrLn "\nend"
killThread threadId
a <- reverse <$> readIORef buf
print a
I want a small operation where one thread adds to a shared state some value, while another thread takes the value out and prints it. Here are two versions using TMVar and MVar respectively. The TMVar version is not working somehow, it keeps printing out the first value. What is the problem in the STM first version? How to fix the first TMVar version to make it work?
import Control.Concurrent (forkIO, takeMVar,newEmptyMVar,putMVar)
import Control.Monad (forM_, replicateM_)
import Control.Concurrent.STM (atomically, readTMVar, putTMVar, newEmptyTMVarIO)
n=10
main = do
mvar<- newEmptyTMVarIO
forkIO $ do
forM_ [1..n] $ \x-> atomically $ do
putTMVar mvar $! x
replicateM_ n $ do
a<- atomically $ readTMVar mvar
print $ show a
main2 = do
mvar<- newEmptyMVar
forkIO $ do
mapM_ (\x-> putMVar mvar x) [1..n]
replicateM_ n $ do
a<- takeMVar mvar
print $ show a
You're using readTMVar, which just looks at what's in the TMVar. I imagine you mean to use takeTMVar to give the other thread a chance to put something new in it.
I'm trying to use hinotify and STM to make a simple concept:
Block the thread of execution until the watched file stops being modified
Continue once modifications stop, or their interval is larger than some time threshold (debounces)
Currently, I'm trying to use a TSem to get this working correctly, but I keep running into either of these problems:
the thread doesn't block at all, and I end up removing the hinotify watcher before it even starts, throwing an exception
the thread blocks indefinitely, causing STM to throw an exception
the program prints 3 times (3 concurrent notifications), but only lasts for 1 second and not 10
The code I've written is below, and can be checked out on github to see for yourself.
module Main where
import System.INotify
import System.Environment (getArgs)
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Concurrent.STM.TSem
import Control.Concurrent.STM.TVar
import Control.Monad (forM_)
main :: IO ()
main = do
[file] <- getArgs
-- make changes every 1/10th of a second for 10 seconds
forkIO $ forM_ [0..100] $ \s -> do
appendFile file $ show s
threadDelay (second `div` 10)
debouncer <- atomically $ newTSem 0
notif <- initINotify
expectation <- newTVarIO (0 :: Int)
watcher <- addWatch notif [Modify] file $ \e -> do
e' <- atomically $ do
modifyTVar expectation (+1)
readTVar expectation
print e
threadDelay second
e'' <- readTVarIO expectation
if e' == e''
then atomically $ signalTSem debouncer
else pure ()
atomically $ waitTSem debouncer
removeWatch watcher
killINotify notif
second = 1000000
Do you see anything immediately wrong with what I'm trying to do?
Does it have to be STM? You can achieve you goal with ordinary MVars:
#!/usr/bin/env stack
{- stack
--resolver lts-7.9
--install-ghc runghc
--package hinotify
--package stm
-}
import System.INotify
import System.Environment (getArgs)
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newMVar, newEmptyMVar, readMVar, swapMVar, putMVar, takeMVar, modifyMVar_)
import Control.Monad (forM_, forever)
main :: IO ()
main = do
[file] <- getArgs
mainBlocker <- newEmptyMVar
tickCounter <- newMVar 0
-- make changes every 1/10th of a second for 10 seconds
forkIO $ forM_ [0..100] $ \s -> do
appendFile file $ show s
threadDelay (second `div` 10)
-- set up file watches
notif <- initINotify
watcher <- addWatch notif [Modify] file $ \e -> do
swapMVar tickCounter 10
print "file has been modified; reset ticks to 10"
-- 'decreaser' thread
forkIO $ forever $ do
threadDelay second
ticks <- readMVar tickCounter
print $ "current ticks in decreaser thread: " ++ show ticks
if ticks <= 0
then putMVar mainBlocker ()
else modifyMVar_ tickCounter (\v -> return (v-1))
takeMVar mainBlocker
print "exiting..."
removeWatch watcher
killINotify notif
second = 1000000
The idea is a 'tick' counter that gets set to 10 whenever the file has been modified. A separate thread tries to count down to 0 and, when it succeeds, releases the block of the main thread.
If you use stack you can execute the code as a script like this:
stack theCode.hs fileToBeWatched
I want to create a process and write some text from my haskell program into the process's stdin periodically (from an IO action).
The following works correctly in GHCi but don't work correctly when built and run. In GHCi everything works perfectly and the value from the IO action is fed in periodically. When built and run however, it seems to pause for arbitrarily long periods of time when writing to stdin of the process.
I've used CreateProcess (from System.Process) to create the handle and tried hPutStrLn (bufferent set to NoBuffering -- LineBuffering didnt work either).
So I'm trying the process-streaming package and pipes but can't seem to get anything to work at all.
The real question is this: How do i create a process from haskell and write to it periodically?
Minimal example that exhibits this behavior:
import System.Process
import Data.IORef
import qualified Data.Text as T -- from the text package
import qualified Data.Text.IO as TIO
import Control.Concurrent.Timer -- from the timers package
import Control.Concurrent.Suspend -- from the suspend package
main = do
(Just hin, _,_,_) <- createProcess_ "bgProcess" $
(System.Process.proc "grep" ["10"]) { std_in = CreatePipe }
ref <- newIORef 0 :: IO (IORef Int)
flip repeatedTimer (msDelay 1000) $ do
x <- atomicModifyIORef' ref $ \x -> (x + 1, x)
hSetBuffering hin NoBuffering
TIO.hPutStrLn hin $ T.pack $ show x
Any help will be greatly appreciated.
This is a pipes Producer that emits a sequence of numbers with a second delay:
{-# language NumDecimals #-}
import Control.Concurrent
import Pipes
import qualified Data.ByteString.Char8 as Bytes
periodic :: Producer Bytes.ByteString IO ()
periodic = go 0
where
go n = do
d <- liftIO (pure (Bytes.pack (show n ++ "\n"))) -- put your IO action here
Pipes.yield d
liftIO (threadDelay 1e6)
go (succ n)
And, using process-streaming, we can feed the producer to an external process like this:
import System.Process.Streaming
main :: IO ()
main = do
executeInteractive (shell "grep 10"){ std_in = CreatePipe } (feedProducer periodic)
I used executeInteractive, which sets std_in automatically to NoBuffering.
Also, if you pipe std_out and want to process each match immediately, be sure to pass the --line-buffered option to grep (or use the stdbuf command) to ensure that matches are immediately available at the output.
What about using threadDelay, e.g.:
import Control.Monad (forever)
import Control.Concurrent (threadDelay)
...
forever $ do
x <- atomicModifyIORef' ref $ \x -> (x + 1, x)
hSetBuffering hin NoBuffering
TIO.hPutStrLn hin $ T.pack $ show x
threadDelay 1000000 -- 1 sec
Spawn this off in another thread if you need to do other work at the same time.
You can remove he need for the IORef with:
loop h x = do
hSetBuffering h NoBuffering
TIO.hPutStrLn h $ T.pack $ show x
threadDelay 1000000
loop h (x+1)
And, of course, you only need to do the hSetBuffering once - e.g. do it just before you enter the loop.
I'm using
import Control.Concurrent.ParallelIO.Global
main = parallel_ (map processI [1..(sdNumber runParameters)]) >> stopGlobalPool
where
processI :: Int -> IO ()
is some function, which reads data from file, processes it and writes it to another file. No output to terminal. The problem is when I run the program with +RTS -N8 the terminal is flooded with random text like
piptufuht teata thtsieieo ocnsno e nscsdeoe qnqvuduee ernvnstetiirioasanlil lolwynya. .s
w
a s s uY Ysosopuuue's'nvpvdeeee n dpdp rerdodoub beada
bub lel y
What is happening? Without +RTS there is no clutter. I couldn't reproduce the behavior with a more simple (suitable to post here) program.
GHC 7.0.3 if that matters
Buffering is probably preventing you from constructing a simple test case. I was able to reproduce it with this (only when run with +RTS -Nsomething):
import Control.Concurrent
import System.IO
main :: IO ()
main = do
hSetBuffering stdout NoBuffering
forkIO $ putStrLn "foo"
forkIO $ putStrLn "bar"
forkIO $ putStrLn "baz"
threadDelay 1000 -- Allow things to print
As Thomas mentioned, you'll probably need to sequence this somehow, though I'm not sure how writing straight to files would change this. Here's a simple example how you can sequence this with a Chan. I'm sure there's a better way to do this, this is just an example of how I got this to not garble the output.
import Control.Concurrent
import Control.Concurrent.Chan
import System.IO
main :: IO ()
main = do
hSetBuffering stdout NoBuffering
ch <- newChan -- Things written here are picked up by stuffWriter
forkIO $ stuffWriter ch -- Fire up concurrent stuffWriter
forkIO $ writeChan ch "foo"
forkIO $ writeChan ch "bar"
forkIO $ writeChan ch "baz"
threadDelay 1000 -- Allow things to print
-- | Write all the things!
stuffWriter :: Chan String -> IO ()
stuffWriter ch = do
readChan ch >>= putStrLn -- Block, then write once I've got something
stuffWriter ch -- loop... looking for more things to write
Now your writes to somewhere are now synchronous (stuffWriter writes things, one at a time), and you should have no more garbling.