My code needs to fire multiple threads and keep track of which have finished and which are still running. I was planning on using waitAny or waitAnyCatch, but was thrown off by the following in the documentation:
If multiple Asyncs complete or have completed, then the value returned corresponds to the first completed Async in the list.
If that is really the case, how does one ever keep track of running / exited threads reliably?
Here's my simplified code:
chan <- newChan
currentThreadsRef <- newIORef []
-- read jobs from a channel, and run them in parallel asyncs/threads,
-- while adding all thread references to currentThreadsRef
async $ do
  jobArgs <- readChan chan
  jobAsync <- async $ runJob jobArgs
  atomicModifyIORef' currentThreadsRef $ \x -> (jobAsync:x, ())
-- wait for jobs to be finished, and remove the thread reference
-- from currentThreadsRef
waitForAllJobs currentJobsRef = do
  (readIORef currentJobsRef) >>= \case
    [] -> logDebug "All jobs exited"
    currentJobs -> do
      (exitedJob, jobResult) <- waitAnyCatch currentJobs
      atomicModifyIORef' currentJobsRef $ \x -> (filter (/= exitedJob) x, ())
      logDebug $ "Job completed with result=" <> show jobResult
      waitForAllJobs currentJobsRef
PS: Although it may not be obvious from my simplified code above, there is a reason why I cannot simply use mapConcurrently over the input-data. Actually, async-pool seems like a good fit for my use-case, but even that has the same problem with waitAny.
Here's a program that launches 1000 asyncs all set to terminate within a second and waits for them all in a loop. Compiled with ghc -O2 -threaded and run with +RTS -N, it runs in about 1.5 seconds, and none of the asyncs gets "lost":
import Control.Concurrent
import Control.Concurrent.Async
import qualified Data.Set as Set
main :: IO ()
main = do
  let n = 1000 :: Int
  asyncs0 <- mapM (\i -> async (threadDelay 1000000 >> return i)) [1..n]
  let loop :: Set.Set (Async Int) -> IO ()
      loop asyncs | null asyncs = return ()
                  | otherwise = do
                      (a, _i) <- waitAny (Set.toList asyncs)
                      loop (Set.delete a asyncs)
  loop (Set.fromList asyncs0)
So, as was mentioned in a comment, the documentation is referring to the fact that the first completed async in the provided list is the one that will be "returned", but if multiple asyncs have completed, the additional ones aren't "forgotten". You just need to remove the returned async from the list and re-poll, and you'll eventually get them all.
So, you shouldn't have any trouble waiting on multiple asyncs with waitAny.
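Since the question used waitAnyCatch rather than waitAny, here is a minimal sketch of the same remove-and-re-poll loop with waitAnyCatch, so that failed jobs are reported instead of rethrown. The names drain and job are made up for illustration, and job 3 is made to fail on purpose; it assumes the async package.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, async, waitAnyCatch)
import Control.Exception (SomeException)
import Data.Either (lefts, rights)
import Data.List (sort)

-- Wait for every async in the list, removing one completed async per
-- iteration and re-polling the rest (hypothetical helper).
drain :: [Async a] -> IO [Either SomeException a]
drain [] = return []
drain pending = do
  (done, result) <- waitAnyCatch pending
  rest <- drain (filter (/= done) pending)
  return (result : rest)

-- Hypothetical job: job 3 throws, the others return their index.
job :: Int -> IO Int
job i = do
  threadDelay 10000
  if i == 3 then ioError (userError "boom") else return i

main :: IO ()
main = do
  jobs <- mapM (async . job) [1 .. 5]
  results <- drain jobs
  putStrLn $ "succeeded: " ++ show (sort (rights results))
  putStrLn $ "failed: " ++ show (length (lefts results))
```

Every async shows up exactly once in the results, whether it finished before or after the others, because each iteration removes only the async that waitAnyCatch actually returned.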
Consider the following simple IO function:
req :: IO [Integer]
req = do
  print "x"
  return [1,2,3]
In reality this might be an HTTP request, which returns a list after parsing its result.
I'm trying to concatenate the results of several calls of that function in a lazy way.
In simple terms, the following should print the 'x' only two times:
fmap (take 4) req'
--> [1, 2, 3, 1]
I thought this might be solved with sequence or mapM, however my approach fails in terms of laziness:
import Control.Monad
req' :: IO [Integer]
req' = fmap concat $ mapM req [1..1000] -- should be infinite..
This yields the right result, however the IO function req is called 1000 times instead of the necessary 2 times. When implementing the above with a map over an infinite list, the evaluation does not terminate at all.
Short version:
You shouldn't do this, look into a streaming IO library such as pipes or conduit instead.
Long version:
You can't. Or at least, you really shouldn't. Allowing lazily evaluated code to have side effects is generally a very bad idea. Not only does it very quickly become hard to reason about which effects are performed when and how many times, but even worse, effects may not be performed in the order you expect them to! With pure code, this is not a big deal. With side-effecting code, this is a disaster.
Imagine that you want to read a value from a reference and then replace the value with an updated value. In the IO monad, where the order of computation is well defined, this is easy:
main = do
  yesterdaysDate <- readIORef ref
  writeIORef ref todaysDate
However, if the above code were instead to be evaluated lazily, there would be no guarantee that the reference was read before it was written - or even that both computations would be executed at all. The semantics of the program would depend entirely on if and when we needed the results of the computations. This is one of the reasons for coming up with monads in the first place: to give programmers a way to write code with side effects, which execute in a well-defined and easily understood order.
Now, it is actually possible to lazily concatenate the lists, if you create them using unsafeInterleaveIO:
import System.IO.Unsafe
req :: IO [Integer]
req = unsafeInterleaveIO $ do
  print "x"
  return [1,2,3]
req' :: IO [Integer]
req' = fmap concat $ mapM (const req) [1..1000]
This will cause each application of req to be deferred until the corresponding sublist is needed. However, lazily performing IO like this may lead to interesting race conditions and resource leaks, and is generally frowned upon. The recommended alternative would be to use a streaming IO library such as conduit or pipes, which are mentioned in the comments.
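To check the deferral described above without reading console output, here is a small sketch that replaces the print with a counter. The IORef counter and the helper name lazyRequests are assumptions added for illustration; the rest follows the unsafeInterleaveIO version of req.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Stand-in for the request: bumps a counter instead of printing "x".
req :: IORef Int -> IO [Integer]
req counter = unsafeInterleaveIO $ do
  modifyIORef' counter (+ 1)
  return [1, 2, 3]

-- Take n elements of the concatenated results and report how many
-- requests were actually performed to produce them (hypothetical helper).
lazyRequests :: Int -> IO ([Integer], Int)
lazyRequests n = do
  counter <- newIORef 0
  xs <- take n . concat <$> mapM (const (req counter)) [1 .. 1000 :: Int]
  _ <- evaluate (length xs)        -- force the list before reading the counter
  performed <- readIORef counter
  return (xs, performed)

main :: IO ()
main = lazyRequests 4 >>= print    -- ([1,2,3,1],2)
```

Note that the counter has to be read after the list is forced; reading it earlier would report 0, which is exactly the kind of ordering surprise that makes lazy IO hard to reason about.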
Here is how you would do something like this with the streaming and pipes libraries. Pipes programs will be somewhat similar to those written with conduit, especially in this sort of case. conduit uses different names, and pipes and conduit have somewhat fancier types and operators than streaming; but it's really a matter of indifference which you use. streaming is, I think, fundamentally simpler in this sort of case; the formulation will be structurally similar to the corresponding IO [a] program and indeed frequently simpler. The essential point is that a Stream (Of Integer) IO () is exactly like a list of Integers, but it is built so that the elements of the list or stream can arise from successive IO actions.
I gave req an argument in the following, since that seemed to be what you had in mind.
import Streaming
import qualified Streaming.Prelude as S
import Streaming.Prelude (for, each)
req :: Integer -> Stream (Of Integer) IO ()
req x = do -- this 'stream' is just a list of Integers arising in IO
  liftIO $ putStr "Sending request #" >> print x
  each [x..x+2]
req' :: Stream (Of Integer) IO ()
req' = for (S.each [1..]) req -- An infinite succession of requests
                              -- each yielding three numbers. Here we are not
                              -- actually using IO to get each but we could.
main = S.print $ S.take 4 req'
-- >>> main
-- Sending request #1
-- 1
-- 2
-- 3
-- Sending request #2
-- 2
To get our four desired values we had to send two "requests"; we of course don't end up applying req to all Integers! S.take doesn't permit any further development of the infinite stream req' it takes as argument; so only the first element from the second request is ever calculated. Then everything shuts down. The fancy signature Stream (Of Integer) IO () could be replaced by a synonym
type List a = Stream (Of a) IO ()
and you would barely notice the difference from Haskell lists, except you don't get the apocalypses you noticed. The extra moveable parts in the actual signature are distracting here, but make it possible to replicate the whole API of Data.List in basically every detail while permitting IO and avoidance of accumulation everywhere. (Without the further moveable parts it is e.g. impossible to write splitAt, partition and chunksOf, and indeed you will find stack overflow is awash with questions how to do these obvious things with e.g. conduit.)
The pipes equivalent is this
import Pipes
import qualified Pipes.Prelude as P
req :: Integer -> Producer Integer IO ()
req x = do
  liftIO $ putStr "Sending request #" >> print x
  each [x..x+2]
req' = for (each [1..]) req
main = runEffect $ req' >-> P.take 4 >-> P.print
-- >>> main
-- Sending request #1
-- 1
-- 2
-- 3
-- Sending request #2
-- 2
It differs by treating take and print as pipes, rather than as ordinary functions on streams as they are with Data.List. This has charm but is not needed in the present context, where the conception of the stream as an effectful list predominates. Intuitively, taking and printing are things we do to a list, even if it is an effectful list as in this case, and the piping and conduiting aspect is a distraction (in bread-and-butter cases it also nearly doubles the time needed for calculation, due to the cost of >-> and .|, which is akin to that of, say, map).
It might help understanding if we note that req above could have been written
req x = do
  liftIO $ putStr "Sending request #" >> print x
  yield x -- yield a >> yield b == each [a,b]
  yield (x+1)
  yield (x+2)
This will be word for word the same in streaming, pipes and conduit. yield a >> rest is the same as a:rest. The difference is that a yield a line (in a do block) can be preceded by a bit of IO, e.g. a <- liftIO readLn; yield a
In general, the list functions mapM, replicateM, traverse and sequence should be avoided - except for short lists - for the reasons you mention. sequence is at the bottom of them all, and it basically has to constitute the whole list before it can proceed. (Note sequence = mapM id; mapM f = sequence . map f.) Thus we see
>>> sequence [getChar,getChar,getChar] >>= mapM_ print
abc'a' -- here and below I just type abc, ghci prints 'a' 'b' 'c'
'b'
'c'
but with a streaming library we see stuff like
>>> S.mapM_ print $ S.sequence $ S.each [getChar,getChar,getChar]
a'a'
b'b'
c'c'
Similarly
>>> replicateM 3 getChar >>= mapM_ print
abc'a'
'b'
'c'
is a mess - nothing happens till the whole list is constructed, then each of the collected Chars is printed in succession. But with a streaming library we write the simpler
>>> S.mapM_ print $ S.replicateM 3 getChar
a'a'
b'b'
c'c'
and the outputs are in sync with the inputs. In particular, no more than one character is in memory at a time. replicateM_, mapM_ and sequence_, by contrast, don't accumulate lists and so aren't a problem. It's the others that should prompt one to think of a streaming library, any streaming library. A monad-general sequence can't do any better than this, as you can see by reflecting on
>>> sequence [Just 1, Just 2, Just 3]
Just [1,2,3]
>>> sequence [Just 1, Just 2, Nothing]
Nothing
If the list were a million Maybe Ints long, it would all have to be remembered and left unused while waiting to see if the last item is Nothing. Since sequence, mapM, replicateM, traverse and company are monad-general, what goes for Maybe goes for IO.
Continuing above, we can similarly collect the list as you seemed to want to do:
main = S.toList_ (S.take 4 req') >>= print
-- >>> main
-- Sending request #1
-- Sending request #2
-- [1,2,3,2]
or, in the pipes version:
main = P.toListM (req' >-> P.take 4) >>= print
-- >>> main
-- Sending request #1
-- Sending request #2
-- [1,2,3,2]
Or to pile on possibilities, we can do IO with each element, while collecting them in a list or vector or whatever
main = do
  ls <- S.toList_ $ S.print $ S.copy $ S.take 4 req'
  print ls
-- >>> main
-- Sending request #1
-- 1
-- 2
-- 3
-- Sending request #2
-- 2
-- [1,2,3,2]
Here I print the copies and save the 'originals' for a list. The games we are playing here start to come upon the limits of pipes and conduit, though this particular program can be replicated with them.
As far as I know, what you're looking for shouldn't/can't be done using mapM and should probably use some form of streaming. In case it's helpful, an example using io-streams:
import qualified System.IO.Streams as Streams
import qualified System.IO.Streams.Combinators as Streams
req :: IO (Maybe [Integer])
req = do
  print "x"
  return (Just [1,2,3])
req' :: IO [Integer]
req' = Streams.toList =<< Streams.take 4 =<< Streams.concatLists =<< Streams.makeInputStream req
The working version of your code:
module Foo where
req :: Integer -> IO [Integer]
req _x = do
  print "x"
  return [1,2,3]
req' :: IO [Integer]
req' = concat <$> mapM req [1..1000]
(Note: I replaced fmap concat with concat <$>.)
When you evaluate fmap (take 4) req', the mapM expression's value is needed, which, in turn, needs the value of the [1..1000] list. So, a 1000-element list is generated and mapM applies the req function to each element; hence the 1000 'x'-es printed. concat then has to supply a value to the (take 4) section, which produces [1,2,3] repeated 1000 times. Then, and only then, can (take 4) take the first four elements.
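The claim that all 1000 requests run before take 4 sees anything can be checked with a small sketch that counts the calls instead of printing. The IORef counter and the helper name strictRequests are assumptions added for illustration; req otherwise mirrors the version above.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Stand-in for the request: counts calls instead of printing "x".
req :: IORef Int -> Integer -> IO [Integer]
req counter _x = do
  modifyIORef' counter (+ 1)
  return [1, 2, 3]

-- Take n elements of the concatenated results and report how many
-- requests ran to produce them (hypothetical helper).
strictRequests :: Int -> IO ([Integer], Int)
strictRequests n = do
  counter <- newIORef 0
  xs <- take n . concat <$> mapM (req counter) [1 .. 1000]
  performed <- readIORef counter
  return (xs, performed)

main :: IO ()
main = strictRequests 4 >>= print  -- ([1,2,3,1],1000)
```

Only four elements are kept, but mapM in IO has already run every action by the time the bind returns, so the counter reads 1000.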
All of these computations occur because a value is needed by ghci, if you're at the interpreter's REPL prompt. Otherwise, in an executing program, take 4 is simply stacked in a waiting thunk until its value is actually needed.
Best to think about this as a tree where expressions are pushed onto the root of the tree, replacing the root each time (root becomes a leaf in another expression that needs its value.) When the value at the root of the tree is needed, evaluate from the bottom up.
Now, if you really only wanted req evaluated once and only once because it is truly a constant value, here's the code:
module Foo where
req2 :: IO [Integer]
req2 = do
  print "x"
  return [1,2,3]
req2' :: IO [Integer]
req2' = concat <$> mapM (const req2) ([1..1000] :: [Integer])
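req2' is evaluated only once as a value because it is a constant (having no function parameters guarantees this). Note, however, that mapM still runs the req2 action once per list element whenever req2' is executed, so "x" is still printed 1000 times. Admittedly, though, that's probably not what you really intended.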
This is what the pipes and conduit ecosystems were designed for. Here's an example for pipes.
#!/usr/bin/env stack
--stack runghc --resolver=lts-7.16 --package pipes
module Main where
import Control.Monad (forever)
import Pipes as P
import qualified Pipes.Prelude as P
req :: Producer Int IO ()
req = forever $ do
  liftIO $ putStrLn "Making a request."
  mapM_ yield [1,2,3]
main :: IO ()
main = P.toListM (req >-> P.take 4) >>= print
Note that normally you don't collapse a result into a list using pipes, but that seems to be your use case.
The following seems to work (as in: it keeps saying Surely tomorrow every second)
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Exception (evaluate)
main :: IO ()
main = do
  godot <- newEmptyMVar
  forkIO $ do
    g <- evaluate $ last [0..]
    putMVar godot g
  let loop = do
        threadDelay $ 10^6
        g <- tryTakeMVar godot
        case g of
          Just g -> return ()
          Nothing -> putStrLn "Surely tomorrow." >> loop
  loop
This uses evaluate to ensure last [0..] is actually forced to WHNF before filling the MVar. If I change the forked thread to
  forkIO $ do
    let g = last [0..]
    putMVar godot g
then the program terminates.
However, evaluate uses seq. In the context of deterministic parallelism, it's always emphasized that seq is not sufficient to actually guarantee evaluation order. Does this problem not arise in a monadic context, or should I better use
  forkIO $ do
    let g = last [0..]
    g `pseq` putMVar godot g
to ensure the compiler can't reorder the evaluation so tryTakeMVar succeeds prematurely?
The point of pseq is to ensure that after the parent thread sparks a computation with par, it does not immediately proceed to try to evaluate the result of the sparked computation itself, but instead does its own job first. See the documentation for an example. When you're working more explicitly with concurrency, you shouldn't need pseq.
If I'm not totally wrong, evaluating last [0..] to WHNF would take an infinite amount of time, because WHNF for an Int means that you know the exact number.
putMVar will not start executing before last [0..] is evaluated to WHNF (which as we know takes forever), because putMVar will need the RealWorld-token (s) returned by the call to evaluate. (Or to put it more simply: evaluate works. It finishes only after evaluating its argument to WHNF.)
evaluate :: a -> IO a
evaluate a = IO $ \s -> seq# a s
--                        this ^
putMVar (MVar mvar#) x = IO $ \ s# ->
--           which is used here ^^
    case putMVar# mvar# x s# of
--         is needed here ^^
        s2# -> (# s2#, () #)
where seq# is a GHC-prim function that guarantees to return (# a, s #) only after evaluating a to WHNF (that's its purpose). That is, only after a is evaluated to WHNF can s be used in the call to putMVar. Although these tokens are purely imaginary ("RealWorld is deeply magical..."), they are respected by the compiler, and the whole IO monad is built on top of them.
So yes, evaluate is enough in this case. evaluate is more than seq: it combines IO-monadic sequencing with seq#-sequencing to produce its effect.
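A minimal sketch of the difference between evaluate and a plain return, using a thunk that throws when forced so the moment of evaluation becomes observable. The helper name raises is made up for illustration.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Pass a thunk that throws when forced to the given injection function,
-- and report whether running the resulting action raised the exception.
raises :: (Int -> IO Int) -> IO Bool
raises inject = do
  r <- try (inject (error "forced")) :: IO (Either ErrorCall Int)
  return (either (const True) (const False) r)

main :: IO ()
main = do
  raises return   >>= print  -- False: return leaves the thunk unevaluated
  raises evaluate >>= print  -- True: evaluate forces it to WHNF first
```

In the MVar program, the return-like behaviour corresponds to the let version, where putMVar stores an unevaluated thunk and the program terminates; the evaluate behaviour corresponds to the version that never reaches putMVar.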
In fact, the pseq version looks a bit fishy to me, because it ultimately depends on lazy, where evaluate ultimately depends on seq# and monadic token-passing. And I trust seq# a bit more.
You have a sequence of actions that prefer to be executed in chunks due to some high-fixed overhead like packet headers or making connections. The limit is that sometimes the next action depends on the result of a previous one in which case, all pending actions are executed at once.
Example:
mySession :: Session IO ()
mySession = do
  a <- readit  -- nothing happens yet
  b <- readit  -- nothing happens yet
  c <- readit  -- nothing happens yet
  if a         -- all three readits execute because we need a
    then write "a"
    else write "..."
  if b || c    -- b and c already available
    ...
This reminds me of so many Haskell concepts but I can't put my finger on it.
Of course, you could do something obvious like:
[a,b,c] <- batch([readit, readit, readit])
But I'd like to hide the fact of chunking from the user for slickness purposes.
Not sure if Session is the right word. Maybe you can suggest a better one? (Packet, Batch, Chunk and Deferred come to mind.)
Update
I think there was a really good answer last night that I read on my phone but when I came back to look for it today it was gone. Was I dreaming?
I don't think you can do exactly what you want, since what you describe exploits Haskell's lazy evaluation to have the evaluation of a force the actions that compute b and c, and there's no way to seq on unspecified values.
What I could do was hack together a monad transformer that delayed actions sequenced via >> so that they could be executed all together:
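data Session m a = Session { pending :: [ m () ], final :: m a }
runSession :: Monad m => Session m a -> m a
runSession (Session ms ma) = foldr (flip (>>)) (return ()) ms >> ma
-- Functor and Applicative instances are required superclasses of Monad
-- since GHC 7.10
instance Functor m => Functor (Session m) where
  fmap f (Session ms ma) = Session ms (fmap f ma)
instance Monad m => Applicative (Session m) where
  pure = Session [] . return
  sf <*> sa = Session [] $ runSession sf <*> runSession sa
instance Monad m => Monad (Session m) where
  return = Session [] . return
  s >>= f = Session [] $ runSession s >>= (runSession . f)
  (Session ms ma) >> (Session ms' ma') =
    Session (ms' ++ (ma >> return ()) : ms) ma'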
This violates some monad laws, but lets you do something like:
import Debug.Trace (trace)
liftIO :: IO a -> Session IO a
liftIO = Session []
exampleSession :: Session IO Int
exampleSession = do
  liftIO $ putStrLn "one"
  liftIO $ putStrLn "two"
  liftIO $ putStrLn "three"
  liftIO $ putStrLn "four"
  trace "five" $ return 5
and get
ghci> runSession exampleSession
five
one
two
three
four
5
ghci> length (pending exampleSession)
4
This is very similar to what Haxl does.
For more info:
Open sourcing haxl - Facebook Code Blog
ICFP 2014 talk
You could use the unsafeInterleaveIO function. It is a dangerous function that can introduce bugs to your program if not used carefully, but it does what you're asking for.
You can insert it into your example code like this:
lazyReadits :: IO [a]
lazyReadits = unsafeInterleaveIO $ do
  a <- readit
  r <- lazyReadits
  return (a:r)
unsafeInterleaveIO makes the action as a whole lazy, but once it starts evaluating it will evaluate as if it had been strict. This means in my above example: readit will run as soon as something tests whether the returned list is empty or not. If I'd used mapM unsafeInterleaveIO (replicate 3 readit) instead, then readit would only be run when the actual elements of the list are evaluated, which would make the contents of the list depend on the order in which its elements are inspected, which is one example of how unsafeInterleaveIO can introduce bugs.
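The order dependence of the mapM unsafeInterleaveIO variant can be made visible with a small sketch. Here readit is a made-up stand-in that hands out 1, 2, 3, ... from a counter, and demo is a hypothetical helper; the elements are deliberately inspected in the order c, a, b.

```haskell
import Control.Exception (evaluate)
import Data.IORef (atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Defer three reads, then inspect the results out of order.
demo :: IO (Int, Int, Int)
demo = do
  counter <- newIORef 0
  -- Stand-in for readit: returns the next number from the counter.
  let readit = atomicModifyIORef' counter (\n -> (n + 1, n + 1))
  [a, b, c] <- mapM unsafeInterleaveIO (replicate 3 readit)
  _ <- evaluate c  -- inspected first, so it performs the first read
  _ <- evaluate a  -- second read
  _ <- evaluate b  -- third read
  return (a, b, c)

main :: IO ()
main = demo >>= print  -- (2,3,1)
```

The list positions say a, b, c, but the values reflect the order in which the elements were forced, which is the bug-prone behaviour described above.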
The following is my dining philosophers code and yields a compilation error saying "The last statement in a 'do' construct must be an expression: mVar2 <- newEmptyMVar mVar3"
Can somebody help me fix this error and get this program working? Thank you.
import Control.Concurrent
import Control.Concurrent.MVar
import System.Random
takefork :: Int -> forks -> IO ()
takefork n forks = takeMVar (forks!!n)
releasefork :: Int -> forks -> IO ()
releasefork n forks = putMVar (forks!!n)
philosopher :: [Int]
philosopher = [1,2,3,4,5]
forks :: [MVar] -> [Int]
forks = do
takefork n ( philosopher - 1)
threadDelay delay
let delay = 100000
takefork n philosopher
putStrLn("Philosopher" ++ philosopher ++ "has started eating")
releasefork n philosopher
releasefork n ( philosopher - 1)
ptStrLn ("Philosopher" ++ philosopher ++ "has stopped eating")
forks
main :: IO ()
main = do
mVar1 <- newEmptyMVar
mVar2 <- newEmptyMVar
mVar3 <- newEmptyMVar
mVar4 <- newEmptyMVar
mVar5 <- newEmptyMVar
let mVar = [mVar1, mVar2, mVar3, mVar4, mVar5]
sequence_ [ forkIO forks (mVar philosopher) ]
There are many problems with your code.
The error message you report indicates you are probably mixing spaces and tabs. Get rid of the tabs and use only spaces.
You are presumably writing this program in order to practice writing Haskell programs, not in order to run the program for fun and profit. So we don't want to simply give you a working Dining Philosophers implementation, we want to help you write your implementation.
I cannot tell from your code how you expect it to work.
I'm going to focus on the last line:
sequence_ [ forkIO forks (mVar philosopher) ]
sequence_ :: [IO a] -> IO () --- give sequence_ a list of i/o actions, and it (returns an i/o action that) performs each action in order. From the [...], it looks like you are trying to give it a list, but with only one element. This is probably not what you mean.
forkIO :: IO () -> IO ThreadId --- give forkIO an i/o action, and it (returns an i/o action that) starts that i/o action running in a new thread, giving you the id of that thread.
There are two problems here:
forks is a function, not an i/o action (it's not even a function that returns an i/o action, though you probably mean it to be)
you give forkIO a second argument ((mVar philosopher)), but it only takes one argument
mVar philosopher itself doesn't make any sense: mVar :: [MVar a] (it's a list of MVars, and I haven't worked out what type the MVars are supposed to contain) but you treat it like a function, passing it philosopher as an argument.
At this point a lightbulb blinks on above my head. You wish to call forks with parameters mVar and philosopher?
sequence_ [ forkIO (forks mVar philosopher) ]
We're still sequencing a single action though. Perhaps you wish to call forks with each element of philosopher in turn?
sequence_ $ map (\n -> forkIO (forks mVar n)) philosopher
We can simplify this to
mapM_ (\n -> forkIO (forks mVar n)) philosopher
This doesn't match up with the type you have given forks :: [MVar] -> [Int]. But that's probably wrong, so you'll want to fix that function next.
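To illustrate just the forking pattern (not a Dining Philosophers solution), here is a minimal sketch of mapM_ with forkIO over a list of ids. The names worker and runAll are made up, and the worker only reports its id; each thread gets its own MVar so the main thread can wait for all of them.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical stand-in for the per-philosopher action: an IO action
-- that takes its shared state and its id, and signals when it is done.
worker :: MVar Int -> Int -> IO ()
worker done n = putMVar done n

-- Fork one thread per id, then wait for each thread's own MVar to be filled.
runAll :: [Int] -> IO [Int]
runAll ids = do
  dones <- mapM (const newEmptyMVar) ids
  mapM_ (\(n, done) -> forkIO (worker done n)) (zip ids dones)
  mapM takeMVar dones

main :: IO ()
main = runAll [1, 2, 3, 4, 5] >>= print  -- [1,2,3,4,5]
```

Note the shape of the types: worker applied to all its arguments is an IO (), which is what forkIO expects, and the lambda passed to mapM_ returns an IO ThreadId for each element.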